2023 Probabilistic & Statistical Reasoning

Probabilistic & Statistical
Reasoning
Woodlands Mathematica Publishing

2020
for the wonderful students of The Woodlands School
Special Thanks:
Cover Art: Spiral Painting by J. Heathfield

Contributors to BOB: Students of Spring 2013 MDM 4UE class at The Woodlands
Editor and Proofreaders: Mr. R. Dutton, Ms. M. Popovic; Students of Spring 2021 MDM 4U0 class at The
Woodlands.
To the Reader
This text is intended for students taking an introductory course in probability and statistics
at the secondary level. Looking beyond the topics contained within this course, it should be
noted that the course is really about problem solving and critical analysis more than
techniques and theorems, which is the usual focus in mathematics. To this end Iʼll try to
write out, as best as possible, how the thought processes play out from my point of view.
That being said, each of us is very different, so what is of greater importance is that you
are an active participant in the reading. Try as best as possible to work out the reasoning
for yourself before reading through an example; this will allow you to develop your own
personal understanding of the topics covered. Furthermore, it is advisable to read through
every problem offered at the conclusion of each section and, at the very least, try to think
of an approach to solving the task, but more often seek to work out a proper solution. In
doing this your intuition for problem solving will only grow, and with some luck an
appreciation for this wonderful subject can be developed.
TABLE OF CONTENTS
Special Thanks: 3
To the Reader 3
TABLE OF CONTENTS 4
Unit 1: Combinatorial Reasoning 10

1.1 The Fundamental Principle of Counting 11
A Trivial Example 11
The Fundamental Counting Principle 13
Practice 1.1 15
1.2 Disjoint Cases and the Indirect Approach 16
Set Theoretic Definitions 16
When the Direct Approach Fails 17
The Disjoint Cases Approach 18
The Indirect Approach 20
A Summary of the Three Major Counting Approaches 23
Practice 1.2 24
1.3 Overlapping Cases: The Principle of Inclusion and Exclusion 26
When the Cases Approach Fails 26
The Principle of Inclusion and Exclusion 27
Practice 1.3 30
1.4 Counting Strategies I: Factorials and Permutations 32
Factorials 32
Permutations 33
Practice 1.4 35
1.5 Counting Strategies II: Satisfying Conditions 36
Practice 1.5 38
1.6 Counting Strategies III: Dealing with Symmetries 39
Practice 1.6 42
1.7 Counting Strategies IV: Combinations 43
Practice 1.7 46
1.8 Counting Strategies V: Counting Subsets 48
Practice 1.8 51
1.9 Counting Strategies VI: Partitioning 52
Practice 1.9 54
1.10 Counting Strategies VII: Pascalʼs Method 55
Properties of Pascalʼs Triangle 55
Pascalʼs Method 58
Practice 1.10 60
1.11 Challenge Problems 64
Unit 2: Probabilistic Reasoning 65

2.1 The Probability of a Random Event 66
The Ethos of Probability Theory 66
The Probability of a Random Event 66
Practice 2.1 70
2.2 The Cases Approach for Probability 71
Mutually Exclusive Events 71
Overlapping Cases: Non-Mutually Exclusive Events 72
Practice 2.2 75
2.3 Probabilities of Successive Events 76
The Direct Approach Returns! 76
Practice 2.3 78
2.4 Dependent and Independent Events 79
Classifying Successive Actions 79
Probability Trees 80
Practice 2.4 82
2.5 Conditional Probabilities 85
Practice 2.5 88
2.6 Using Simulations to Estimate Probabilities 90
Hitting the Combinatorial Wall 90
Demonstrations of Simulations 90
Spreadsheet Simulations 1 - Introduction.avi 90

Practice 2.6 92
2.7 Distributions of a Random Variable 94
Key Definitions 94
Practice 2.7 97
2.8 Binomial Distributions 99
Common Distribution Types 99
Patterning in the Binomial Distribution 100
Generalizing Binomial Distributions 101
Practice 2.8 103
2.9 Geometric Distributions 105
Waiting for a Success 105
The General Probability for a Geometric Distribution 106
Practice 2.9 107
2.10 Hyper-Geometric Distributions 108
Generalizing Hyper-Geometric Distributions 111
Expected Value of a Hyper-Geometric Distribution 111
Practice 2.10 112
2.11 Hypothesis Testing: An Introduction 114
Control Studies 114
A Useful Metaphor for Understanding Hypothesis Testing 116
Practice 2.11 118
2.12 Designing an Arcade Game of Chance 119
The Premise 119
Analysis of the Game 119
Unit 3: Statistical Reasoning 120

3.1 Statistically Designed Experiments 122
3.1 Practice 127
3.2 Sampling Methods 129
Multi-Stage Sampling 133
3.2 Practice 134
3.3 Measures of Central Tendency 135
The Arithmetic Mean 135
The Median of a Set of Data 138
The Mode of a Set of Data: 139
Choosing an Appropriate Measure of Central Tendency 140
Practice 3.3 141
3.4 Measuring Spread: Means 142
The Mean Deviation 143
Variance and The Standard Deviation 145
Practice 3.4 149
3.5 Measuring Spread: Medians 152
Quartiles 152
The Interquartile Range (IQR) 153
Box-Whisker Plots (Visualizing Spread) 153
Detection of Outliers 155
Practice 3.5 157
3.6 Scatterplots and Correlation 159
Covariance: Measuring Spread in 2-Dimensions 160
The Correlation Coefficient (r) 162
Practice 3.6 164
3.7 Linear Regression 167
The Least-Squares Line 167
Practice 3.7 172
3.8 Non-Linear Regression 175
Measuring the Strength of a Modelʼs Fit 176
Guidelines for Selecting Suitable Functional Models 179
Practice 3.8 180
3.9 Continuous Random Variables 182
Uniform Distributions: A Working Example 182
Discrete v. Continuous Random Variables 183
Probability Properties of Continuous Random Variables 184
Continuous Probability Distributions and Probability Density Functions 184
Exponential Distributions 188
Practice 3.9 192
3.10 Normal Distributions 193
Characteristics of Normal Distributions 194
Determining Probabilities with Normal Distributions 194
Practice 3.10 198
3.11 Confidence Intervals: Estimating Population Means 200
Repeated Sampling and The Central Limit Theorem 200
Confidence Intervals: Estimating the Population Mean from a Single Sample 201
Practice 3.11 205
3.12 Approximating Discrete Distributions 207
The Continuity Correction 209
Practice 3.12 211
3.13 Confidence Intervals: Estimating Population Proportions 213
Experimental Strategies for Yielding Suitable Confidence Intervals 215
Practice 3.13 217
3.14 Hypothesis Testing Revisited 218
p-Values 218
Hypothesis Test on Proportions 219
Final Thoughts on Experimental Results 220
p-Values v. The Population Proportion (p) 220
Practice 3.14 221
3.15 Case Studies 223
B.O.B. “Back of the Book” 225

UNIT 1: 226
UNIT 2: 230
UNIT 3: 234
Z-Score Table: 239
Bibliography 241
Unit 1: Combinatorial Reasoning
The nameless is the beginning of heaven and Earth.
The named is the mother of the ten thousand things.
Lao Tzu - Tao Te Ching
1.1 The Fundamental Principle of Counting
A Trivial Example
Tian, a newly graduated law student, has a very limited wardrobe for work. Each day he has a choice of
one of four suits (brown, navy, grey, and striped), and the option of two types of shoes (oxfords and
loafers). For how many days can he go to the office wearing a different outfit?
Solution:
The answer, ignoring extraneous variables such as shirt type, ties, etc... is clearly 8 days in a row!
⬛
At this point youʼre probably feeling somewhat offended that weʼre beginning this way in a senior level
mathematics course. Offending you, however, is not the purpose of the example. Combinatorics is the
study of organized counting. In truth, we wonʼt really move mathematically beyond the most basic
arithmetical techniques learned at a very early age. Despite this, what makes this topic wildly
fascinating is the process by which we approach problems like the one stated above, as they can and
will become much more complex very quickly. So letʼs take this trivial example and solve it using a
ʻcombinatorial approachʼ so that we will be able to carry the reasoning from this highly intuitive
example over to more complex ones.
Alternate Solution 1: (Tree Diagram)

Slowing our thinking down a tad, we realize that there are two actions that Tian must decide upon while
selecting his outfit for the day. Heʼll most likely select a suit to put on AND THEN select a pair of shoes
to wear. We note that Tian can reverse this process yielding the same overall outfit, but most people
put their clothes on before their shoes so letʼs stick with that.
Graphically, we can represent Tianʼs decision process with a tree diagram. Each level of the tree
represents a different action which takes place and the branches represent the options (or outcomes)
that are possible for this action. The diagram below illustrates such a tree.
Following down one of the option paths formed by the tree we can isolate each possible wardrobe
outcome for Tian on a given day. The diagram below highlights the selection of a grey suit with oxford
shoes.
Clearly, we can see that this tree stems out to form 8 possible wardrobe outcomes (as we had initially
known before). It may also be apparent that this method can become very cumbersome (though very
useful later in the course) and so weʼll take a different tack in order to have a strategy that is more
efficient.
Alternate Solution 2: (The “DIRECT” Approach or Using PLACEHOLDERS)

Again we think of the actions that Tian decides upon, but instead of sketching and tracing out each
outcome, as with the tree diagram, weʼll use placeholders to indicate each action and its corresponding
number of options:
_4_ × _2_ = 8
(suit) (shoes)
In this style of solution, the placeholder will have a description of the action on the bottom of the line
and the number of options for this action on the top.
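As a quick check of both alternate solutions, here is a minimal Python sketch (not part of the original solutions) that lists every path of the tree and compares the count with the placeholder product. The variable names are our own choice for illustration.

```python
from itertools import product

suits = ["brown", "navy", "grey", "striped"]
shoes = ["oxfords", "loafers"]

# Each (suit, shoe) pair corresponds to one root-to-leaf path of the tree diagram.
outfits = list(product(suits, shoes))

for suit, shoe in outfits:
    print(f"{suit} suit with {shoe}")

# Placeholder count: options for the first action times options for the second.
placeholder_count = len(suits) * len(shoes)
print(len(outfits), placeholder_count)
```

Listing the outcomes is only practical for small problems; the placeholder product gives the same count without tracing every branch.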
The Fundamental Counting Principle
In mathematics the Direct approach has a more formal name, that being The Fundamental Counting
Principle. It is stated as follows:
Theorem 1.1.1 (The Fundamental Principle of Counting a.k.a. The “DIRECT” Approach)
If we have a series of successive actions, with 𝑎 outcomes for the first action, 𝑏 outcomes for the
second action, 𝑐 outcomes for the third action, and so on... then there are
𝑎 × 𝑏 × 𝑐 × ...
possible ways to perform the actions in total.
Proof:
Not ever wishing to remove opportunity from our students, the proof of this theorem will be left to the
reader! (Note: I wonʼt always cop out in this way, but itʼs useful for one to think this through as it will
make you revisit your conceptual understanding of multiplication.)
⬛
The key here is to realize that this principle is not so “deep”; instead, think of it as an approach to
solving combinatorial problems. Even with the trivial example of Tian and his wardrobe, we see the
beginnings of what is entailed in this approach. Namely, when considering a counting problem, if it
is possible to break the problem into a series of successive actions, by really slowing down our thought
process and recognizing the step-by-step decisions that must be made, then we can simply multiply
the number of outcomes within each separate action to obtain the total number of possibilities. Letʼs
work through some other examples to highlight this principle.
Example 1.1.1
At one time, standard Ontario license plates were coded by having three letters followed by three digits.
How many standard plates can possibly be created in this manner?
Solution: (Direct)
When a solution can be approached in a direct manner, we should be able to construct a process by
which we can form any possible outcome required in the problem. In this case, we think through the
process of constructing a ʻstandardʼ license plate. First weʼll select a letter from A-Z, then another letter,
and then a third. After the three letters are selected weʼll select a digit from 0-9, then another, and
finally a third. Using placeholders we can display this as follows:
26 × 26 × 26 × 10 × 10 × 10 = 17 576 000
Notice that this total would eventually become insufficient, which is why an extra letter was added!
⬛
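The placeholder product above is easy to verify by machine. The sketch below is an illustration only: it assumes the 26 uppercase letters and the digits 0-9, and multiplies the number of options for each of the six successive actions.

```python
from math import prod
import string

letters = string.ascii_uppercase  # "A" to "Z": 26 options per letter slot
digits = string.digits            # "0" to "9": 10 options per digit slot

# One entry per placeholder: three letter actions followed by three digit actions.
options_per_action = [len(letters)] * 3 + [len(digits)] * 3

total_plates = prod(options_per_action)
print(options_per_action, total_plates)
```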
It may seem that the final result is quite large. Get used to this! Combinatorial answers can be both
small and excessively large, and consequently it is difficult at times to make a reasonable estimate. Of
greater importance is the approach by which you organize and break down the problem. Hereʼs
another example to illustrate a deceptively large result.
Example 1.1.2
Ellen runs a babysitter service where she walks a group of five children to and from their homes every
day from school. While walking home, the children always walk in single file. How many ways are there
to arrange the children in a line?
Solution: (Direct)
Slowing the process down, we realize we can break this into a series of five successive actions as shown:
5 × 4 × 3 × 2 × 1 = 120
Perhaps surprisingly, there are 120 possible ways that Ellen could arrange this motley crew of five
children!
⬛
Example 1.1.3
How many ways can Ellen arrange the five children in the morning if either Charlie or
Mehwish must be chosen to be in front of the line?
Solution: (Direct)
We have a restriction on the first child and so our first action will only have two options possible. After
this selection, we can “count freely”, and so weʼll satisfy the condition first and then move on to the
remaining actions:
2 × 4 × 3 × 2 × 1 = 48
Thus, for this morning walk Ellen now only has 48 possible line ups to choose from.
⬛
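Examples 1.1.2 and 1.1.3 are small enough to check by listing every line-up. In the sketch below only Charlie and Mehwish are named in the text; the other three names are placeholders we made up for illustration.

```python
from itertools import permutations

# "Child C", "Child D" and "Child E" are hypothetical stand-ins for the unnamed children.
children = ["Charlie", "Mehwish", "Child C", "Child D", "Child E"]

# Example 1.1.2: every possible single-file ordering of the five children.
lineups = list(permutations(children))

# Example 1.1.3: keep only the orderings with Charlie or Mehwish at the front.
restricted = [line for line in lineups if line[0] in ("Charlie", "Mehwish")]

print(len(lineups), len(restricted))
```

The filter mirrors the reasoning in the solution: the condition on the first position is dealt with first, and the remaining four positions are free.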
Practice 1.1
Technique
1. A couple has narrowed down the choice of a name for their new baby to four first names and
three different middle names. How many first-middle name choices will they have to choose
from?
2. a) Draw a tree diagram representing the makeup (boy, girl) of a family with three children.
b) Using the tree diagram, count the number of ways a couple could have exactly two girls.
c) How many different families have either a boy as the eldest, or two boys as the youngest?
(NOTE: In this case ʻorʼ allows for both conditions to be satisfied.)
3. In an upcoming election, voters can choose:

- Vicky or Sharon for President;
- Victor, Eric, or Amber for Vice-President;
- Alex, Hannah, or Sophia for Secretary.
a) Draw a tree diagram illustrating the possible voting choices assuming all positions are voted on.
b) How many ways can the ballot be filled out if all positions must be filled?
c) How many ways can the ballot be filled out if the voter may choose not to vote for some
positions (or all)?
4. How many 10-digit phone numbers are possible if the first three digits can only be the area
codes (416, 905, or 647) and the next two digits cannot be either 1 or 0?
5. How many six-digit numbers are there whose digits alternate between even and odd? (Note:
When asked for a “number” consisting of a certain amount of digits it is implied that we cannot
have a leading zero and weʼll consider the digit ʻ0ʼ as “even”.)
6. a) How many ways can Leting guess at a 5 question multiple-choice quiz where each question
has four possible responses (A-D)?
b) In how many ways can she guess if she never wishes to have consecutive responses match?
Studies
7. How many even numbers less than 4 million can be formed by arranging the digits 1, 2, 3, 4, 5, 6,
7 where each can only be used once?
8. Two rooks on an 8x8 chessboard are said to threaten each other if they are in the same row or
the same column. Each row and column is parallel to a side of the board.
a) In how many ways can a white rook and a black rook be placed on the board so that they do not
threaten each other?
b) In how many ways can three rooks, one black, one white, and one red, be placed on the board
so that no two threaten each other?
9. Prove the Fundamental Principle of Counting.
1.2 Disjoint Cases and the Indirect Approach
Set Theoretic Definitions
Before progressing, weʼre going to have to go over some definitions that weʼll employ throughout the
course.
Definition 1.2.1 A set is a defined collection/grouping of objects/concepts called elements.
E.g. 𝐴 = {1, 2, 5, 7}
𝐵 = {𝑇𝑜𝑛𝑦, 𝐽𝑎𝑠𝑜𝑛, 𝐿𝑎𝑘𝑒 𝑂𝑛𝑡𝑎𝑟𝑖𝑜, 𝐸𝑖𝑓𝑓𝑒𝑙 𝑇𝑜𝑤𝑒𝑟}
ℕ = {1, 2, 3, 4, 5, ...}
What you may notice is that the elements that comprise a set can be “anything”. Because of this,
properties derived in the study of set theory can be delicate to work with. A set is usually named
with an uppercase letter, and its elements (or the conditions defining them) are listed within brace
brackets.
Graphically, sets are typically represented with Venn Diagrams. For example, using the sets A and ℕ as
defined above we get a diagram in which A sits entirely inside ℕ, since every element of A is a natural
number.
Definition 1.2.2 The union of two sets A and B is itself a set, denoted (𝐴 ∪ 𝐵), which comprises all
elements found in either A or B.
For example, let M represent the set of letters that are used to spell ʻmathematicsʼ, and S represent the
set of letters which are used to spell ʻscienceʼ:
𝑀 = {𝑚, 𝑎, 𝑡, ℎ, 𝑒, 𝑖, 𝑐, 𝑠},
𝑆 = {𝑠, 𝑐, 𝑖, 𝑒, 𝑛}
Then,
(𝑀 ∪ 𝑆) = {𝑚, 𝑎, 𝑡, ℎ, 𝑒, 𝑖, 𝑐, 𝑠, 𝑛}
Definition 1.2.3 The intersection of two sets A and B, denoted (𝐴 ∩ 𝐵), consists of all elements
common to both A and B.
For example, observing the sets M and S addressed above, we get:
(𝑀 ∩ 𝑆) = {𝑠, 𝑐, 𝑖, 𝑒}
Definition 1.2.4 The complement of a set A, denoted 𝐴ᶜ, consists of all elements in the outcome
space that are not in A.
Once again referring to our running example, the complement of the set M relative to the standard
English alphabet would be:
𝑀ᶜ = {𝑏, 𝑑, 𝑓, 𝑔, 𝑗, 𝑘, 𝑙, 𝑛, 𝑜, 𝑝, 𝑞, 𝑟, 𝑢, 𝑣, 𝑤, 𝑥, 𝑦, 𝑧}
Definition 1.2.5 The cardinality of a set A, denoted 𝑛(𝐴), indicates the number of elements contained
within the set.
For example, we would get that:
𝑛(𝑀) = 8
𝑛(𝑆) = 5
𝑛(𝑀ᶜ) = 18 and
𝑛(𝑀 ∩ 𝑆) = 4
Thatʼll do for now. Weʼll use this terminology at times throughout this section and the remainder of
this course; however, our main goal remains to focus on how one reasons through combinatorial
problems, so letʼs get back to it!
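For readers who like to experiment, the running example with ʻmathematicsʼ and ʻscienceʼ can be reproduced with Pythonʼs built-in set type. This is only an illustrative sketch; the variable names are ours, and the outcome space is assumed to be the 26 lowercase English letters.

```python
import string

M = set("mathematics")  # letters used to spell 'mathematics'
S = set("science")      # letters used to spell 'science'
alphabet = set(string.ascii_lowercase)  # assumed outcome space

union = M | S               # elements found in either M or S
intersection = M & S        # elements common to both M and S
M_complement = alphabet - M # letters of the alphabet that are not in M

# Cardinalities n(M), n(S), n(M union S), n(M intersect S), n(M complement)
print(len(M), len(S), len(union), len(intersection), len(M_complement))
print(sorted(M_complement))
```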
When the Direct Approach Fails
In the previous section we discovered that if a counting problem can be broken down into a sequence of
actions (decisions) then we could invoke the Fundamental Counting Principle and simply multiply the
corresponding number of options that each action can possibly take on. As you could well imagine,
working directly in this manner is not always an obvious or viable option. Letʼs consider the example
below:
Example 1.2.1
In a standard 52 card deck, there are four suits (called diamonds, clubs, hearts, and spades; diamonds
and hearts are coloured red while clubs and spades are coloured black) each of which contain thirteen
values (Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, and King). If a single card is selected from a standard 52
card deck, how many ways could a black Ace (Ace of clubs or spades) or a red face card (Jack, Queen, or
King) be chosen?
Solution:
You may be immediately aware that there is only a single action that takes place, that of selecting the
lone card. As such, the “Direct Method” of breaking down the process into a sequence of actions will
not apply. Instead, what we notice here is that we can break this problem down into distinct cases.
Specifically, the cases are that of obtaining a “Black Ace” OR a “Red Face Card”. In this instance, we can
count how many possibilities we have for each case and then sum up the outcomes. Letʼs see how
this plays out:
Case | Description | Total Outcomes
i. Black Ace | Ace of Clubs, Ace of Spades | 2
ii. Red Face Card | 2 red suits × 3 face values (Jack, Queen, King) | 6
TOTAL | | 8
⬛
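The two cases can also be counted by building the deck and filtering it. The following is a small sketch under our own naming choices, not a prescribed method; it simply confirms the table above.

```python
from itertools import product

# Suit -> colour, as described in the example.
suit_colour = {"diamonds": "red", "hearts": "red", "clubs": "black", "spades": "black"}
values = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King"]

# Every card is a (value, suit) pair.
deck = list(product(values, suit_colour))

# Case i: black Aces. Case ii: red face cards. The cases share no cards.
black_aces = [(v, s) for v, s in deck if v == "Ace" and suit_colour[s] == "black"]
red_faces = [(v, s) for v, s in deck
             if v in ("Jack", "Queen", "King") and suit_colour[s] == "red"]

total = len(black_aces) + len(red_faces)
print(len(deck), len(black_aces), len(red_faces), total)
```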
The Disjoint Cases Approach
You may have noticed in the above solution that we were able to apply the direct approach
(Fundamental Counting Principle) of counting with the “Red Face Cards" case. This will occur
frequently!
The general approach here can be described as follows:
Theorem 1.2.1 (Counting Mutually Exclusive Subsets a.k.a. The “CASES” Approach)
If a set of outcomes S can be separated into mutually exclusive cases (subsets) A, B, C, ... whereby the
first case has 𝑛(𝐴) total outcomes, the second 𝑛(𝐵) total outcomes, and so on... then there will be
𝑛(𝑆) = 𝑛(𝐴) + 𝑛(𝐵) + ...
total possible outcomes in the set.
Proof:
Once again, weʼll leave this argument to the reader.
⬛
Example 1.2.2 (Licence Plates Revisited)

In Ontario, the current standard format for vehicle license plates consists of a sequence of four letters
followed by three digits. Previous to this, road vehicles displayed a three-letter, three-digit sequence.
Determine how many plates are possible across both formats.
Solution: (Cases)
We can break this problem into two distinct cases, namely, (4-Letters, 3-Digits) and (3-Letters, 3-Digits):
Case | Placeholders | Total Outcomes
i. 4-Letters, 3-Digits | 26⁴ × 10³ | 456 976 000
ii. 3-Letters, 3-Digits | 26³ × 10³ | 17 576 000
TOTAL | | 474 552 000
This number of possible license plates should more than account for the possibilities for quite a while.
We can even note that some outcomes would be restricted (e.g. curse words, Green vehicles having to
begin with ʻGVʼ, etc.). More importantly, notice how much larger the outcome set becomes by simply
adding 1 letter to the sequence! This is why itʼs encouraged to create passwords that are longer with
more varied types of characters, as the sheer number of possible outcomes makes them more
difficult to guess.
⬛
Once again, we see that the problem could be broken into two disjoint cases with each separate case
easily counted using the Fundamental Counting Principle. It would be remiss, however, not to show a
“tricky” alternate solution to this problem, so here we go:
Alternate Solution: (Direct)

Consider the (4-Letter, 3-Digit) case from above with a minor wrinkle: allow a 27th option for the very
first letter, that of a “blank” letter. If the blank is selected first, then we obtain the (3-Letter, 3-Digit)
case!
27 × 26³ × 10³ = 474 552 000
Of note, in the above solution, we compressed the outcomes for the 3-Letter and 3-Digit actions by
invoking the Fundamental Principle of Counting. This is a good practice to become familiar with in
order to streamline your argument.
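To see that the CASES solution and the “blank letter” DIRECT solution agree, the arithmetic can be checked in a few lines. This is a sketch for verification only; the variable names are ours.

```python
LETTERS = 26
DIGITS = 10

# CASES: count each plate format separately, then add.
four_letter_plates = LETTERS**4 * DIGITS**3
three_letter_plates = LETTERS**3 * DIGITS**3
cases_total = four_letter_plates + three_letter_plates

# DIRECT: the first slot has 27 options (26 letters or a blank),
# followed by the compressed 3-letter and 3-digit actions.
direct_total = (LETTERS + 1) * LETTERS**3 * DIGITS**3

print(four_letter_plates, three_letter_plates, cases_total, direct_total)
```

The agreement is no accident: 26⁴ + 26³ = 26³ × (26 + 1), which is exactly what the blank option encodes.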
The Indirect Approach
We now move to our final major approach to determining the number of outcomes of a combinatorial
situation. To begin with, letʼs observe an example:
Example 1.2.3
Determine how many non-negative integers (0, 1, 2, 3, ...) less than or equal to 1000 include at least
one occurrence of the digit 5.
Solution: (Cases)
Weʼll break the problem up according to the number of digits as shown;
i. 1 Digit: must be a ʻ5ʼ
TOTAL = 1
ii. 2-Digits (Cases) - Based on the first digit
a. Begins with a ʻ5ʼ: 1 × 10 = 10
b. Doesnʼt begin with a ʻ5ʼ: 8 × 1 = 8
TOTAL = 18
iii. 3-Digits (Cases) - Based on the first digit
a. Begins with a ʻ5ʼ: 1 × 10 × 10 = 100
b. Doesnʼt begin with a ʻ5ʼ: 8 × 19 = 152 (the last two digits must include a ʻ5ʼ, which can happen in
10 + 9 = 19 ways)
TOTAL = 252
The number 1000 itself contains no ʻ5ʼ, so it adds nothing to the count.
TOTAL 271
This solution seems quite the ordeal for such a simply worded problem. In fact, if you observe the
original problem it seems much easier to actually count the numbers which do not include any fives.
This complementary case can be exploited to our advantage as shown in the alternate solution below:
Alternate Solution: (Indirect)

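In total we have 1000 numbers from 0 to 999 (the number 1000 contains no 5, so it can be set aside).
Of these numbers, a certain amount do not include any 5. Writing each number as a three-digit string
(000 to 999), every digit has 9 options other than 5:
9 × 9 × 9 = 729
Viewing things this way, we realize that we can simply subtract the undesirable outcomes from the
total.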
Outcomes | Description | Count
Total Possible Outcomes | All numbers from 0 to 999 | 1000
Undesirable Outcomes | “BAD” Cases | 729
DESIRABLE OUTCOMES (GOOD CASES) | (TOTAL) - (BAD) | 271
As you can see, the “Indirect Method" can be highly effective especially if there are many cases and
subcases to worry about. Thus we have our third and final major counting tactic:
Theorem 1.2.2 (Complementary Outcomes a.k.a. The “INDIRECT” Approach)
If a set of outcomes (S) contains a subset of outcomes (A), then the number of elements that are in
set A and the number of elements that are not in set A (labelled 𝐴ᶜ) satisfy
𝑛(𝑆) = 𝑛(𝐴) + 𝑛(𝐴ᶜ), where n represents the number of elements in each set.
Alternatively, we can write this as;
𝑛(𝐴) = 𝑛(𝑆) − 𝑛(𝐴ᶜ)
GOOD = TOTAL − BAD
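Example 1.2.3 is a nice one to confirm by brute force, since a computer can simply inspect every number. The sketch below (our own illustration) counts the “GOOD” numbers directly and again through GOOD = TOTAL − BAD.

```python
numbers = range(1000)  # 0 to 999; the number 1000 contains no 5

# Direct count: numbers whose decimal form contains the digit 5.
good_direct = sum(1 for k in numbers if "5" in str(k))

# Indirect count: tally the "BAD" numbers (no 5 at all) and subtract from the total.
bad = sum(1 for k in numbers if "5" not in str(k))
good_indirect = len(numbers) - bad

print(good_direct, bad, good_indirect)
```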
A Summary of the Three Major Counting Approaches
Weʼve now observed the three major theorems/approaches which allow us to count the number of
outcomes for a given combinatorial problem. Most problems can be tackled by employing one of these
approaches and so youʼll likely notice that in the presented solutions moving forward, that a declaration
of the approach will be made at the onset. At this point, it would be instructive to summarize and
compare each of these approaches:
Approach | Primary Operation | Key Indicators
DIRECT | ✖ (Multiplication) | Can obtain all outcomes from a step-by-step process or series of successive actions/decisions. Key Words: “AND”; “and then..”
CASES | + (Addition) | Can obtain outcomes by considering separate subsets (cases) which have no outcomes in common. NOTE: Each case then becomes its own combinatorial problem which can be approached with any of these three approaches! Key Words/Phrases: “OR”; “this or that”
INDIRECT | ➖ (Subtraction) | Easier to count outcomes which are undesirable (“BAD” outcomes). Key Words: “NOT”; “at least”; “at most”; “canʼt”
For the vast majority of combinatorial problems, one of these tactics (or a combination thereof) will
have to be decided upon. For someone new to this type of reasoning it is imperative to attempt as
many problems as possible to gain a certain “intuition” as to how to best approach a solution. Many
first-time students will feel a sense of uncertainty while practicing, but this is OK. As youʼve seen already,
there are often many approaches that will yield a successful solution, so it is simply a matter of playing
with these problems to gain greater insight into their inner workings. Lastly, if you are new or relatively
new to this subject it is also best not to dwell on producing the most elegant solution; rather, just get to
an answer first, then keep your mind open to alternate possibilities, as this will only act to accelerate
your development for this type of reasoning.
Practice 1.2
Technique
1. There are five different Science-Fiction books, six different Fantasy books, and eight different
Romance books on a shelf. How many ways are there to select two books which are not of the
same genre? NOTE: In this problem we do care about which book is selected first and which
second; for example, selecting S1 then F2 is counted as different from selecting F2 then S1.
2. How many ways are there to form a three-letter sequence using the letters a, b, c, d, e, f such
that:
a) repetition of letters is allowed?
b) repetition of letters is not allowed?
c) repetition is not allowed and the sequence contains the letter e?
d) repetition is allowed and the sequence contains the letter e?
3. How many four-letter arrangements (sequences of letters with repetition) are there in which
vowels appear only (if at all) as both the first and last letter?
4. How many ways are there to pick a man and a woman who are not husband and wife from a
group of n married couples?
5. Urja has 12 textbooks on her bookshelf. In how many ways can she arrange them so that the
titles are not in alphabetical order?
6. There are 15 different apples and 10 different pears in a bucket. How many ways are there for
Dan to pick an apple or a pear and then for Christy to pick an apple and a pear?
Studies
7. How many ternary sequences (only contain the digits 0, 1, and 2) of length 10 are there without
any pair of consecutive digits being the same?
8. How many different five-letter sequences can be made using the letters A, B, C, D with repetition
allowed; where the sequence does not include the word ʻBADʼ, that is, sequences such as ABADD
are not allowed?
9. Morse Code is a binary type of communication whereby letters are formed by a sequence of
pulses. Each pulse is either long or short in length. For example, the letter e is represented by a
single ʻshortʼ pulse, while the letter z is represented by the sequence (long, long, short, short). If
the English language requires a minimum of 90 characters (upper and lower case letters, digits,
punctuation, etc...); what is the maximum length of a Morse code sequence required to
represent all of the characters?
10. How many ways are there to park 14 cars in 16 spaces where no two consecutive spaces are
empty?
Repertoire
11. Prove Theorem 1.2.1 (Counting Mutually Exclusive Subsets a.k.a. The “CASES” Approach).
12. Prove Theorem 1.2.2 (Complementary Outcomes a.k.a. The “INDIRECT” Approach).
13. On the real line, place n white pegs at positions 1, 2, 3, ..., n and n blue pegs at positions -1, -2,
..., -n (0 is left open). White pegs can only move to the left, while blues only to the right. When
beside an open position, a peg may move one unit to occupy that position. If a peg of one
colour is in front of a peg of the other colour that is followed by an open position, the peg may
jump two units to the open position. By a sequence of these two types of moves (not
necessarily alternating between white and blue pegs), one seeks to get the positions of the
white and blue pegs interchanged. Prove, using a combinatorial argument, that in general you
will require (𝑛² + 2𝑛) moves to complete the game.
1.3 Overlapping Cases: The Principle of Inclusion and Exclusion
When the Cases Approach Fails
Breaking our outcome space into distinct cases is a clearly useful tactic for solving combinatorial
problems, but thinking in terms of the set theoretic notions introduced in the previous section, one
canʼt help but wonder what happens when it is not easy to break our outcome space into disjoint
subsets? The answer to this conundrum is fairly intuitive so letʼs first examine an example to illuminate
the idea:
Example 1.3.1
At The Woodlands School, there are 13 teachers in the Mathematics and Engineering department and
10 in the Physical Sciences department. Five teachers are in both departments. How many teachers are
in either the Math or Science department?
Solution: (Cases)
Listing the members of each department as a set we observe the following sets;
Math and Engineering (M)

{Anton, Cross, Dutton, Fong, Gershater, Gunner, Hamilton, Heathfield, Le, Popovic, Tan, D. Williams, Wyman-McCarthy}
Physical Sciences (P)

{Anton, Doret, Fong, Hamilton, Lee, Raybould, Reka-Isaj, Tan, B. Williams, Wyman-McCarthy}
It is evident that if we were to consider the two departments as CASES, then our conclusion would be;
𝑇𝑜𝑡𝑎𝑙 𝑂𝑢𝑡𝑐𝑜𝑚𝑒𝑠 = 𝑛(𝑀) + 𝑛(𝑃) = 13 + 10 = 23
This would clearly not work though, as weʼre overcounting those members who belong to both
departments (Anton, Fong, Hamilton, Tan, and Wyman-McCarthy); specifically, each of these elements
was counted twice.
Thus, to compensate for overlapping CASES such as this, we can subtract off the instances of members
who belong to both departments (i.e. in the intersection of M and P) giving us;
𝑇𝑜𝑡𝑎𝑙 𝑂𝑢𝑡𝑐𝑜𝑚𝑒𝑠 = 𝑛(𝑀) + 𝑛(𝑃) − 𝑛(𝑀 ∩ 𝑃)
𝑇𝑜𝑡𝑎𝑙 𝑂𝑢𝑡𝑐𝑜𝑚𝑒𝑠 = 13 + 10 − 5 = 18
Thus, there are 18 teachers either in the Mathematics and Engineering or the Physical Sciences
departments.
⬛
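Since the department lists are given explicitly, the overcount and its correction can be checked with Python sets. This is a small illustrative sketch using the names listed above; the variable names are ours.

```python
M = {"Anton", "Cross", "Dutton", "Fong", "Gershater", "Gunner", "Hamilton",
     "Heathfield", "Le", "Popovic", "Tan", "D. Williams", "Wyman-McCarthy"}
P = {"Anton", "Doret", "Fong", "Hamilton", "Lee", "Raybould", "Reka-Isaj",
     "Tan", "B. Williams", "Wyman-McCarthy"}

naive_total = len(M) + len(P)   # treats the departments as disjoint cases
both = M & P                    # teachers counted twice by the naive total
corrected_total = len(M) + len(P) - len(both)

print(naive_total, sorted(both), corrected_total, len(M | P))
```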
The Principle of Inclusion and Exclusion
We move to generalize the approach taken to deal with overlapping cases in the theorem which
follows:
Theorem 1.3.1 (The Principle of Inclusion and Exclusion)

Given two sets A and B, the cardinality of their union, (𝐴 ∪ 𝐵), is given by;
𝑛(𝐴 ∪ 𝐵) = 𝑛(𝐴) + 𝑛(𝐵) − 𝑛(𝐴 ∩ 𝐵)
Proof:
Weʼll work with a visually based argument here to hopefully assist in understanding this approach.
Adding 𝑛(𝐴) and 𝑛(𝐵) counts the overlap 𝑛(𝐴 ∩ 𝐵) twice, so we subtract off this “slice” once to
compensate, leaving each region of (𝐴 ∪ 𝐵) counted only one time and thus giving us the desired
relationship,
𝑛(𝐴 ∪ 𝐵) = 𝑛(𝐴) + 𝑛(𝐵) − 𝑛(𝐴 ∩ 𝐵)
⬛
Example 1.3.2
A researcher is observing a collection of insects for two genetic characteristics, labelled as x and y. In
total 100 insects were examined. Every insect has either x or y or both, but the researcher cannot
determine directly whether an insect has characteristic y alone. She finds that 48 insects have x and 23
have both x and y. Given this information, determine:
a) how many of the insects have characteristic y.
b) how many only have characteristic y.
Solution: (Cases)
a) Weʼll let X represent the set of insects with trait x, and Y represent the set of insects with trait y.
Since every insect exhibits x or y then we can deduce that;
𝑛(𝑋 ∪ 𝑌) = 100 = 𝑛(𝑋) + 𝑛(𝑌) − 𝑛(𝑋 ∩ 𝑌)
100 = 48 + 𝑛(𝑌) − 23
𝑛(𝑌) = 100 − 48 + 23 = 75
Thus, 75 of the insects exhibit characteristic y.
b) It is sometimes useful to solve the CASES approach “graphically", especially when we have
overlapping cases. To do this, weʼll employ a Venn diagram with each set representing one of
the cases, then work from the inside outward to label the number of outcomes in each
ʻregionʼ of the diagram.
Thus, observing the Venn diagram, we have that 52 insects have characteristic ʻyʼ only.
⬛
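The "inside outward" labelling of the Venn diagram can also be sketched in a few lines of Python. The function name and the dictionary keys below are our own choices for illustration; the arithmetic is the rearranged inclusion-exclusion formula used in part a).

```python
def two_trait_regions(total, n_x, n_both):
    """Split a population in which everyone has trait x, trait y, or both."""
    n_y = total - n_x + n_both      # rearranged inclusion-exclusion
    x_only = n_x - n_both           # work outward from the intersection
    y_only = n_y - n_both
    return {"x only": x_only, "both": n_both, "y only": y_only, "n(Y)": n_y}

regions = two_trait_regions(total=100, n_x=48, n_both=23)
print(regions)
```

The three regions should add back up to the 100 insects examined, which is a useful check on any Venn diagram.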
When working through the ʻgraphicalʼ method of solving itʼs important to remember to subtract off any
outcomes of a set already accounted for. This is why it is imperative to start from the intersection of the
two sets and work your way outward.
Not surprisingly, the principle of inclusion and exclusion is applicable to three or more sets as well, but
the derivation of this is left as an exercise. Be sure to work through the three set scenario to consolidate
your understanding of this section!
Practice 1.3
Technique
1. There are 24 students currently enrolled in our class. Of these, 15 are currently taking a Science
course while 8 are taking English, with 5 taking both a Science and an English course. Determine
the number of students in our class who are neither taking Science nor English.
2. There are 27 cats at a shelter. 14 of them are short-haired, 11 of them are kittens, and 5 of them
are long-haired adult cats (not kittens). How many of the cats are short-haired kittens?
3. At a party, all except 5 people enjoyed the cake or the strudel; 16 people said they liked the cake
while 12 people said they liked the strudel. Find an upper and a lower bound for the number of
people at the party.
4. Of the 30 students in the Health & Wellness SHSM at our school, there are twice as many
students with dark hair as there are with blue eyes, 6 students with dark hair also have blue
eyes, and there are 3 students with neither dark hair nor blue eyes. How many students have
blue eyes?
Studies
5. a) Derive the Principle of Inclusion and Exclusion for three sets A, B, and C. Specifically,
determine a method of determining the cardinality of the union of the three sets
𝑛(𝐴 ∪ 𝐵 ∪ 𝐶).
b) Derive the Principle of Inclusion and Exclusion for four sets.
c) n-sets
6. At the GoodDog Obedience School, dogs can learn to do three tricks: sit, stay, and roll over. Of
the dogs at the school:
50 dogs can sit
29 dogs can stay
34 dogs can roll over
17 dogs can sit and stay
12 dogs can stay and roll over
18 dogs can sit and roll over
9 dogs can do all three
9 dogs can do none
a) How many dogs are in the school?
b) How many dogs can do exactly 2 tricks?
7. In a survey of internet habits of the common Woodlands teenager it was found that:
44 students use Twitter
49 students are on Instagram
56 students watch YouTube
27 use Twitter and are on Instagram
19 are on Instagram and watch YouTube
24 use Twitter and watch YouTube
10 students use all three forms of media, while 9 use none.
How many students were surveyed in this study?
8. During the course of a 185-day school year, it was found that, on 5 days, the school cafeteria had
no ice cream in stock. On 172 days, vanilla ice cream was available. On 12 days there was
exactly one flavour available. On 78 days, there were exactly two flavours available. For 100
days, a customer could purchase either chocolate or strawberry ice cream, but not necessarily
both. Did the cafeteria ever sell any flavour of ice cream other than vanilla, chocolate, or
strawberry? Justify your response.
9. In a math contest, three problems, A, B, and C were posed.

● Among the participants there were 25 who solved at least one problem.
● Of all the participants who did not solve problem A, the number who solved problem B
was twice the number who solved C.
● The number who solved only problem A was one more than the number who solved A
and at least one other problem.
● Of all participants who solved just one problem, half did not solve problem A.
How many solved just problem B?
10. As of September 2022 Statistics Canada reported the following relating to the administration of
the COVID-19 vaccine:
● the total population of Canada was measured to be 38 929 902 people.
● 32 663 177 people had received at least one dose
● 32 315 761 people had dose 1 administered (F)
● 30 960 071 had dose 2 administered (S)
● 19 165 217 had dose 3 administered (T)
● 31 392 262 have completed two doses
● 19 062 753 completed two doses and received 1 or more additional doses
● 5 095 752 completed two doses and had received an additional 2 doses
Is the data presented valid? If so, display the data in a Venn diagram, if not justify why.
Repertoire
11. Generalize the Principle of Inclusion and Exclusion for the cardinality of n-sets; A1 , A2 , … , An. To
argue this result you may need to make an inductive argument.
1.4 Counting Tactics I: Factorials and Permutations
Factorials
Having developed our three primary combinatorial approaches, we now move to more specific tactics
which frequently occur and are useful to become proficient with. Our first such strategy will revolve
around the situation of arranging objects.
Example 1.4.1
A softball team has 10 players on it. How many batting orders can be made by using all 10 players only
once?
Solution: (Direct)
Breaking the decision process down by player we get;
10 × 9 × 8 × ··· × 2 × 1 = 3 628 800
Thus, there are 3 628 800 possible batting orders for this team!
⬛
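The batting-order count can be checked by brute force on a smaller team. In this sketch `count_orders` is a hypothetical helper of ours that lists every ordering, while `math.factorial` is Python's built-in for the product n × (n − 1) × ··· × 2 × 1.

```python
from itertools import permutations
from math import factorial

def count_orders(n):
    """Count batting orders of n players by listing every ordering (small n only)."""
    return sum(1 for _ in permutations(range(n)))

print(count_orders(5), factorial(5))   # brute force and 5 x 4 x 3 x 2 x 1
print(factorial(10))                   # batting orders for the 10-player team
```

Listing all orderings for 10 players is still feasible but slow, which is one reason the product form is preferred.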
A product such as 10 × 9 × 8 × ··· × 2 × 1 is a common occurrence in combinatorics and so a special
type of notation has evolved to denote it;
Definition 1.4.1
Given a natural number 𝑛 ∈ ℕ, we define n-factorial as the product;
𝑛! = 𝑛(𝑛 − 1)(𝑛 − 2) ··· (2)(1)
What is important here however is that when using factorial notation, we always keep our mindʼs eye on
the idea that this generally represents the arrangement of n-objects. The power of the notation is that
it allows us to display a solution in a more concise manner avoiding long chains of multiplication which
could take away from an effective presentation of your argument. Letʼs see this notation in another
problem;
Example 1.4.2
Xiaoya has a collection of PS4 games on her bookshelf. In her collection she has 5 Puzzle Games, 3 RPGs,
7 First Person Shooters, 4 Arcade, and 8 Sports Games. If Xiaoya wishes to keep each genre together, in
how many ways could she arrange her Games?
Solution: (Direct)
We break down the arrangement process first by arranging the order of genres and then arranging the
games within each genre as shown below;
As you can see in the solution above, the use of factorial notation greatly streamlines and aids in
showcasing a combinatorial argument. In fact, as the final result is so large it would be acceptable to
leave the answer in “factorial" form as this description is clear enough on its own merit. Letʼs move
again to another highly common counting situation.
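The two-stage argument (order the genres, then order the games within each genre) can be sketched as follows. The function names are ours, and the brute-force check is run on a small hypothetical shelf because listing all orderings of 27 games is out of reach.

```python
from itertools import permutations
from math import factorial, prod

def grouped_arrangements(sizes):
    """Order the groups, then order the items inside each group."""
    return factorial(len(sizes)) * prod(factorial(s) for s in sizes)

def brute_force(sizes):
    """List every shelf order and keep those in which each group stays together."""
    items = [g for g, s in enumerate(sizes) for _ in range(s)]
    labelled = list(enumerate(items))          # make every game distinct
    count = 0
    for order in permutations(labelled):
        genres = [g for _, g in order]
        # each genre is contiguous exactly when the number of blocks equals the number of genres
        blocks = 1 + sum(a != b for a, b in zip(genres, genres[1:]))
        count += blocks == len(sizes)
    return count

print(grouped_arrangements([2, 1, 2]), brute_force([2, 1, 2]))
print(grouped_arrangements([5, 3, 7, 4, 8]))   # the genre sizes from the shelf example
```

The last line evaluates 5! × 5! × 3! × 7! × 4! × 8!, which is why leaving the answer in factorial form is often the clearer presentation.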
Permutations
Example 1.4.3
A three person advertising committee consisting of a project manager, graphic design artist, and a sales
representative is to be chosen from a group of 15 candidates at a firm. In how many ways could this
committee be chosen?
Solution: (Direct)
We can break this problem down based on the actions of selecting the manager, then the artist, and
finally the sales representative;
The product shown above greatly resembles a ʻfactorialʼ type of expression but is incomplete. With this
in mind, we can use factorials to express this type of product as shown below:
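$$15 \times 14 \times 13 = \frac{15 \times 14 \times 13 \times 12 \times \cdots \times 2 \times 1}{12 \times \cdots \times 2 \times 1} = \frac{15!}{12!} = \frac{15!}{(15-3)!}$$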
Again, as this type of situation arises with great frequency, a notation is warranted, namely that of a
permutation defined as follows:
Definition 1.4.2
An arrangement of r objects taken from a collection of n objects in total is called a permutation, and the number of such arrangements is given
by the expression;
$${}_{n}P_{r} = \frac{n!}{(n-r)!}$$
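A short sketch comparing three ways of counting the committee: the factorial quotient from the definition, Python's built-in `math.perm`, and a direct listing of ordered selections. The helper name `n_P_r` is ours.

```python
from itertools import permutations
from math import factorial, perm

def n_P_r(n, r):
    """Permutation count as a quotient of factorials: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

# Ordered picks from 15 candidates: manager, then artist, then sales representative.
committees = sum(1 for _ in permutations(range(15), 3))
print(n_P_r(15, 3), perm(15, 3), committees)
```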
As with factorial notation, the permutation notation will greatly aid us in an effort to clearly present a
combinatorial argument. Of interest however is a special consequence of these definitions.
Theorem 1.4.1 (Zero Factorial)

Prove that 0! = 1.
Proof:
Consider the number of ways we can arrange n objects taken from a collection of n total objects.
I. Intuitively, this problem should make you think of the batting lineup example encountered
earlier in this section. Thus we know that the number of ways one could arrange these objects
is given by the expression;
𝑛!
II. Using the concept of a permutation we get;

$${}_{n}P_{n} = \frac{n!}{(n-n)!} = \frac{n!}{0!}$$
Since these two approaches count the same arrangements, we can equate them, yielding;
$$n! = {}_{n}P_{n} = \frac{n!}{0!}$$
$$\Rightarrow n! = \frac{n!}{0!}$$
$$\Rightarrow 0! = 1$$
⬛
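Python's standard library follows the same convention, which gives a small sanity check: there is exactly one way to arrange an empty collection, namely the empty arrangement.

```python
from itertools import permutations
from math import factorial

empty_arrangements = list(permutations([]))   # every ordering of zero objects
print(empty_arrangements)                     # [()]
print(factorial(0))                           # 1
```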
For the exercises that follow, try to employ factorial and permutation notation whilst constructing your
solutions as this will help you both become more familiarized with the notations as well as be able to
read through many combinatorial arguments.
Practice 1.4
Technique
1. Simplify the following expressions without the use of a calculator:
a) $\frac{9!}{5!}$
b) $\frac{12!}{3!(12-3)!}$
c) 18 · 17!
d) 𝑛 · (𝑛 − 1)!
e) $\frac{n!}{(n-2)!}$
2. Determine the number of 4-letter permutations using the letters that spell:
a) GRADE
b) POLICE
3. Each day, the SAC office is open for any student to request assistance for four hours. If the
council consists of seven members, and four of them are assigned each day for a one hour shift,
in how many ways can the daily shifts be assigned?
4. Determine the number of four letter arrangements of the word FACETIOUS that can be formed
such that:
a) exactly two vowels must be used.
b) no vowels can be adjacent.
Studies
5. Solve for n in each of the following equations:
a) 𝑛! = 42(𝑛 − 2)!
b) $\frac{(n+1)!}{(n-1)!} = 20$
c) ${}_{2n}P_{3} = 2\left({}_{n}P_{4}\right)$
6. How many five-letter arrangements of the word STEVIN can be formed if either one vowel is
used or E is to precede I?
7. Determine all sets of positive integers x, y, and z for which x < y < z and x! + y! = z!.
8. Determine the highest power of three which divides evenly into 1123! .
1.5 Counting Tactics II: Satisfying Conditions
We often are given set conditions or restrictions to abide by in a counting problem. In general, when
weʼre set with a condition it will be good practice to satisfy the condition and then move on to count
freely. Letʼs observe some examples of this strategy.
Example 1.5.1
How many arrangements of the letters A, B, C, D, E, F start or end with a vowel?
Solution: (Indirect)
Total Possible Outcomes:                                        6! = 720
Undesirable Outcomes (“BAD” Cases, neither end is a vowel):     4 × 4! × 3 = 288
DESIRABLE OUTCOMES (GOOD CASES) = (TOTAL) − (BAD):              720 − 288 = 432
What is important in this solution is that when counting the “undesirable" outcomes we did not proceed
sequentially from the first letter to the second and so on...but instead we imposed the conditions
required onto the first letter then the last letter which allowed us to count freely for the remaining
middle letters.
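With only 720 arrangements, the indirect count is easy to confirm by listing them all. The sketch below assumes the vowels among A to F are A and E; the variable names are ours.

```python
from itertools import permutations

VOWELS = set("AE")
arrangements = ["".join(p) for p in permutations("ABCDEF")]

# "Bad" cases: neither the first nor the last letter is a vowel.
bad = [w for w in arrangements if w[0] not in VOWELS and w[-1] not in VOWELS]
# "Good" cases: the arrangement starts or ends with a vowel.
good = [w for w in arrangements if w[0] in VOWELS or w[-1] in VOWELS]

print(len(arrangements), len(bad), len(good))
```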
We now move to one of the more classic counting problems. It goes as follows;
Example 1.5.2 (Satisfying the Picky People)

A group of 10 friends go to the movies and reserve a row in the theatre so that they can all sit beside
each other. In how many ways could they occupy the row if Alice and Effie must sit beside each other?
Solution: (Direct)
Weʼll break this problem down by first counting the ways we can place Alice and Effie beside each other,
namely they can occupy the first and second seats, second and third seats, etc...then weʼll arrange the
remaining friends freely.
⬛
A variation of this solution imposes the condition in somewhat of a non-intuitive, yet effective manner;
Alternate Solution: (Direct)

Since Alice and Effie must be placed together weʼll treat them as one person!! This leaves us with 9
ʻfriendsʼ to arrange as we wish.
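Both solutions can be compared against a brute-force count for a smaller row. The two formulas below are our reading of the arguments described in the text (choose adjacent seats and order the pair, or treat the pair as one person and then order the pair), and the function names are hypothetical.

```python
from itertools import permutations
from math import factorial

def adjacent_pair_count(n):
    """Brute force: seatings of n friends in which friends 0 and 1 sit side by side."""
    return sum(1 for p in permutations(range(n)) if abs(p.index(0) - p.index(1)) == 1)

def by_seat_pairs(n):
    # choose the adjacent seats, order the pair within them, seat everyone else
    return (n - 1) * 2 * factorial(n - 2)

def as_one_person(n):
    # arrange n - 1 'friends', then order the pair inside their block
    return factorial(n - 1) * 2

print(adjacent_pair_count(6), by_seat_pairs(6), as_one_person(6))
print(by_seat_pairs(10), as_one_person(10))   # the row of 10 friends
```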
Practice 1.5
Technique
1. Considering Example 1.5.2, suppose Wasay also demands to be beside Alice and Effie so that
the three of them are together (in any order). How many ways could the 10 friends sit?
Studies
2. Determine the number of ways one could arrange the letters that spell CAMPGROUND such that:
a) the three vowels do not occur consecutively (i.e. can have two beside each other but not all
three).
b) the letter A occurs somewhere before the letter O.
c) ʻCAMPʼ must occur together (in any order); for example OGAMPCRNDU would be a valid
arrangement.
d) all three conditions from parts a), b), and c) are met!
3. How many ways can we form a 10-digit code using each of the numbers 0, 1, 2, ..., 9 (with no
repetitions) such that 5 is not in the first position or 9 is not in the last position (this also
excludes 5 first and 9 last)?
4. Three of the integers 1, 2, 3, ..., m are arranged, if one quarter of these arrangements contain the
number 5, determine m.
1.6 Counting Tactics III: Dealing with Symmetries
We now are set to tackle a critical element in combinatorial problems, that of repetitious overcounting.
The following example will illustrate the issue.
Example 1.6.1
In how many ways can we arrange the letters which spell BABY?
Solution:
This problem seems simple enough and we may be tempted to simply state that by arranging the
4-letters we will have 4! arrangements, but letʼs observe some of the outcomes generated by this initial
attempt at a solution. To make our outcomes more clear weʼll denote the two ʻBsʼ by B1 and B2:
B1 B2 A Y      B2 B1 A Y
B1 B2 Y A      B2 B1 Y A
B1 A B2 Y      B2 A B1 Y
B1 Y B2 A      B2 Y B1 A
...            ...
Proceeding in this manner we would generate all of the possible outcomes, but notice that our initial
problem does not designate the two ʻBsʼ as B1 and B2; so if we examine the first row of our generated
outcomes with subscripts removed we get,
BBAY BBAY
Clearly, without the subscripts, it is evident that weʼre overcounting. Specifically, weʼre counting each
outcome twice. To compensate for this we can make a simple adjustment and divide by 2 (or more
precisely 2! as this is the number of ways the two ʻBsʼ could be arranged in each unique outcome). Thus
our solution becomes,
$$\frac{4!}{2!} = 12$$
⬛
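The overcounting can be seen directly by listing the labelled arrangements and then erasing the labels. This is a small sketch; the labels "B1" and "B2" mirror the ones used in the solution.

```python
from itertools import permutations

labelled = list(permutations(["B1", "A", "B2", "Y"]))       # 4! orderings with the Bs told apart
unlabelled = {"".join(x[0] for x in p) for p in labelled}   # drop the labels, keep unique words

print(len(labelled), len(unlabelled))
print(sorted(unlabelled))
```

Each unlabelled word arises from exactly two labelled orderings, which is the factor of 2! divided out above.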
We can extend this idea further to where we have multiple repeated elements to work with as shown in
the next example.
Example 1.6.2
Determine the number of arrangements of the letters that spell MISSISSAUGA.
Solution:
Weʼll start by taking an inventory of our letters: 1M, 2Is, 4Ss, 2As, 1U, and 1G.
Thus for every arrangement of the 11 letters weʼll overcount by a factor of 4! for the Ss, and by a factor of 2! for each of the
As and the Is.
Thus weʼll have to divide out to compensate for the repetitious counting giving us,
$$\frac{11!}{\underbrace{2!}_{\text{Iʼs}} \cdot \underbrace{4!}_{\text{Sʼs}} \cdot \underbrace{2!}_{\text{Aʼs}}} = 415\,800$$
⬛
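The inventory-and-divide step can be sketched as a small function. `distinct_arrangements` is a hypothetical helper of ours: it takes the letter inventory with `collections.Counter` and divides n! by the factorial of each repeat count. The brute-force comparison is run on a short word only, since listing 11! orderings would be far too slow.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def distinct_arrangements(word):
    """n! divided by the factorial of each letter's repeat count."""
    counts = Counter(word)
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())

print(Counter("MISSISSAUGA"))                 # the letter inventory
print(distinct_arrangements("MISSISSAUGA"))
print(distinct_arrangements("BABY"), len(set(permutations("BABY"))))
```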
Theorem 1.6.1 (Counting Arrangements with Symmetries)

In general, if we are arranging n-objects with a identical objects of one type, b identical objects of a
second type, c of a third type, and so on... then there will be
$$\frac{n!}{a! \cdot b! \cdot c! \cdots}$$
total arrangements of the objects.
Proof:
Weʼll leave this up to the reader in the Studies portion of the practice. The key will be to generalize the
reasoning from the first example.
This type of overcounting is a kind of symmetry imposed by the repetitiveness of the objects being
identical in nature. We commonly think of symmetry in mathematics in terms of geometric or spatial
relationships. It turns out that our result with symmetry discussed above still applies.
Example 1.6.3 (The Clock Problem)

In how many ways can the numbers 1 - 12 be arranged on a clock-face?
Solution: (Direct Approach)

If we consider the standard arrangement of the numbers as an example, letʼs observe what happens when
we rotate the clock by 30°;
After rotation, we see that the numbers seem to be arranged differently, but if you tilt your head we
effectively see that we have the same arrangement. Thus, any arrangement that we can devise of these
numbers has a rotational symmetry, and so weʼre overcounting again, this time by a factor of 12. To
compensate we again divide by this factor yielding;
$$\frac{12!}{12} = 39\,916\,800$$
arrangements.
⬛
Again, this illustrates that when we consistently overcount by a certain factor then we divide out by this
factor to adjust for the symmetry/repetitious outcomes.
We now move to an alternate way of approaching the Clock Problem which may or may not seem more
intuitive, but is an option nonetheless.
Alternate Solution: (Direct Approach)

The rotational symmetries exist as there is a lack of a reference point to determine a starting point to
arrange from. Thus weʼll impose one!! Weʼll start by fixing the ʻ12ʼ onto the top position of the clock
and arrange the remaining numbers.
Since weʼve eliminated any rotational symmetries we can now arrange the other numbers freely giving us
11! = 39 916 800
arrangements.
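Both approaches can be checked on a smaller "clock". The sketch below, with a hypothetical helper name, lists every ordering of n numbers and rotates each one so that the largest number sits at the top. This mirrors the idea of imposing a reference point, so arrangements that differ only by a rotation collapse to the same tuple.

```python
from itertools import permutations
from math import factorial

def circular_arrangements(n):
    """Count arrangements of n distinct numbers on a circle, treating rotations as the same."""
    seen = set()
    for p in permutations(range(1, n + 1)):
        k = p.index(n)                 # rotate so that the largest number sits at the 'top'
        seen.add(p[k:] + p[:k])
    return len(seen)

print(circular_arrangements(6), factorial(6) // 6, factorial(5))
print(factorial(12) // 12, factorial(11))   # the 12-number clock face
```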
Practice 1.6
Technique
1. In how many ways can the letters that spell KIMMERER be arranged?
2. How many numbers greater than 300 000 are there using only the digits 1, 1, 1, 2, 2, 3?
3. The Toronto Maple Leafs had a final season record of 32 wins, 39 losses, and 11 ties. In how many
ways could they have achieved this record?
4. In how many ways can the letters that spell CANADA be arranged so that the consonants appear
in their original order; for example AAACND is allowed but AAANCD would not be?
5. A charm bracelet is formed by stringing 6 beads of differing colours and then tying a knot in the
string. In how many ways could this bracelet be formed? (Note: There is a “hidden” symmetry
here! )
6. Cameron wishes to visit his best friend Dylan. Dylan lives three blocks North and 5 blocks West
of Cameron. How many routes could Cameron take to arrive at his friend's house provided that
he always travels either North or West?
Studies
7. Five women and five men sit around a circular table. Determine how many ways they can be
seated, relative to each other, such that:
a) no two people of the same gender are seated beside each other.
b) Tappy is not seated beside Aileen nor Kathleen. (NOTE: the gender condition does not apply.)
8. How many even numbers which are less than 3 000 000 can be formed using all of the digits
from the list 1, 2, 2, 3, 5, 5, 6?
9. How many 4-letter arrangements can be formed using the letters that spell PARALLELOGRAM?
10. Prove Theorem 1.6.1.
Repertoire
11. In how many ways can we line up 6 green bottles and 8 brown bottles such that there is exactly
one pair of adjacent green bottles? The other green bottles must be kept separate. (Note: The
challenge for now is to solve this problem only using concepts covered thus far in the text; weʼll
later discover tactics that allow us to simplify this solution.)
1.7 Counting Tactics IV: Combinations
We now are set to unlock a very useful counting technique, that of counting groups where order is not of
importance. Consider the following example as an illustration of this type of situation.
Example 1.7.1
The Woodlands Mathematica club consists of 20 members in total. A group of three of these members
are to be chosen to compete at the University of Waterloo Mathematics Team Competition. In how
many ways could this group be selected?

At first glance we might instinctively think that 20P3 (i.e. simply arranging 3 of the 20 members) would do
the trick here, but letʼs look at some of the outcomes that would be generated with this operation.
Consider the following six groups generated by this procedure;
1. Tommy, Sachin, Priscilla
2. Tommy, Priscilla, Sachin
3. Sachin, Tommy, Priscilla
4. Sachin, Priscilla, Tommy
5. Priscilla, Sachin, Tommy
6. Priscilla, Tommy, Sachin
It is clear that weʼve exposed a symmetry in that we repeatedly form the same group using differently
ordered selections. In essence, we should realize that we are not concerned with how the group is
selected but rather who will comprise the group. Using methods shown in the previous section we
divide our result by the constant factor by which we can establish the same group, namely that there
are 3! ways to select the same group from the set of 20 overall. Thus our result is:
$$\frac{{}_{20}P_{3}}{3!} = 1140$$
possible selections.
⬛
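The "arrange, then divide out the orderings" step can be compared with a direct listing of unordered groups. This is a minimal sketch; `math.comb` and `itertools.combinations` are Python's built-ins for unordered selections, and the variable names are ours.

```python
from itertools import combinations
from math import comb, factorial, perm

ordered = perm(20, 3)                     # 20P3: the same trio appears once per ordering
teams = ordered // factorial(3)           # divide out the 3! orderings of each trio
listed = sum(1 for _ in combinations(range(20), 3))

print(ordered, teams, listed, comb(20, 3))
```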
This situation is common enough to warrant a defined operator, not unlike the permutation operator
introduced earlier, that of a combination defined below.
Definition 1.7.1 (Combinations)
A combination of r objects selected without regard to order from a total of n objects is notated by:
$${}_{n}C_{r} = \frac{{}_{n}P_{r}}{r!}$$
(Read “n choose r”)
Alternatively, weʼll use the alternate notations:
The use of these alternatives is somewhat inconsistent amongst standard textbooks. Itʼs important to
be reminded that these mean the same thing. Weʼll use the first two formats interchangeably
depending on the aesthetics of the presented solution.
Example 1.7.2
An important contract negotiation is to be held concerning worker safety at a gelatin manufacturing
factory. The workers consist of 6 machinists, 5 lab-technicians, and 7 machine operators. The team
assembled must include 6 members overall, with equal representation from each group. In how many
ways could the team be chosen?

In this instance we can break down our actions by selecting a group of two people from each type of
worker. This occurs as follows:
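A minimal sketch of the selection just described, assuming (as the text states) two members are taken from each type of worker and the three selections are made one after another. The variable names are ours.

```python
from math import comb

# Two members from each group of workers.
machinists, technicians, operators = comb(6, 2), comb(5, 2), comb(7, 2)
teams = machinists * technicians * operators   # the three actions multiply
print(machinists, technicians, operators, teams)
```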
Example 1.7.3
In how many ways could a “three-of-a-kind” be formed when dealing five cards from a standard deck?
A “three-of-a-kind” consists of three cards of identical values, while the remaining cards both have
differing values. If we imagine having all 52-cards available to select from we can slow our thoughts
down and think of the process by which we could construct this type of five card hand.
⬛
We can see in the above example that combinations were useful in two instances when counting the
actions within a solution. First, when determining the suit make-up of the three-of-a-kind we did not
care in which order the suits were selected. For example, if we had selected ʻKingsʼ to be the type of
card tripled up, then selecting (King of Diamonds, King of Spades, King of Clubs) would be the same
outcome as selecting (King of Spades, King of Diamonds, King of Clubs), etc. Thus combinations allowed
us to ignore the order of this selection. Second, the values of the two single cards were chosen using
combinations as well, namely 12C2, since we were not concerned with which value was selected first or
second; the combination removed this ordering, allowing us to proceed freely.
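The sequence of actions discussed above can be sketched as a product. The breakdown into four steps is our reading of the discussion (a value for the triple, its three suits, two other values, and a suit for each single card); the variable names are hypothetical.

```python
from math import comb

value_of_triple = 13            # which value is tripled
suits_of_triple = comb(4, 3)    # which three suits, order ignored
other_values = comb(12, 2)      # two different values for the single cards
suits_of_singles = 4 * 4        # a suit for each single card

three_of_a_kind = value_of_triple * suits_of_triple * other_values * suits_of_singles
print(three_of_a_kind, comb(52, 5))   # favourable hands and all five-card hands
```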
Practice 1.7
Technique
1. In how many ways can five cards be selected from a standard 52-card deck?
2. a) A group of 12 students are to be split into groups of three. The groups are to be assigned
different units to run review activities for the course (e.g. Combinatorics, Probability, etc...). In
how many ways can the groups be chosen?
b) In how many ways can the groups be formed if we cannot have Freid and Terry in the same
group?
3. In how many ways can we make a “full-house” from a standard 52-card deck of cards? A
full-house comprises five cards whereby 3 share the same value and the other two cards also
share an identical value that differs from the triple.
4. Twenty people are to travel in a bus from the airport to the hotel at a resort. The bus is
designed for use in a tropical climate; it can carry twelve passengers outside and eight inside. If
four of the passengers refuse to travel outside and five will not travel inside, in how many ways
can the passengers be seated if the arrangements of passengers inside or outside is not
considered except to take into account these wishes?
5. There are 25 people at a gathering. If everyone at the gathering shakes hands with everyone
else, how many hand-shakes occur during the evening? (Assume that no one shakes their own
hand.)
6. a) Show that ${}_{n}C_{0} = {}_{n}C_{n}$. Why does this result make intuitive sense?
b) Show that ${}_{n}C_{1} = {}_{n}C_{n-1}$. Why does this result make intuitive sense?
c) Show that ${}_{n}C_{r} = {}_{n}C_{n-r}$. Explain in a non-algebraic way why this must be true.
Studies
7. Consider the following ʻAlternate Solutionʼ to Example 1.7.3;
In how many ways can we form a “3-of-a-kind” with a standard 52-card deck?
Explain why this argument is false.
8. a) In how many ways can we divide up a class of 18 students into 6 equally sized groups? (Note:
There is a hidden symmetry in this problem.)
b) In how many ways can this be done if, once again, Freid and Terry are not to be together?
9. Suppose we have a collection of balls labelled 1, 2, 3, .... , m; for some integer m. If one quarter
of all 3 ball groups contain ball #5, determine the value of m. (NOTE: the result may lead to an
invalid argument; avoid peeking at B.O.B. before determining the answer.)
10. Given a collection of 2n objects, n of which are identical and n of which are all distinct, how
many different subcollections of n objects are possible?
1.8 Counting Tactics V: Counting Subsets
A particularly fascinating problem in combinatorics concerns the number of possible subgroupings (or
subsets) of a collection of n objects (or elements) overall. Letʼs examine this with a more contextual
example.
Example 1.8.1
Ms. AB is having a group of 7 students try out for positions on our local school newspaper. She has
enough positions to allow her the ability to select any number of these 7 students to work with her. If
none of the students prove acceptable she may also choose none of them. In how many ways can Ms.
AB select this group?
Solution: (Cases Approach)

The size of the group selected is unspecified, so weʼll break this problem up based on the size of the
group selected.
(No one selected)        7C0 = 1
(1 Person Selected)      7C1 = 7
(2 People Selected)      7C2 = 21
(3 People Selected)      7C3 = 35
(4 People Selected)      7C4 = 35
(5 People Selected)      7C5 = 21
(6 People Selected)      7C6 = 7
(7 People Selected)      7C7 = 1
TOTAL = 128
Thus, there are 128 possible groupings that could be selected.
⬛
Interestingly, the result of 128 happens to be 2⁷. This result is not a coincidence and we exhibit an
alternate argument to highlight why this is so.
We examine Ms. ABʼs choices progressing student by student in this group of 7. Each student may be
selected or not, thus for each student she has two possible outcomes (in or not in the group) leaving us
with the result 2 × 2 × 2 × 2 × 2 × 2 × 2 = 2⁷ = 128.
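The cases approach and the student-by-student argument can be compared in a few lines. This is a small sketch with our own variable names; it also lists the groups directly as a third check.

```python
from itertools import combinations
from math import comb

students = range(7)
by_size = [comb(7, k) for k in range(8)]     # the cases approach, one case per group size
listed = sum(1 for k in range(8) for _ in combinations(students, k))

print(by_size, sum(by_size), listed, 2 ** 7)
```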
The above example alludes to a more generalized property which is stated below.
Theorem 1.8.1 (Subsets)

Given 𝑛 ∈ ℕ, there are 2ⁿ total subsets of a set which consists of n elements.
Proof:
Given in the practice.
⬛
Now letʼs add a wrinkle to the idea of counting subgroupings, whereby some of the objects that can be
chosen are identical.
Example 1.8.2 (Coins in the Pocket)

Xiaoya has 3 pennies, 1 nickel, 2 quarters, 1 Loonie, and 2 Toonies in her pocket. Miko asks her for some
change so that he can buy some lunch. Xiaoya decides to randomly grab some of the coins from her
pocket to lend to Miko. Assuming that sheʼs nice enough to grab at least one coin, how many different
sums of money can she possibly grab out of her pocket?
Solution: (Indirect)
Like the problems earlier, the size of the group is unspecified. Also, since we are only concerned with
the sum of the money selected, a selection of any coin of a particular denomination is indistinguishable
from selecting another one. For example, if Xiaoya only selects one coin overall and it happens to be a
penny (to Mikoʼs dismay), it would not matter which of the three pennies was selected as they would all
only add one cent to the total. We can, however, employ a similar tactic to that of our alternate solution
shown earlier. Namely, weʼll consider our options per coin denomination as shown;
TOTAL   (3 + 1) × (1 + 1) × (2 + 1) × (1 + 1) × (2 + 1)    144
BAD     no coins are selected                               1
GOOD    (TOTAL) − (BAD)                                     143
Thus, there are 143 possible sums of money that Xiaoya can share with Miko.
⬛
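Since the question asks for different sums, it is worth confirming that no two selections produce the same amount. The sketch below lists every way of taking 0 up to the available number of each denomination and collects the totals in a set. The coin values in cents (penny 1, nickel 5, quarter 25, Loonie 100, Toonie 200) are the usual Canadian values and are the only assumption here.

```python
from itertools import product

# (value in cents, how many are in the pocket)
coins = [(1, 3), (5, 1), (25, 2), (100, 1), (200, 2)]

choices = product(*(range(count + 1) for _, count in coins))   # 0..count of each denomination
sums = {sum(k * value for k, (value, _) in zip(pick, coins)) for pick in choices}

print(len(sums))          # includes the empty grab worth 0 cents
print(len(sums - {0}))    # at least one coin is grabbed
```

Using a set means that two selections with the same total would be counted once, so the count only matches the product of options when every total is distinct.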
This reveals yet another result that allows us to account for the number of subgroups that can be
formed when repetitious/identical elements are present.
Theorem 1.8.2 (Counting Subsets containing Identical Elements)

The number of subgroups that can be formed from a total collection of objects which include; a of one
type of object, b of another type, c of yet another type, and so on… is given by
(𝑎 + 1) × (𝑏 + 1) × (𝑐 + 1) ×...
Proof:
Once again, the proof of this result is left to you, but will largely follow the reasoning of the total cases
from the example above.
⬛
Practice 1.8
Technique
1. A jelly-bean jar contains 4 red jelly beans, 3 green, 7 yellow, and 5 white. If you grab some
jellybeans from the jar, how many different colour combinations can be selected?
2. Determine all divisors of the number 540. (Hint: Write 540 as a product of prime factors.)
3. Five different signal flags are available to fly on a shipʼs flagpole. How many different signals
can be sent using at least two of the flags assuming that the order in which the flags appear
does not matter?
Studies
4. Determine the number of divisors of the integer 1050 that are:
a) even
b) divisible by 5
c) neither prime nor the number 1
5. Miko, still in need of even more money for lunch, asks Gordon for some of his money. Gordon
has 3 nickels, 2 quarters, 3 loonies, and 1 toonie in his pocket. How many different sums of
money can Gordon give to Miko?
6. a) Prove Theorem 1.8.1 (Counting Subsets)

b) Prove Theorem 1.8.2 (Counting Subsets containing Identical Elements)
Repertoire
7. You may have noticed that we carefully selected the amount of each type of coin in the “coins in
the pocket problem” so that the sum of each denomination could not add up to a larger
denomination. In the second study problem, some of the sums could add up to higher
denominations. Generalize the “coins in the pocket” problem for any amount of coin
combinations possible and describe an algorithm or closed form solution (equation) which
solves the problem. Furthermore, how can we generalize this context beyond that of just sums
of money?
1.9 Counting Tactics VI: Partitioning
Our next counting method builds upon the techniques weʼve already encountered, and
once again exposes an interesting viewpoint with which to tackle
arrangements with identical objects. Consider the following example;
Example 1.9.1
Ten identical red balls are to be placed into three distinct boxes (labelled A, B, and C). In how many ways
could the balls be distributed such that there is at least one ball in each box?
Before we delve into a solution, letʼs grasp some of the difficulties with this seemingly simple problem.
Intuitively, it would seem that we must satisfy the condition of a ball being placed into each box then
count freely with the remaining 7 balls having three options each (either to be placed in box A, B, or C).
This would lead us to a grievous overcounting of the possible outcomes. To see this, we must recognize
what the outcomes from our counting process must result in. As the balls are identical we are only
concerned with the number of balls in each of the boxes as opposed to which balls go into which box.
Here are some of the possible outcomes;
Box A Box B Box C
1 1 8
4 2 4
5 2 3
... ... ...
Clearly, we could approach this problem with “brute force and ignorance” and systematically label each
individual outcome. One could even envision writing a computer program to easily count through all of
the cases here as well, but this would not be in the spirit of this subject. Instead, weʼll employ an
approach of partitioning the groupings of balls to be placed into each box.

Employing our way of viewing the counting process we can think of the ten balls as lying in a row and
we simply must break off some of them (or partition them) to be placed into ʻbox Aʼ, then partition the
remaining to be placed into ʻbox Bʼ, while the remaining balls will be placed into ʻbox Cʼ. Weʼll illustrate
this with a diagram, highlighting the cases exhibited above;
Observing the situation in this manner it then becomes apparent that the number of ways available to
distribute the balls into the boxes would be equivalent to the number of ways we can place the two
partitions. The spaces we may choose from are depicted in the diagram below.
Thus, there are 9C2 = 36 ways to distribute the balls into the boxes.
⬛
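As a check on the partition count (not a replacement for the argument), the outcomes can be listed by brute force. `distributions` is a hypothetical helper that lists every triple (A, B, C) of positive counts adding to the number of balls.

```python
from math import comb

def distributions(balls):
    """List (a, b, c) with a + b + c = balls and at least one ball in every box."""
    return [(a, b, balls - a - b)
            for a in range(1, balls)
            for b in range(1, balls - a)]

outcomes = distributions(10)
print(len(outcomes), comb(9, 2))
print(outcomes[:3])
```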
As a second example of the power of the partitions approach, letʼs revisit the repertoire problem from
section 1.6;
Example 1.9.2
In how many ways can we line up 6 green bottles and 8 brown bottles such that there is exactly one pair
of adjacent green bottles? All other green bottles must be kept separate.

At the time, this would have been a very tricky problem to work through. Viewing matters using the
partitioning method, however, greatly simplifies the matter. As the brown bottles have no restrictions
set upon them, we can line them up in any manner whatsoever. We can now break the problem down as
a series of two actions, namely, place 5 green bottles amongst the brown (each in a separate gap,
including the two ends), then select one of these bottles to be paired up.
Thus, the number of arrangements turns out to be;
⬛
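As a sketch of how the two actions combine, assume that bottles of the same colour are indistinguishable. The 8 brown bottles create 9 gaps, so the 5 green positions can be chosen in 9C5 ways, and the bottle to be doubled in 5 ways, suggesting 9C5 × 5 = 630. The check below counts the line-ups directly; the helper `green_runs` is our own name.

```python
from itertools import combinations
from math import comb


def green_runs(positions):
    """Return the lengths of the runs of consecutive green positions."""
    runs, length = [], 1
    for prev, cur in zip(positions, positions[1:]):
        if cur == prev + 1:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return runs


# Choose which 6 of the 14 places hold green bottles (assumption: bottles of
# the same colour are identical, so only the positions matter).
valid = sum(
    1
    for greens in combinations(range(14), 6)
    if sorted(green_runs(greens)) == [1, 1, 1, 1, 2]
)
by_partitions = comb(9, 5) * 5  # 5 of the 9 gaps, then pick the bottle to double
print(valid, by_partitions)
```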
Practice 1.9
Technique
1. Determine the number of ways we can distribute 10 identical red balls into 3 distinct boxes,
where empty boxes are allowed.
2. a) Determine the number of solutions to the equation 𝑥 + 𝑦 + 𝑧 = 35; 𝑥, 𝑦, 𝑧 ∈ ℕ.

b) Determine the number of solutions to the above equation if 𝑧 must be even.
3. How many ways are there to distribute four identical oranges and six distinct apples (each a
different variety) into five distinct boxes?
4. How many arrangements of the letters a, e, i, o, u, x, x, x, x, x, x, x, x are there if no two vowels can
be consecutive?
5. How many ways are there to arrange the letters in VISITING with no pair of consecutive Is?
Studies
6. Areya, Ben, Calvin, and James are having a barbecue. They have cooked 12 cobs of corn, 14
hot dogs, and 10 burgers. Each of them has at least two of each type of food, and by the end
of the night they have consumed all of the food. In how many ways could this have occurred?
7. The 52 cards of a standard deck are laid out in a sequence. In how many ways can this be done if
there are exactly k runs of hearts? (A run of hearts is an occurrence of one or more hearts
in a row; for example, HHCDSCCDDHSSCDHHH... would comprise three runs of hearts thus
far.)
1.10 Counting Tactics VII: Pascal’s Method
One of the more interesting patterns developed in mathematics is that of the array of numbers which
forms Pascalʼs Triangle. The first seven rows of the triangle are displayed below:
The array is simple enough to construct. We first begin with a one at the apex of the triangle, then
progress diagonally down, both left and right, by adding the entries which lie above (see the purple
circle in the diagram above to see how this works).
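The construction rule can be sketched in a few lines of code. This is only an illustration: the function name `pascal_rows` is our own, and padding each row with zeros is one convenient way to give the edge entries two "parents".

```python
def pascal_rows(count):
    """Build the first `count` rows of Pascal's Triangle."""
    rows = [[1]]  # the apex
    for _ in range(count - 1):
        prev = rows[-1]
        # Pad with zeros so the entries on the edges also have two entries above.
        padded = [0] + prev + [0]
        rows.append([padded[i] + padded[i + 1] for i in range(len(prev) + 1)])
    return rows


for row in pascal_rows(7):
    print(row)
```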
Oddly enough, the first instance of this array can be dated as far back as the 10th century in India;
however, it was Pascal who first thoroughly investigated many of the underlying patterns and applications
resulting from the triangle. In this course we will focus on the combinatorial significance of the triangle.
Properties of Pascalʼs Triangle
Weʼll start with some conventions of naming when it comes to the Pascalʼs Triangle which will be
referred to as we move forward.
Given this convention, we can examine some properties of the triangle that weʼll prove using
combinatorial reasoning in the practice!
Property 1: (Relationship to Combinations)

Each term of the Pascalʼs Triangle is equivalent to the corresponding combination in the following
manner;
t(n, r) = nCr
Property 2: (Rows of the Pascalʼs Triangle)

The sum of the terms in the nth row of Pascalʼs Triangle is equal to 2^n.
Property 3: (Symmetry of the Pascalʼs Triangle)

For each row of the Pascalʼs Triangle we have that;
t(n, r) = t(n, n − r)
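Before proving Properties 1 to 3, it can be comforting to check them numerically. The sketch below assumes that rows and positions are both counted from 0, so that t(n, r) is entry r of row n; the helper `pascal_row` is our own name. A check on ten rows is evidence, not a proof.

```python
from math import comb


def pascal_row(n):
    """Row n of Pascal's Triangle (the apex is row 0), built by repeated addition."""
    row = [1]
    for _ in range(n):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row


checks = []
for n in range(10):
    t = pascal_row(n)
    checks.append((
        t == [comb(n, r) for r in range(n + 1)],  # Property 1: t(n, r) = nCr
        sum(t) == 2 ** n,                         # Property 2: row sum is 2^n
        t == t[::-1],                             # Property 3: symmetry
    ))

all_hold = all(all(c) for c in checks)
print(all_hold)
```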
Property 4: (Alternating Sums and Differences of Pascalʼs Triangle)
For every row n ≥ 1 of Pascalʼs Triangle, the alternating sum of the terms is zero:
t(n, 0) − t(n, 1) + t(n, 2) − ... + (−1)^n t(n, n) = 0
(Row 0 consists of the single term t(0, 0) = 1, so its alternating sum is 1.)
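An alternating sum of this kind is easy to tabulate. The sketch below uses the sign pattern +, −, +, −, ... starting with t(n, 0), which is an assumption on our part about the intended pattern, and takes t(n, r) = nCr as in Property 1.

```python
from math import comb


def alternating_row_sum(n):
    """t(n, 0) - t(n, 1) + t(n, 2) - ... for row n, with t(n, r) = nCr."""
    return sum((-1) ** r * comb(n, r) for r in range(n + 1))


sums = [alternating_row_sum(n) for n in range(7)]
print(sums)
```

With this sign pattern, every row after row 0 balances out to zero.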
Property 5: (Diagonals of the Pascalʼs Triangle)

The nth diagonal of Pascalʼs Triangle corresponds to the n-dimensional stacking sequence, as shown
below:
NOTE: The first diagonal would be labelled as the 0th diagonal.
For example, the stacking sequence of 2-dimensional triangular numbers is shown below:
while, the stacking sequence of 3-dimensional tetrahedral numbers can be visualized as:
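The two stacking sequences can also be compared with the diagonals numerically. In this sketch the function `diagonal` is our own, diagonals are counted from 0 as in the NOTE above, and the closed forms k(k + 1)/2 and k(k + 1)(k + 2)/6 are the usual formulas for triangular and tetrahedral numbers.

```python
from math import comb


def diagonal(d, length):
    """First `length` entries of diagonal d: t(d, d), t(d + 1, d), t(d + 2, d), ..."""
    return [comb(d + k, d) for k in range(length)]


triangular = [k * (k + 1) // 2 for k in range(1, 7)]
tetrahedral = [k * (k + 1) * (k + 2) // 6 for k in range(1, 7)]

print(diagonal(2, 6), triangular)
print(diagonal(3, 6), tetrahedral)
```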
Pascalʼs Method
Returning to combinatorial problems, letʼs use the ideas that generate the array of numbers in the
Pascalʼs Triangle to solve a certain class of situations that frequently arise.
Example 1.10.1
How many paths can be traced out, moving diagonally downward starting from the top row and ending
at the bottom, that spell out the name JoMak in the array shown below?

The cases here are found by simply counting all of the possible paths that lead to each “k” at the
bottom of the array. In order to count these paths, we work downward from the top row of Jʼs. Arriving
at each letter we employ “Pascalʼs Method” of summing the path counts diagonally above the particular letter
in question. The result is exhibited in the diagram below:
Thus, counting the number of paths leading to each ʻkʼ we get;
2 + 8 + 12 + 8 + 2 = 32
ways to trace out JoMak.

⬛
Example 1.10.2
Cameron wishes to visit his best friend Dylan. Dylan lives 3 blocks North and 5 blocks East of
Cameron. How many routes could Cameron take to arrive at his friendʼs house provided that he always
travels either North or East?

Since Cameron must always travel 3 blocks North and 5 blocks East, we can think of his chosen path as
a series of 8 moves that must include an arrangement of 3 Ns (moves North) and 5 Es (moves East),
yielding;
8! / (3! 5!) = 56
ways of arriving at Dylanʼs place.

Using similar reasoning, with Cameron making 8 moves of which 3 must be North (or 5 must be East), we
must choose which three of the eight moves (i.e. first, second, etc.) will be made to the
North (or which five will be made to the East), giving us;
8C3 = 56 or 8C5 = 56
⬛
Alternate Solution 2: (Direct Approach; using Pascalʼs Method)
Drawing out the possible pathways on a 3x5 grid; we can use the Pascalʼs Method of counting pathways
to determine the number of total paths as shown below;
Thus, there are a total of 56 paths that Cameron could take to reach Dylanʼs house.
⬛
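Pascalʼs Method on a grid can be sketched as follows: every corner on the south or west edge can be reached in exactly one way, and every other corner collects the counts of the corner below it and the corner to its left. The function name `count_routes` is our own.

```python
from math import comb, factorial


def count_routes(north, east):
    """Count North/East routes across a grid using Pascal's Method."""
    # paths[i][j] = number of routes to the corner i blocks North, j blocks East.
    paths = [[1] * (east + 1) for _ in range(north + 1)]
    for i in range(1, north + 1):
        for j in range(1, east + 1):
            paths[i][j] = paths[i - 1][j] + paths[i][j - 1]
    return paths[north][east]


routes = count_routes(3, 5)
print(routes, comb(8, 3), factorial(8) // (factorial(3) * factorial(5)))
```

All three approaches from the example should print the same number.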
Practice 1.10
Technique
1. Determine the number of pathways that spell TIFFANY by tracing along diagonally adjacent
letters in the array below:
T T T T T
I I I I
F F F F F
F F F F
A A A
N N N N
Y Y Y Y Y
2. Determine the number of pathways that spell RACECAR by tracing along diagonally adjacent
letters in the array below (be mindful of possible directions):
R R R R
A A A
C C
C C
A A A
R R R R
3. Cora wishes to visit her friend Selina. If Cora only travels on defined paths as shown in the
diagram below, determine the number of routes she could take to visit Selina assuming that she
avoids any path that would be considered as backtracking.
4. A spider and a fly are situated at opposite corners of a 6x6 grid. They head toward their
respective opposite corners at the exact same rate. In how many ways could they meet for
“lunch”?
Studies
5. Devise combinatorial arguments (i.e. non-algebraic arguments) to prove:
Property 1 (Relationship to Combinations); t(n, r) = nCr.
Property 2 (Rows of the Pascalʼs Triangle); ∑ (r = 0 to n) t(n, r) = 2^n.
Property 3 (Symmetry of Pascalʼs Triangle); t(n, r) = t(n, n − r).
6. Suppose we have 5 sets of brackets (i.e. 5 left brackets and 5 right brackets). How many
arrangements of the brackets make “logical” sense? For example, ʻ(())()(())ʼ would make logical
sense while ʻ)((())(())ʼ would not.
7. The fountain below consists of an array of containers each with a capacity of 2 litres. Water is
poured into the top container. Once this container overflows, the water will start pouring onto
the next level and once the containers at this level are filled they start pouring onto the third
level and so on... How much water must be poured in total to fill up one of the containers in the
bottom row? (Weʼll ignore the lack of a third dimension here to keep things a little simpler.)
Repertoire
8. Recall the Binomial Theorem, which states: for constants 𝑎, 𝑏 ∈ ℝ and 𝑛 ∈ ℕ we have that;
(a + b)^n = nC0 a^n + nC1 a^(n−1) b + nC2 a^(n−2) b^2 + ... + nCn b^n
Prove the binomial theorem using a combinatorial argument.
9. Prove the remaining properties of the Pascalʼs Triangle discussed in this section, namely:
Property 4 (Alternating Sums and Differences of Pascalʼs Triangle), and
Property 5 (Diagonals of Pascalʼs Triangle)
10. Investigate, conjecture and prove another property of Pascalʼs Triangle. Try to avoid “looking
up” other properties. There are numerous opportunities to seek out patterns in this array;
consider this as practice for discovering and proving new results!
1.11 Challenge Problems
The following set of problems can be solved using combinatorial reasoning. Enjoy the challenge!
1. How many arrangements of REVISITED are there with vowels not in increasing order,
no consecutive Es, and no consecutive Is?
2. n points are distributed around the circumference of a circle such that if all points are joined by
chords, no three chords will intersect at a single point. Determine the number of intersections
within the circumference.
3. How many subsets of six integers chosen (without repetition) from 1, 2, …, 20 are there with no
consecutive integers (e.g. if 5 is in the subset then 4 and 6 cannot be in it)?
4. How many subsets of three different integers between 1 and 90 inclusive are there whose sum is
divisible by 3?
5. How many numbers less than 10000 are divisible (divide evenly) by either 2, 3, or 7? Use a
combinatorial argument (i.e. brute force and ignorance, BFI, wonʼt be given full credit as a
solution).
6. Five people are sitting around a round table. Let w represent the number of people sitting next
to at least one woman and m represent the number of people sitting next to at least one man.
How many possible values of the ordered pair (w, m) are there?
7. Mack the Millipede starts at (0, 0, 0) at noon and each minute moves one unit in either the
positive x-direction, the positive y-direction, or the positive z-direction. Thus, after 1 minute he
could be at (1, 0, 0), (0, 1, 0), or (0, 0, 1); after two minutes, he could be at (2, 0, 0), (0, 2, 0),
(0, 0, 2), (1, 1, 0), (1, 0, 1), or (0, 1, 1). How many different paths could he take to (3, 5, 2) which
donʼt pass through (1, 3, 1)?
8. An 8x8 chessboard has alternating black and white squares, as shown to
the right. How many distinct rectangles, with sides on the grid lines of
the chessboard and containing at least 4 black squares, can be drawn on
the chessboard?
9. Consider a generalization of the “brackets” problem from Section 1.10, namely, given n pairs of
brackets, determine how many orderings of the n open brackets and n closed brackets are in a
logical order. Justify why (2n)C(n − 1) / n solves this problem.
Unit 2: Probabilistic Reasoning
I will never believe that God plays dice with the universe.
Albert Einstein
2.1 The Probability of a Random Event
The Ethos of Probability Theory
In the world of mathematics, probability theory is relatively young. Many of us think of likelihoods in a
very intuitive way, for example in many cultural traditions the family history of a prospective suitor was
observed for the likelihood of desirable physical, mental, and personality traits. Modern genetics has a
significant grounding in probabilistic reasoning as mass amounts of research into the nature of how
genes and traits are passed on have dominated the medical field for the past century. The study of the
quantum nature of matter is largely based on probabilistic results, causing the great scientist Albert
Einstein to make his quip found at the onset of this chapter as he felt that the dynamics of a universe
created by a perfect being could never be governed by chance events.
At the heart of probability theory is the concept of the randomness of an event. It is highly debatable
whether true randomness exists in our universe. Let us consider the example of a ʻcoin flipʼ.
We are comfortable with the idea that the probability of tossing “heads” with a fair coin is 1/2, or 50%. This
clearly makes intuitive sense as there are two possible results and only one of them is construed to be
“heads” as it were. If however, we were privy to knowing the exact force exerted to the coin, the mass
and rotational forces coupled with gravitational forces, friction of the air, etc... it would seem that we
could, in fact, calculate which way the coin would fall. So, in a sense, the event of this coin toss is not
truly random, but rather when considering all of the subtle dynamics involved with this action we
simply smooth out and balance all of these possibilities under the guise of “randomness”. Thus,
probability theory is a foray into a very convenient and effective method of approximation for
determining the likelihood of a certain event.
The Probability of a Random Event
Before formally defining anything, letʼs look at a fairly trivial example to motivate our understanding.
Example 2.1.1
The names of five students are placed into a bag; Aakash, Andrew, Bogdan, Cherry, and Dejan. One
name is drawn from the bag. What is the probability that the name selected begins with the letter ʻAʼ ?
Solution:
As there are two names of the five that begin with the letter ʻAʼ, the probability of
selecting such a name is 2/5.
⬛
Well that seemed a bit of a waste!! But letʼs break down this simplistic problem to highlight the
vocabulary that weʼll be using throughout this unit and beyond:
Definition 2.1.1 (Outcome Space)
The set of outcomes that can possibly occur is referred to as the outcome space. Thinking of the
outcome space as a set, we usually denote it with the letter S.
E.g. The five names in the previous example S = {Aakash, Andrew, Bogdan, Cherry, Dejan} constituted
the outcome space of the event.
Definition 2.1.2 (Event Spaces and Random Variables)

The set of outcomes that we wish to obtain is referred to as the event space. This set is often denoted
by a letter which is representative of what is being described, for example, ʻAʼ would be appropriate in
the above example with A = {Aakash, Andrew}.
The set of all outcomes that comprise an event space is called a random variable. It is key to
understand that random variables are sets! In our example, the set A can be considered a random
variable.
Definition 2.1.3 (Probability)

The probability of an outcome in event space A, is defined to be the proportion of the cardinality of the
event A as compared to the cardinality of the outcome space S. Mathematically this proportion is
denoted by P(A) and is given by;
P(A) = n(A) / n(S)
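To make the definition concrete, here is a small sketch that treats S and A as sets and computes the proportion directly for the names in Example 2.1.1. The function name `probability` is our own, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def probability(event, outcome_space):
    """P(A) = n(A) / n(S), assuming every outcome in S is equally likely."""
    event = set(event) & set(outcome_space)  # ignore anything outside S
    return Fraction(len(event), len(outcome_space))


S = {"Aakash", "Andrew", "Bogdan", "Cherry", "Dejan"}
A = {name for name in S if name.startswith("A")}

p_A = probability(A, S)
print(p_A)
```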
So at this point you may be cringing, having thought the challenges of combinatorics were left behind,
only to realize that the probability of an event can be simply reduced to two counting problems! But
letʼs not think this way, instead weʼll relish the future challenges that such problems can bring and rely
on our freshly developed combinatorial intuition to take charge and guide us through. In fact, many of
the counting methods introduced in the previous chapter will carry through in a natural way when
asked to determine the probability of a certain event, and thus life should not be so torturous after all.
To start off, letʼs observe some consequences of our definition:
Property 1: (Range of Probabilities)

Given an outcome space S, the probability of the event S, i.e. the likelihood of any outcome occurring, is
given by
P(S) = n(S) / n(S) = 1
and the probability of no outcome occurring, or set theoretically the chance of ∅ (the empty set)
occurring, is
P(∅) = n(∅) / n(S) = 0 / n(S) = 0
Thus, the likelihood of any event, A, is bounded by;
0 ≤ 𝑃(𝐴) ≤ 1
Property 2: (Probability of Complements)
Given an event A and its complement A^c, every outcome in S lies in exactly one of the two sets, so
n(A) + n(A^c) = n(S). Dividing both sides by n(S) gives:
n(A) / n(S) + n(A^c) / n(S) = P(A) + P(A^c) = 1
Thus,
P(A) + P(A^c) = 1 or P(A^c) = 1 − P(A)
Thus, if itʼs easier to determine the probability of an undesirable event we simply have to subtract this
result from 1 to obtain the chances of the desired event; this will mark the equivalent to the Indirect
Approach to counting problems.
Weʼll see later on that the direct and cases approaches translate seamlessly as well. Prove them for
yourself! But we wouldnʼt want to waste all of the excitement in this first section alone.
Example 2.1.2
A fair coin is tossed three times. What is the probability that:
a) two heads are tossed?
b) at least one head is tossed?
Solution:
a) Let H represent the number of heads tossed. Thus, 𝐻 = {0, 1, 2, 3} since we can toss anywhere
from zero to three heads during this process.
P(H = 2) = n(H = 2) / n(S) = 3/8
Side work:
It is often helpful to carry out the combinatorial calculations off to the side so that we remain
focused on the desired probability calculation.
n(S): (Direct Approach) 2 × 2 × 2 = 8
n(H = 2): (Direct Approach) 3C2 = 3
It is instructive at this point to note that we do not always have to use random variables
for the event spaces when expressing probabilities. In the solution above, we could simply
describe the event in question as “P(2 heads)” to help make the solution clear.
b) (Indirect Approach)
Since we wish to observe the likelihood of at least one head being tossed, we can instead seek to
calculate the probability that no heads were tossed (complementary set). As there is only one way to
have no heads (i.e. only tails were tossed) we have,
P(no heads) = P(H = 0) = 1/8
Thus,
P(H > 0) = 1 − P(H = 0) = 1 − 1/8 = 7/8
⬛
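With only eight outcomes, the whole outcome space of three tosses can be listed and both answers checked directly. This is a sketch using our own variable names; each outcome is a triple such as ('H', 'T', 'H').

```python
from fractions import Fraction
from itertools import product

# Outcome space: every sequence of three tosses, all equally likely.
S = list(product("HT", repeat=3))

two_heads = [outcome for outcome in S if outcome.count("H") == 2]
no_heads = [outcome for outcome in S if outcome.count("H") == 0]

p_two_heads = Fraction(len(two_heads), len(S))
p_at_least_one = 1 - Fraction(len(no_heads), len(S))  # Indirect Approach

print(p_two_heads, p_at_least_one)
```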
Practice 2.1
Technique
1. Two fair six-sided dice are rolled. What is the probability that
a) their sum is even?
b) doubles are rolled?
c) a sum of four or less is rolled?
2. An integer from 1 to 50 inclusive is chosen at random. What is the probability that the integer
a) is divisible by 11?
b) is not a multiple of 5?
3. An infant typed three strokes on a keyboard. If all of the characters that were struck were letters
of the alphabet, what is the probability that the characters were consecutive and in alphabetical
order? (Assume that Z, A, B would not count as consecutive… i.e. no “wrap-arounds”.)
4. A group of 12 people are going out to a concert on Saturday night. The group will take three
cars with four people in each car. If they distribute themselves at random, what is the
probability that Elim and Vimalan will be in the same car (NOTE: we wonʼt consider the cars as
“distinctive”, just concern yourself with who are sitting together.)?
5. What is the probability of being dealt a “full house” from a deal of five cards? (Note: A “full
house” constitutes three cards of one type of value and two cards that share a different value
from the triple; e.g. KHKDKS2C2S.)
6. In the 6/49 lottery six different numbers must be selected between 1 through 49 inclusive. To
win the jackpot your six numbers must match the six numbers that are drawn from the lottery
drum. What is the probability of winning the jackpot?
Studies
7. In a class of 20 students what is the probability that at least two of them share the same
birthday?
8. A bag is filled with a certain number of balls of varying colours. The probability that five balls,
chosen at random, are all green is 1/2. Determine the minimum number of balls that the bag can
contain.
9. Explain why the order of the outcome of two dice must be considered when determining
likelihoods of outcomes even though the dice may appear physically identical.
2.2 The Cases Approach for Probability
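Recall that for two sets A and B, the principle of inclusion and exclusion tells us that;
n(A ∪ B) = n(A) + n(B) − n(A ∩ B)
dividing both sides of this equation by the cardinality of the outcome space S we get,
n(A ∪ B) / n(S) = n(A) / n(S) + n(B) / n(S) − n(A ∩ B) / n(S)
giving us;
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)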
This means that we can utilize the tactic of breaking a problem up into Cases just as we did with
combinatorial problems!
Mutually Exclusive Events
Definition 2.2.1
Two events A and B are called mutually exclusive if they share no elements in common, meaning that
𝐴 ∩ 𝐵 = ∅ and so 𝑛(𝐴 ∩ 𝐵) = 0. In this case we have that;
P(A ∪ B) = P(A) + P(B)
Example 2.2.1
What is the probability of rolling either a 3 or 5 with a single die?

As the events of rolling a three and rolling a five have no overlapping outcomes, we can consider the
two events separately, giving us;
Case            Probability
P(roll 3)       1/6
P(roll 5)       1/6
P(roll 3 or 5)  1/6 + 1/6 = 1/3
Example 2.2.2
There are 10 red balls and 15 black balls in a box. If seven balls are selected at random from the box
what is the probability that at least 6 will be black?

Let B represent the number of black balls selected; thus B = {0, 1, …, 6, 7}.
We wish to determine 𝑃(𝐵 ≥ 6) and since the outcomes are mutually exclusive we can determine the
probability of each case separately;
Case 1: P(B = 6)
If we have six black balls then in turn we must have also selected one red ball, thus our event space is;
n(B = 6) = 15C6 × 10C1 = 50 050
while the outcome space is;
n(S) = 25C7 = 480 700
Therefore we get;
P(B = 6) = 50 050 / 480 700 ≈ 0.1041
Case 2: P(B = 7)
Using similar reasoning we get;
P(B = 7) = 15C7 / 25C7 = 6 435 / 480 700 ≈ 0.0134
Hence, we have overall;
P(B ≥ 6) = P(B = 6) + P(B = 7) ≈ 0.1041 + 0.0134 = 0.1175
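The two cases can be evaluated with a short sketch. It assumes that every set of seven balls is equally likely to be drawn, and the variable names are our own.

```python
from fractions import Fraction
from math import comb

RED, BLACK, DRAWN = 10, 15, 7
n_S = comb(RED + BLACK, DRAWN)  # all ways to select 7 of the 25 balls

# Case 1: six black and one red. Case 2: seven black and no red.
n_six_black = comb(BLACK, 6) * comb(RED, 1)
n_seven_black = comb(BLACK, 7) * comb(RED, 0)

# The cases are mutually exclusive, so their probabilities simply add.
p_at_least_six = Fraction(n_six_black + n_seven_black, n_S)
print(p_at_least_six, round(float(p_at_least_six), 4))
```

So there is a little under a 12% chance of selecting at least six black balls.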
Overlapping Cases: Non-Mutually Exclusive Events
Continuing on with the theme of parallel reasoning between combinatorial based problems and
probabilistic ones, weʼll observe an example which highlights overlapping cases.
Example 2.2.3
100 Woodlanders were interviewed concerning their content creation on online media. The students
were asked if they had posted content on either Instagram (I), TikTok (T), or YouTube (YT) in the past
year. The responses were as follows:
74 posted on Instagram,
38 on TikTok,
32 on YouTube,
25 on both Instagram and TikTok,
18 on both TikTok and YouTube,
22 on Instagram and YouTube, and
7 on all three.
Determine the number of students who:
a) posted on at least one of these sites.
b) did not post on any of these sites.
c) posted only on YouTube.
Solution:
Just as with overlapping cases before, we can approach such problems either algebraically by using the
Principle of Inclusion and Exclusion, or graphically using Venn diagrams. Weʼll select the most
appropriate to each situation.
a) (Cases Approach; Inclusion/Exclusion)

P(I ∪ T ∪ Y) = P(I) + P(T) + P(Y) − P(I ∩ T) − P(T ∩ Y) − P(I ∩ Y) + P(I ∩ T ∩ Y)
= 74/100 + 38/100 + 32/100 − 25/100 − 18/100 − 22/100 + 7/100
= 86/100
= 0.86
Thus, 86% of the respondents (86 students) have posted on at least one of these sites in the past year.
b) (Indirect Approach)
To determine the number of students who did not post, we recognize that this is simply the
complement of the previous set giving us;
𝑃(𝑛𝑜𝑛𝑒) = 1 − 𝑃(𝐼 ∪ 𝑇 ∪ 𝑌)
= 1 − 0. 86
= 0. 14
Thus, about 14% did not post on any type of online media in the past year.
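The inclusion and exclusion arithmetic for parts a) and b) can be sketched with the survey counts as given. The variable names are our own; since exactly 100 students were surveyed, the counts double as percentages.

```python
SURVEYED = 100

# Survey counts from the example.
n_I, n_T, n_Y = 74, 38, 32        # Instagram, TikTok, YouTube
n_IT, n_TY, n_IY = 25, 18, 22     # pairwise overlaps
n_ITY = 7                         # all three

# Principle of Inclusion and Exclusion for three sets.
at_least_one = n_I + n_T + n_Y - n_IT - n_TY - n_IY + n_ITY
none = SURVEYED - at_least_one    # complement

print(at_least_one / SURVEYED, none / SURVEYED)
```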
c) (Cases; Venn Diagram)
Weʼll construct a Venn diagram to illustrate the probabilities of each region aiming to determine
how many students posted on YouTube only. Remember that the key in this strategy is to work
from the inside out.
Thus, no one surveyed had posted on YouTube only.

⬛
Practice 2.2
Technique
1. In a certain population, 10% of the people are rich, 5% are famous, and 3% are rich and famous.
If a person is selected at random, what is the probability that the person is rich but not famous?
2. In a town of 351 adults, every adult owns a car, motorcycle, or both. If 331 adults own cars and
45 adults own motorcycles, what proportion of the car owners do not own a motorcycle?
3. There are six types of tickets placed in a box yielding different prize amounts; $1, $10, $100,
$1000, $10 000, and $100 000. The proportional amount of each ticket type is 60%, 30%, 4%,
3%, 2%, and 1% respectively. What is the probability that you select a ticket with a prize
amount of at least $1000?
4. An oddly shaped die has the property that the chances that you roll a 2, 3, 4, or 5 is equivalent to
the chances of rolling a 1 or 6. What is the probability of rolling a 3 or greater with this die?
5. There are 20 students participating in an after-school program offering classes in yoga, bridge,
and painting. Each student must take at least one of these three classes, but may take two or all
three. There are 10 students taking yoga, 13 taking bridge, and 9 taking painting. There are 9
students taking exactly two classes. What proportion of the students are taking all three of the
classes?
Studies
6. A spinner is designed with 7 slots such that;
P(1) = P(2) = P(6),
P(3) = 2P(4) = (1/2)P(1), and
P(5) = (1/2)P(7) = (1/4)P(1).
Determine the likelihood of each sector of the spinner.
2.3 Probabilities of Successive Events
The Direct Approach Returns!
One of the primary counting tactics weʼve consistently employed was that of breaking down our
problem into a series of “successive” actions whereby we could invoke the Fundamental Counting
Principle and simply multiply the number of outcomes within each action to obtain our overall result.
To shorten the names of things weʼve adopted the phrase direct approach when employing this
counting strategy. As it turns out, the same principle applies to probability as well, in that if we can
break down a probabilistic problem into a series of successive actions, then we may simply multiply the
corresponding probabilities to obtain the overall likelihood of the event. But as, “seeing is believing,”
letʼs observe some examples.
Example 2.3.1
Determine the probability that a group of three people do not share the same birth month.

Using combinatorics to work through this problem, we realize that in total each person has twelve
options for their birth month, while we must assign a different month to each
person to ensure that no two of them are born in the same month, yielding;
P(no one) = n(no one) / n(total)
= 12P3 / 12^3
≈ 0.764
⬛
Now letʼs consider this in another light.

Weʼll name the three individuals Intisar, Janet, and Karan. Then, considering the likelihoods per
individual, we get the following results: Intisar can be allocated any of the 12 months, and so his
probability is 12/12 = 1; then Janet can only be allocated any 11 of the twelve, followed by Karan
being allotted any 10 of the remaining months. This shows up as a series of successive actions as follows:
P(no one) = 12/12 × 11/12 × 10/12 ≈ 0.764
⬛
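Both routes to the answer can be compared in a short sketch. It assumes, as the example does, that all twelve months are equally likely; the variable names are our own.

```python
from fractions import Fraction
from math import perm

# Counting route: favourable assignments over all assignments.
by_counting = Fraction(perm(12, 3), 12 ** 3)

# Successive-events route: multiply one probability per person.
by_successive = Fraction(12, 12) * Fraction(11, 12) * Fraction(10, 12)

print(by_counting, by_successive, round(float(by_counting), 3))
```

The two fractions are identical, which is exactly the point of the direct approach for probabilities.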
This method of breaking up events opens the door for a lot of problems to be solved in a much easier
manner than simply working through the combinatorial processes.
Example 2.3.2
Atharva and Morgan decide to engage in a three game Bey-Blade (fancy spinning tops) battle. In the
past, Atharvaʼs Bey-Blade has won at a rate of 84% of the time against Morganʼs blade. Determine the
probability that:
a) Atharva wins all three battles.
b) Morgan wins at least one battle.
Solution:
a) (Direct Approach)
Let A represent the number of times Atharva wins; thus 𝐴 = {0, 1, 2, 3}.
Weʼll assume that Atharvaʼs blade will have an 84% chance of winning every time they battle,
thus we can break the probabilities up by each match up as follows;
P(A = 3) = (0.84)(0.84)(0.84) = (0.84)^3 ≈ 0.59
b) (Indirect Approach)
If Morgan wins at least one battle, that means that Atharva doesnʼt win all three. Thus, given
that weʼve already calculated the complement we get;
𝑃(𝐴 < 3) = 1 − 𝑃(𝐴 = 3)
≈ 1 − 0.59
= 0.41
⬛
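The two parts of this example amount to one multiplication and one subtraction, sketched below. The assumption, as in the solution, is that each battle is won by Atharva with probability 0.84 regardless of the earlier results.

```python
P_ATHARVA = 0.84  # assumed chance that Atharva wins any single battle

# a) Direct Approach: three successive wins.
p_all_three = P_ATHARVA ** 3

# b) Indirect Approach: Morgan wins at least once unless Atharva wins all three.
p_morgan_at_least_one = 1 - p_all_three

print(round(p_all_three, 2), round(p_morgan_at_least_one, 2))
```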
Practice 2.3
Technique
1. Vickyʼs is a hip new restaurant that offers 7 distinct entrees. If four friends decide to dine at
Vickyʼs and they all select their entree independently, what is the probability that none of them
will have the same meal?
2. If you are dealt two cards from a standard deck, what is the probability that you will be given 2
Aces?
3. Determine the probability of being dealt a flush with 5 cards drawn from a standard deck.
(Note: A flush indicates that all five cards share the same suit.)
4. The letters that spell PROBABILITY are in a bag. What is the probability that Barb is able to pull
out the four letters that spell her name in perfect order?
5. A pair of dice are rolled repeatedly until doubles are achieved. What is the probability that a
pair of doubles will be rolled, for the first time, on the fourth toss?
2.4 Dependent and Independent Events
Classifying Successive Actions
Having seen that the Fundamental Principle of Counting (a.k.a. the Direct Approach) still applies to
probabilistic situations, we can now grapple with some more interesting scenarios. But first weʼll
introduce some terminology;
Definition 2.4.1
Two events, A and B, are said to be independent if and only if 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵).
If youʼre thinking, “OK, whatʼs that supposed to mean?” then youʼve asked yourself a very good
question! Observing the definition from a set theoretic point of view, all we can derive is that
two sets are deemed independent when the likelihood of their intersection can be determined by
multiplying the respective probabilities of each set taken separately. This is all well and
good, but letʼs discuss this phenomenon from a different perspective to hopefully give us a better sense of
the matter.
A more holistic way of viewing this definition is to state it as follows;
Definition 2.4.1 (Informal, More Intuitive Version)

Sets A and B are considered independent when the event that both A and B occur can be broken down
as a sequence of successive events whereby the result of A has no bearing on the likelihood that B occurs.
Some contextual examples would be,

I. Repeated tossing of a coin. In this case any previous toss has no bearing on the result of any
subsequent toss, thus these events are independent.
II. Repeated rolling of a die. Again, the outcome of any particular roll is unaffected by any previous
roll.
III. Any spin on a slot machine is unaffected by previous results. (Note: This directly
contradicts the belief, common among gambling addicts, that a machine that has not won for many spins
has a greater likelihood of winning in the near future.)
In a similar manner we can now define the notion of events whose probabilities are not independent of
previous results, namely;
Definition 2.4.2
Two events A and B are considered dependent if the occurrence of A changes the likelihood of B, that is,
if 𝑃(𝐵 | 𝐴) ≠ 𝑃(𝐵). In this case; 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵 | 𝐴).
NOTE: 𝑃(𝐵 | 𝐴) means the likelihood that B occurs given that A has occurred.
Examples of dependent events would be;
I. Drawing an Ace on your second selection from a standard deck of cards, where the first card was
not replaced. In this case, the card selected first will have an impact toward the likelihood of
obtaining an Ace on the next card.
II. Success on an exam in relation to the amount of preparation done. Many may also argue, that
adequate preparation time is also dependent upon oneʼs own ability for the subject matter
being studied, nevertheless there is some level of dependence between these two events.
The key notion with both of these definitions is that the Fundamental Principle of Counting is being
applied here. Thus, when observing the chances that both events A and B occur, and this problem can
be broken down as a sequence of successive events, then we can multiply the respective probabilities.
The idea of the events being independent versus dependent is simply a matter of recognizing the
relationship, if any, between the results of these events.
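To see the difference in numbers, consider drawing two cards and asking for two Aces. The sketch below compares the dependent case (the first card is not replaced) with the independent case (the first card is replaced and the deck reshuffled); the variable names are our own.

```python
from fractions import Fraction

ACES, DECK = 4, 52

p_first_ace = Fraction(ACES, DECK)

# Dependent: after one Ace is removed, 3 Aces remain among 51 cards.
p_second_given_first = Fraction(ACES - 1, DECK - 1)
p_both_without_replacement = p_first_ace * p_second_given_first

# Independent: the deck is restored, so the second draw matches the first.
p_both_with_replacement = p_first_ace * p_first_ace

print(p_both_without_replacement, p_both_with_replacement)
```

In both cases we multiply along the sequence of events; the only question is whether the second factor must be adjusted for what happened first.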
Probability Trees
The strategy of drawing a tree diagram to solve counting problems was introduced in our very first
example, but as we saw then, as the number of outcomes grew, this method became somewhat
inefficient. The power of this technique however can help us greatly when solving probabilistic
problems. Letʼs observe a basic example to illustrate this;
Example 2.4.1
Yamama and Mimi participate in a friendly three game match of Chess. In the past, Yamama wins about
70% of the time when they play each other. What is the likelihood that Mimi upsets Yamama (i.e. that
Mimi wins at least two of the three games)?

It is fairly easy to recognize that there are two distinct cases of concern here, namely, when Mimi wins
two games or when she wins all three. Drawing out the possibilities we get,
We are implicitly assuming here that the outcome of each game is independent of every other
game, thus Yamama will retain her 70% advantage each round. The tree is constructed so that each
event, namely each individual game, represents a new layer of the tree. The outcome of each game can
only result in either Yamama winning or Mimi winning.
We can also see that every possibility of how the three game series can play out is exhibited in this
diagram. To find the desired probability, we can trace along the branches that yield Mimi winning two or
three games as shown below;
Tracing along the branches that have Mimi winning at least two times, we are able to map out each
desired case in our solution. As the events are successive, we can multiply the probabilities along each
branch as shown in the diagram above.
Hence, we get;
$$
\begin{aligned}
P(M \ge 2) &= P(MMM) + P(MMY) + P(MYM) + P(YMM) \\
&= (0.3)^3 + (0.3)^2(0.7) + (0.3)(0.7)(0.3) + (0.7)(0.3)^2 \\
&= 0.216
\end{aligned}
$$
Thus, Mimi only has a 21.6% chance of winning this three game match.
⬛
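The branch tracing in Example 2.4.1 can also be checked by brute force. The following is a minimal Python sketch, not part of the original solution: it lists every path through the three game tree, multiplies the probabilities along each path, and adds up the paths in which Mimi wins at least two games. The function name `prob_at_least` and its parameters are illustrative choices, and the games are assumed to be independent, as in the example.

```python
from itertools import product

def prob_at_least(wins_needed, games=3, p_mimi=0.3):
    """Probability that Mimi wins at least `wins_needed` of `games` independent games."""
    total = 0.0
    # Each path is a tuple such as ('M', 'Y', 'M'): one letter per game played.
    for path in product("MY", repeat=games):
        path_prob = 1.0
        for winner in path:
            # Multiply along the branch, exactly as on the tree diagram.
            path_prob *= p_mimi if winner == "M" else 1 - p_mimi
        if path.count("M") >= wins_needed:
            total += path_prob
    return total

print(round(prob_at_least(2), 3))
```

Changing `games` or `p_mimi` lets you explore longer matches without redrawing the tree.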
Example 2.4.2 (Medical/Psychological Testing)
In an analysis of psychological cases performed in 2010 it was found that about 1 in every 200 people
exhibited behaviours associated with Morganʼs Syndrome, a character trait whereby one is able to delay
emotional responses to highly intensive situations. Such people are considered exceptional at working
within extreme situations such as military field tactics, hostage negotiations, emergency surgeons, etc...
Because of this a test was designed to measure whether or not someone exhibits the Morgan traits. It
has been shown that this test shows up as positive 98% of the time when the person actually has the
character trait, while it yields a false positive (i.e. the test shows positive when the person does not
actually have the trait) approximately 5% of the time. If the test is administered to a randomly chosen
person from the public, what is the probability that the test is correct?

We notice that there are two distinct events occurring in this problem. The first is that the person being
tested will either have the trait or not, while the second is that of the administering of the test for
Morganʼs Syndrome. We can further observe that these events are dependent as the test result clearly
depends on whether or not the subject has Morganʼs Syndrome. This again can be viewed as a
probability tree as shown;
We further notice that the test yields a “correct” result in one of two distinct cases, namely when the test
shows positive and the person has Morganʼs or when it shows negative when the person does not have
Morganʼs as indicated above. This gives us;
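$$
\begin{aligned}
P(\text{Correct}) &= P(Y \text{ and } +) + P(N \text{ and } -) \\
&= \left(\frac{1}{200}\right)(0.98) + \left(\frac{199}{200}\right)(0.95) \\
&\approx 0.95
\end{aligned}
$$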
Hence, the test is “correct” about 95% of the time.
⬛
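As a check on the arithmetic in Example 2.4.2, here is a small Python sketch that adds the two "correct" branches of the tree using exact fractions. The function and variable names are illustrative assumptions; only the three input figures come from the example.

```python
from fractions import Fraction

# Figures taken from Example 2.4.2.
p_trait = Fraction(1, 200)                # the person has the trait
p_pos_given_trait = Fraction(98, 100)     # test shows positive when the trait is present
p_pos_given_no_trait = Fraction(5, 100)   # false positive rate

def prob_test_correct(p_trait, p_pos_given_trait, p_pos_given_no_trait):
    """Add the two 'correct' branches: (trait, +) and (no trait, -)."""
    true_positive = p_trait * p_pos_given_trait
    true_negative = (1 - p_trait) * (1 - p_pos_given_no_trait)
    return true_positive + true_negative

p_correct = prob_test_correct(p_trait, p_pos_given_trait, p_pos_given_no_trait)
print(p_correct, float(p_correct))
```

Working with `Fraction` avoids rounding until the very last step, which makes it easier to compare with a hand calculation.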
Practice 2.4
Technique
1. Atharva estimates that the probability of getting the next question right if the previous one was
right on a Probability test is 4/5. But the chance of getting it right if the previous one was wrong
is only 2/5. Assuming that he is equally likely to get the first question right or wrong, determine the
probability of getting;
a) the second question correct.
b) the third question correct.
2. Victoria and Edward are evenly matched tennis players. However each time Victoria loses a set
her probability of winning the next set is decreased by 1/5. But when she wins, her probability
of winning the next set increases by 1/10.
a) Draw a probability tree that illustrates a 3-set match between these two players.
b) Determine the probability that Victoria wins at least two of the three sets.
3. If Victor is late for his Data class, he makes a greater effort to arrive on time for the next class and
the probability that he is on time is 3/4. However if he is on time, he is liable to be less concerned
about punctuality for the next class and his probability of being on time drops to 1/2. If Victor
was on time on Monday, determine the probability that he would be late on Wednesday.
4. A drawer contains 4 red and 3 black socks. If three socks are drawn randomly without
replacement, determine the probability that at least two of them are red socks.
Studies
5. 1600 ants start climbing a tree as shown in the diagram below. At each branch 1/4 of the ants go
left and 3/4 of the ants go right. How many ants reach the point W? How many reach the point
V?
6. A cereal company is providing a free game app giveaway to promote better sales. Each box
contains one of five game apps; Angry Birds, Bit.Trip Beat, Cut the Rope, DDR, or Enduro.
Furthermore, each of the five games is equally likely to appear in any particular box. If you
were to purchase 12 cereal boxes, what is the probability that you would obtain all five games?
Repertoire
7. The game of craps is a dice based game with the following rules:
● On the first roll of the two dice, you win if the roll is 7 or 11; you lose if the roll is a 2, 3, or
12; and any other roll is called your point.
● If on the first roll you rolled a point, then you must continue to roll the dice as many
times as necessary until you roll either a 7 or 11, in which case you lose, or your point, in
which case you win.
What is the probability of winning at this game?
8. Cheng, Hank, and Jim are involved in a three cornered duel. They each take shots at each other
in successive turns starting with Cheng, then Hank, then Jim, and back to Cheng (if possible),
and so on... Once a person is shot they are out of the duel altogether (Iʼd say theyʼre dead but
this is possibly a little too graphic for young school children like yourself). Anyway, Cheng
shoots with an accuracy level of 0.6, Hank is a little better being able to hit his mark 85% of the
time, while Jim never misses! What is the probability that Cheng wins/survives the duel? (Note:
You may assume that each duelist is perfectly rational when deciding whom to aim at.)
9. Determine the conditions whereby the order of outcomes is an unimportant consideration.

That is, if one considers order or doesnʼt they will obtain the same probability using either
approach.
2.5 Conditional Probabilities
The definition of two dependent events A and B yielded the relationship;
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) · 𝑃(𝐵|𝐴)
where 𝑃(𝐵|𝐴) is defined to be the probability that B occurs given that the event A has occurred. Letʼs
inspect this situation a little more closely. Observing the diagram below we see the sets A and B sitting
within our outcome space S,
Being privy to the knowledge that A has occurred we now can restrict the outcome space to being the
set A only. Furthermore, the only outcomes in the set B that are now relevant reside in the set 𝐴 ∩ 𝐵.
This leads us to the following relationship;
$$P(B \mid A) = \frac{n(A \cap B)}{n(A)}$$
It should be noted that there is nothing really unique about this formula. We are still ultimately adhering
to the original defined relationship for probability, namely that 𝑃(𝐴) = 𝑛(𝐴)/𝑛(𝑆), with the exception that
there are now restrictions on both our event space and outcome space.
Moreover, if we divide both the numerator and denominator by 𝑛(𝑆) we get;
$$
\begin{aligned}
P(B \mid A) &= \frac{n(A \cap B)/n(S)}{n(A)/n(S)} \\
P(B \mid A) &= \frac{P(A \cap B)}{P(A)}
\end{aligned}
$$
You may also notice that this relationship is just a rearrangement of the original definition for
dependent events A and B. So now we essentially have two ways of working with probabilities with
given information; either by counting outcomes or by considering the likelihoods of the said
information.
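To see that the counting form and the probability form of the conditional probability formula agree, here is a short Python sketch on a deliberately simple case that is not from the text: one fair die, with A = "the roll is even" and B = "the roll is greater than 3". The events and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction

S = set(range(1, 7))                 # outcome space for one fair die
A = {s for s in S if s % 2 == 0}     # event A: the roll is even
B = {s for s in S if s > 3}          # event B: the roll is greater than 3

def prob(event, space=S):
    """Classical probability n(event) / n(space) for equally likely outcomes."""
    return Fraction(len(event), len(space))

by_counting = Fraction(len(A & B), len(A))   # n(A ∩ B) / n(A)
by_ratio = prob(A & B) / prob(A)             # P(A ∩ B) / P(A)
print(by_counting, by_ratio)
```

Both lines compute P(B | A); the first restricts the outcome space to A and counts, while the second divides two ordinary probabilities.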
Example 2.5.1
In 2018 a U.K. study by the Human Rights and Equity Commission determined that approximately 0.4%
of respondents identified as non-binary, 49.6% male, and 50% female. If a family is chosen at random
from the set of all families with exactly two children, determine the probability that the family has two
non-binary children given that at least one of the children is non-binary.

Weʼll begin by sketching a probability tree to visualize the possible outcomes.
Since weʼre given that at least one of the children is non-binary, our outcome space is now restricted to the
paths indicated in green in the diagram above. Thus, we can effectively cut out the non-highlighted
branches. Consequently, we now can work through the calculation as follows;
$$
\begin{aligned}
P(2N \mid N \ge 1) &= \frac{P(2N \cap N \ge 1)}{P(N \ge 1)} \\
&= \frac{(0.004)^2}{(0.496)(0.004) + (0.004)(0.496) + (0.004)^2 + (0.004)(0.5) + (0.5)(0.004)} \\
&\approx 0.002
\end{aligned}
$$
Thus, there is a 0.2% chance that such a family would have two non-binary children.
⬛
It is important to remain faithful to the circumstances dictated by the given information when
working through such problems. There are many instances where the result will be non-intuitive as will
be demonstrated in the next example.
Example 2.5.2 (Medical/Psych Testing Revisited)
Devon has himself tested for Morganʼs Syndrome (as outlined in Example 2.4.2) and the result showed
positive. What is the probability that Devon actually has Morganʼs Syndrome?
Solution:
Letʼs show our diagram again of the testing scenarios for Morganʼs Syndrome but this time realizing that
some of our outcomes, namely the ones where the test result is negative, are now excluded.
So, once again, knowing that Devon obtained a positive result, we determine the likelihood that he
actually has Morganʼs by utilizing the conditional probability formula as shown below;
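$$
\begin{aligned}
P(M \mid +) &= \frac{P(M \cap +)}{P(+)} \\
&= \frac{\left(\frac{1}{200}\right)(0.98)}{\left(\frac{1}{200}\right)(0.98) + \left(\frac{199}{200}\right)(0.05)} \\
&\approx 0.09
\end{aligned}
$$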
Thus, the likelihood that Devon has Morganʼs syndrome is only 9% !?!
⬛
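The calculation in Example 2.5.2 depends heavily on how rare the trait is. The sketch below, with illustrative names of my own choosing, wraps the same conditional probability formula in a Python function so the inputs can be varied. The second call uses a purely hypothetical prevalence of 1 in 10 to show how much the answer moves.

```python
from fractions import Fraction

def prob_trait_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(trait | +) = P(trait and +) / P(+), using the two '+' branches of the tree."""
    trait_and_pos = prevalence * sensitivity
    no_trait_and_pos = (1 - prevalence) * false_positive_rate
    return trait_and_pos / (trait_and_pos + no_trait_and_pos)

# Figures from Example 2.4.2: 1 in 200 have the trait, 98% detection, 5% false positives.
posterior = prob_trait_given_positive(Fraction(1, 200), Fraction(98, 100), Fraction(5, 100))
print(float(posterior))

# Hypothetical comparison: the same test applied where 1 in 10 people have the trait.
common = prob_trait_given_positive(Fraction(1, 10), Fraction(98, 100), Fraction(5, 100))
print(float(common))
```

Comparing the two printed values illustrates the point made in the discussion of this example: the reliability of the test alone does not determine how believable a positive result is.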
This result is highly astonishing knowing that the screening test seems to be fairly reliable. What often
goes unnoticed is that the proportion of people in the population having this trait is very small.
Moreover, if someone is being tested for some type of medical condition they will often be exhibiting
traits/symptoms which may be indicative of the condition. Our solution presumes that Devon is just
some random person selected from the population as a whole. In this light we once again can see that
our instincts can easily be misguided.
Practice 2.5
Technique
1. A single die is rolled. What is the probability that it is a 5, given that it is greater than 3?
2. A card is drawn from a standard deck. What is the probability that it is a Jack, given that it is a
face card (Jack, Queen, or King)?
3. A bag contains 6 red and 4 green marbles. Three are drawn randomly without replacement.
What is the probability that the third is green given that at least one of the first two is green?
4. A small factory has three machines M1, M2, and M3 for producing protractors. The high speed
machine (M1) produces 60% of the protractors but 5% of the output is defective. The medium
speed machine (M2) produces 30% of the protractors of which 3% are defective. The low speed
machine (M3), which has a defect rate of 1%, produces the remainder. If a protractor is
defective, what is the probability that it came from M2?
5. Two brands of headache medicine are on the market; Acetylin and Salicin. One in 400 people
taking Acetylin suffers side-effects and one in 1200 taking Salicin. At the present time it is
estimated that equal numbers of people take each kind of drug. If Acetylin is taken off the
market due to industrial sabotage, show that the probability of side-effects will be halved.
Studies
6. There are three cards in a box. One card is red on each side, one green on each side, and the
third has one red side and one green side. If a card is drawn randomly from the box and placed
on a table so that the side facing downward cannot be seen, and is showing red on the side
facing upward, what is the probability that the down side is also red?
7. There are four cards in a box. One card is red on both sides. The other cards are each red on
one side and green on the other. A card is drawn randomly from the box and placed on the table
so that the side facing downward cannot be seen. If the side facing up is red, what is the
probability that the down side is also red?
8. Show that events A and B are independent if and only if 𝑃(𝐴 | 𝐵) = 𝑃(𝐴 | 𝐵ᶜ). (Remember that
the phrase if and only if implies that you have to first assume the independence of A and B and
prove the relationship, then assume the relationship is true and prove that the events are
independent.)
9. Suppose there are three similar boxes. Box i contains i white balls and one black ball, the cases
for i = 1, 2, 3 are shown in the diagram below.
Suppose the boxes are mixed up and one is then chosen at random. Then a ball is selected at
random from the box and it is shown to you. If you guess which box the ball came from youʼll
win a prize. Which box would you guess if the ball drawn is white and what is your chance of
guessing right?
Repertoire
10. (Polyaʼs Urn Scheme) An urn contains 4 white balls and 6 black balls. A ball is chosen at
random, and its colour is noted. The ball is then replaced, along with three more balls of the
same colour (so that there are now 13 balls in the urn). Then another ball is drawn at random
from the urn.
a) Determine the chance that the second ball drawn is white.
b) Given that the second ball drawn is white, what is the probability that the first ball drawn is
black?
c) Suppose the original contents of the urn are w white and b black balls, and that after a ball is
drawn it is replaced along with d more balls of the same colour. What is the probability that the
second ball drawn is white?
2.6 Using Simulations to Estimate Probabilities
Hitting the Combinatorial Wall
Up until this point weʼve focussed on the theoretical determination of the likelihood of a random event
A. By now you should be keenly aware that, combinatorially speaking, this is not always an easy task.
Secondly, in our initial discussion on the matter of defining a “random” event we realized that the
implicit assumption being made is that all of the subtleties of the
universe will ultimately balance out yielding only our defined proportion 𝑃(𝐴) = 𝑛(𝐴)/𝑛(𝑆). Lastly, we also
recognize that the probability of an event measures the long term likelihood of the event occurring.
Thus, we come to a vastly different approach to determining the probability, that of using a simulation.
Effectively a simulation recreates the event we wish to measure and runs through the process many
times over to provide an estimation of the desired probability. If our simulation can in fact reproduce
the actions which comprise the event in question with a high degree of accuracy and enough trials are
run (which often means thousands, if not hundreds of thousands) then our experimental result should
approach the theoretical result without the necessity of working through a cumbersome combinatorial
problem (which may be unsolvable given current mathematical techniques).
The idea of running through a simulation thousands of times may seem an equally daunting task,
but we are fortunate to exist in an age where modern technologies allow us to work through
such tasks at exceptionally fast speeds. Many software applications allow us to simulate random
events; such technology includes basic computer programming, spreadsheets, and sometimes your
own pocket calculator. Many of the reports that we see in science magazines and journals
concerning the evolution of our environment, economic forecasting, etc... are usually based on
computerized simulations. Of course, weʼll start with some baby steps to get your feet wet with the
discipline of simulating a probabilistic event.
Demonstrations of Simulations
The examples which follow are video demonstrations that exhibit some fundamental techniques for
designing simulations using spreadsheet software, in particular Google Sheets.
Example 2.6.1 (Rolling Dice)

The video below demonstrates how one can construct a simulation that estimates the probability of
rolling a sum of seven with two dice.
Solution:
Spreadsheet Simulations 1 - Introduction.avi
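For readers who prefer basic computer programming to a spreadsheet, the same experiment can be sketched in a few lines of Python. This is an assumed alternative and not the method shown in the video: it rolls two simulated dice many times and reports the fraction of rolls whose sum is seven. The function name, the default number of trials, and the `seed` parameter are all illustrative choices.

```python
import random

def estimate_sum_of_seven(trials=100_000, seed=None):
    """Estimate P(sum of two dice = 7) as (number of sevens) / (number of trials)."""
    rng = random.Random(seed)   # a seed makes the run repeatable
    hits = 0
    for _ in range(trials):
        roll = rng.randint(1, 6) + rng.randint(1, 6)   # two independent dice
        if roll == 7:
            hits += 1
    return hits / trials

estimate = estimate_sum_of_seven(trials=100_000, seed=1)
print(estimate)   # compare with the theoretical value 6/36
```

Try running it with 100, 1000, and 100 000 trials to see how the estimate settles down as the number of trials grows.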
Our second example demonstrates a simulation of “The Cereal Box Problem” which was covered in
Practice 2.4 #6.
Example 2.6.2 (Cereal Box Problem)

A cereal company is providing a free game app giveaway to promote better sales. Each box contains
one of five game apps; Angry Birds, Bit.Trip Beat, Cut the Rope, DDR, or Enduro. Furthermore, each of
the five games is equally likely to appear in any particular box. If you were to purchase 12 cereal
boxes, what is the probability that you would obtain all five games?
Solution:
Spreadsheet Simulations 2 - A More Detailed Example.avi
⬛
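The Cereal Box Problem can also be simulated in Python rather than in a spreadsheet. The sketch below is an assumed alternative to the video solution: each trial "opens" 12 boxes, records which of the five games appeared, and counts the trial as a success if all five were found. The games are numbered 0 to 4 for convenience, and all names are illustrative.

```python
import random

def estimate_all_games(boxes=12, games=5, trials=20_000, seed=None):
    """Estimate the probability of collecting every game when buying `boxes` boxes."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # A set keeps only the distinct games found in this batch of boxes.
        collected = {rng.randrange(games) for _ in range(boxes)}
        if len(collected) == games:
            successes += 1
    return successes / trials

print(estimate_all_games(seed=2))
```

Because `boxes` and `games` are parameters, the same function can be reused to ask how many boxes are needed before the chance of a full set becomes comfortable.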
Example 2.6.3 (Dependent Events)

Estimate the probability of selecting two hearts when drawing 3 cards from a standard deck.
Solution:
Spreadsheet Simulations 3 - Dependent Events.avi
⬛
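Dependent events such as drawing cards without replacement are easy to simulate in Python because `random.sample` never picks the same card twice. The sketch below is an assumed alternative to the video solution, and it reads "two hearts" as exactly two hearts among the three cards; that reading, like the function name, is my own assumption.

```python
import random

def estimate_two_hearts(trials=20_000, seed=None):
    """Estimate P(exactly two hearts) when three cards are drawn without replacement."""
    rng = random.Random(seed)
    deck = ["heart"] * 13 + ["other"] * 39   # only the suit matters here
    successes = 0
    for _ in range(trials):
        hand = rng.sample(deck, 3)           # three distinct cards from the deck
        if hand.count("heart") == 2:
            successes += 1
    return successes / trials

print(estimate_two_hearts(seed=4))
```

To model "at least two hearts" instead, change the test to `hand.count("heart") >= 2`.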
Practice 2.6
Technique
The following problems are solved fairly easily in theory. Attempt to construct a simulation to model
the events and compare your results to the theoretical calculation.
1. Determine the probability of rolling a sum of 15 or greater with a set of 5 dice.
2. A spinner has four distinct colours on its wheel; blue, green, red, and yellow. It turns out that
blue is four times as likely as yellow; green three times as likely, and red twice as likely.
Determine the probability that the spinner will land on blue.
3. Determine the probability of getting a pair of Aces when being dealt five cards from a standard
deck.
Studies
The following problems can be solved theoretically, but are much more challenging to complete.
Estimate the probability by using a simulation.
4. (Honour Duels) Sadly, the two Victors have arrived at an impasse and have chosen to duel!?!
Being a kind of unusual school, the rules for "dueling" are as follows:
i. Each combatant is to arrive at some time between 7am and 8am inclusive.
ii. Once arrived, they must wait for exactly 5 minutes. (Note: if arriving at 8am, Victor has to stay
until 8:05am)
iii. If they arrive within an overlapping time, then they must duel to the death. Otherwise, they
both leave, honour served.
Determine the probability that there will be a victor of the dueling Victors (i.e. they battle).
Note: the results will be better if you simulate to the nearest second.
5. (Baby Steps) Little Sharon is an infant who stands five steps away from her mother who is
situated in front of her; and one step away from her father who is behind her. Sharon moves
forward and backward in a random manner, whereby it is 2 times as likely for her to step
forward as opposed to backward. What is the probability that Sharon reaches her mother first?
6. (The Monty Hall Paradox) A game is played whereby a player is offered a choice of any one of
three doors to claim her final prize. Behind one of the doors is a car, while the other two hide
goats. Upon making her first selection the host reveals which of the unchosen doors hides a
goat. The contestant is then given the option to hold on to her original selection or switch to the
other door. Which door, if any, should the contestant choose?
Repertoire
The following scenario is yet unsolved. Attempt a simulation that could approximate a possible result.
7. (Dowry Dilemma) A King is perturbed by a local wise man and wishes to rid himself of this
“problematic fellow” once and for all. In trying to seem benevolent, he offers the man a choice
of 100 fair and equally desirable maidens. Each maiden has with her a dowry of a differing
value. If the wise man is able to select the maiden with the largest dowry he will be allowed to
wed her and keep all of the riches attached. If not he will be placed into the Kingʼs dungeons as
a fraud. What should the wise manʼs approach be to avoid a life sentence?
2.7 Distributions of a Random Variable
Key Definitions
Having now attained a fundamental understanding of probability, we now move to utilizing these
concepts to infer predictions based on categorized situations. As any good mathematician would do,
we will now start to hunt for some recurring patterns in probability theory, and hopefully begin to
identify common tactics to analyze such situations. We begin by introducing three important
definitions:
Definition 2.7.1
A random variable, X, is the set of all outcomes of a given event which is based on chance.
Weʼve defined this earlier in the unit, but weʼll now use this concept with greater emphasis. As such,
here are some examples of random variables;
1. Let H represent the number of heads tossed after three tosses of a fair coin. Thus,
𝐻 = {0, 1, 2, 3}
2. Let C represent the colour of a traffic light when encountered at a certain intersection. Then
𝐶 = {𝑟𝑒𝑑, 𝑎𝑚𝑏𝑒𝑟, 𝑔𝑟𝑒𝑒𝑛}
Recall: Since the above random variables can only take on a “countable” set of outcomes, the random
variable is considered “discrete”. If we were measuring something like; time, distances, volume, etc...,
then we would consider the random variable to be “continuous” in nature, which will be examined in
the next unit.
Now for two new definitions that will be used extensively for the remainder of this unit.
Definition 2.7.2
The probability distribution of a random variable, X, is the set of all probabilities associated with each
outcome of the random variable.
Definition 2.7.3
The expected value of a random variable, denoted E(X), is the outcome of a random variable that is
most expected. Furthermore, E(X) is given by the equation;
$$E(X) = \sum_{x} \left[ x \cdot P(X = x) \right]$$
Effectively, this just means to sum each outcome multiplied by the likelihood of that outcome.
Example 2.7.1
Determine the probability distribution and expected value of the random variable, H, which
measures the number of heads tossed after three tosses of a fair coin.
Solution:
Examining the probabilities of each outcome we come up with the following table:
h    P(H = h)
0    1/8
1    3/8
2    3/8
3    1/8
We often show the distribution as a graph as is shown to the right above.
To get the expected value, we simply multiply each outcome from H with its associated probability, then
add up the total giving us;
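$$
\begin{aligned}
E(H) &= \sum_{h} h \cdot P(H = h) \\
&= (0)\left(\frac{1}{8}\right) + (1)\left(\frac{3}{8}\right) + (2)\left(\frac{3}{8}\right) + (3)\left(\frac{1}{8}\right) \\
&= 1.5
\end{aligned}
$$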
⬛
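The table and expected value in Example 2.7.1 can be reproduced by listing all eight equally likely toss sequences. The Python sketch below is one possible way to do this; the function names `heads_distribution` and `expected_value` are illustrative, and the coin is assumed fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(tosses=3):
    """Map each possible number of heads to its probability for a fair coin."""
    outcomes = list(product("HT", repeat=tosses))        # e.g. ('H', 'T', 'H')
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {h: Fraction(counts[h], len(outcomes)) for h in sorted(counts)}

def expected_value(distribution):
    """Sum each outcome multiplied by the likelihood of that outcome."""
    return sum(x * p for x, p in distribution.items())

dist = heads_distribution(3)
for h, p in dist.items():
    print(h, p)
print("E(H) =", expected_value(dist))
```

The `expected_value` function works for any discrete distribution stored as a dictionary of outcome and probability pairs, so it can be reused for other problems.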
It should be noted that the expected value does not have to equal any of the outcomes of the
random variable. Instead, think of the expected value as a weighted average of the outcomes.
Example 2.7.2
A charitable lottery sells 200 000 tickets at a cost of $200 per ticket. The prize offerings are tabulated
below. How much would one expect to win by playing a single ticket of this lottery?
Prize Amount ($)    # of Winners
1 000 000           5
100 000             50
10 000              1000
1000                10 000
100                 28 945
Solution:
Let A represent the prize amount won.
The table above effectively gives us the probability distribution for our random variable A. The
probabilities associated with each prize amount can simply be found by taking all of the winning tickets
divided by the 200 000 total tickets sold. Thus, our expected prize amount is given as follows;
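$$
\begin{aligned}
E(A) &= \sum_{a} a \cdot P(A = a) \\
&= (1\,000\,000)\left(\frac{5}{200\,000}\right) + (100\,000)\left(\frac{50}{200\,000}\right) + (10\,000)\left(\frac{1000}{200\,000}\right) + (1000)\left(\frac{10\,000}{200\,000}\right) + (100)\left(\frac{28\,945}{200\,000}\right) \\
&\approx 164.47
\end{aligned}
$$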
Thus, one would expect to win $164.47 from a $200 ticket, meaning we can expect to lose $35.53 on this
lottery.
⬛
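The lottery calculation in Example 2.7.2 is a weighted sum, which is straightforward to express in Python. The sketch below stores the prize table as a dictionary of prize amount and number of winners; tickets that win nothing contribute zero to the sum and so are left out. The names `prizes`, `expected_prize`, `TICKETS_SOLD`, and `TICKET_PRICE` are illustrative choices.

```python
from fractions import Fraction

TICKETS_SOLD = 200_000
TICKET_PRICE = 200

# Prize amount in dollars -> number of winning tickets (from the table in Example 2.7.2).
prizes = {1_000_000: 5, 100_000: 50, 10_000: 1_000, 1_000: 10_000, 100: 28_945}

def expected_prize(prizes, tickets_sold):
    """Sum of (prize amount) x (probability of winning that prize)."""
    return sum(Fraction(amount * winners, tickets_sold)
               for amount, winners in prizes.items())

e_prize = expected_prize(prizes, TICKETS_SOLD)
print("expected prize:", float(e_prize))
print("expected net result:", float(e_prize) - TICKET_PRICE)
```

A negative net result means the player loses money on average, which is what we should expect from a lottery that raises funds for charity.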
Practice 2.7
Technique
1. A teacher records the time it takes for his students to go to the washroom after asking
permission to leave during class. The results are shown in the table below and the times are
rounded to the nearest minute.
Time Frequency
1 10
2 12
3 18
4 25
5 15
6 10
7 6
8 1
9 1
10 2
a) Define the random variable, X, for this context.

b) Determine the probability 𝑃(𝑋 = 𝑥) for each outcome of this data.
c) Determine the expected time that it takes for a student to go to the washroom.
2. At a city bank, transaction times with the tellers were recorded during a one-hour period. Times
recorded in the table below are rounded to the nearest minute.
Time Frequency
1 18
2 14
3 10
4 6
5 2
6+ 0
a) Define the random variable T to measure the transaction time. Determine 𝑃(𝑇 = 𝑡) for each
outcome listed above.
b) Determine the expected transaction time for this branch.
3. A game consists of cutting a shuffled deck of cards. If you cut a face card (Jack, Queen, or King)
you win 10 cents. If you cut an Ace you win 25 cents. If you cut anything else you lose a nickel.
What is the expected win/loss for this game?
4. A game is played by rolling two dice. If the sum of the dice is either 2 or 12 then you win $2.00.
If the sum is 7, you win $1.00. The cost to play the game is 50 cents.
a) Is this game fair? (A fair game would mean that the player does not expect to win or lose.)
b) If you played this game 100 times, how much would you expect to win/lose?
5. A game consists of rolling a dodecahedral die (12 faces) and receiving a dollar amount
equivalent to the roll. What should the cost of the game be set at to ensure the minimum
amount of gain for the host?
Studies
6. The rules of a dice game are given as follows. If the roll, n, is a prime number (2, 3, or 5) then the
player wins $2n, whereas if the roll is a composite number (4, 6) the player loses $n^2; if a one is
rolled nothing happens. What is the expected outcome for a player of this game?
7. A kennel is to be enclosed with 20 m of fencing. The length of the kennel is to be an integer

length and is to be chosen randomly. What is the expected area of the kennel?
8. A rope 20 m long is cut into two segments randomly. The cut is made at one of the metre
markings. Each part is then used to form the perimeter of a square. Determine the expected
area of the smaller square.
Repertoire
9. Look up the rules for the Roll Up the Rim contest offered each year by Tim Hortonʼs. Determine
the expected value that one will win from this contest. Ensure to factor in costs of beverages
(including sizes) in this calculation.
2.8 Binomial Distributions
Common Distribution Types
For the next three sections weʼll examine very common types of distributions. Weʼll begin with a type of
distribution known as The Binomial Distribution, but letʼs use an example to motivate our ideas a little
bit.
Example 2.8.1
Suppose a pair of dice are rolled three times. Let D represent the random variable which counts the
number of doubles tossed. Determine the following:
a) The probability distribution of D.
b) The expected value of D.
Solution:
a) We first note that the outcome of each roll of the dice is independent of any previous outcome,
and that furthermore for each roll the probability of rolling doubles is given by
$$P(\text{doubles}) = \frac{1}{6}$$
Drawing out the probability tree to help us visualize matters we get the following;
Highlighting the case where two doubles are rolled we get that,
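$$
\begin{aligned}
P(D = 2) &= P(DDD') + P(DD'D) + P(D'DD) \\
&= \left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right) + \left(\frac{1}{6}\right)\left(\frac{5}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{5}{6}\right)\left(\frac{1}{6}\right)^2 \\
&= 3 \cdot \left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)
\end{aligned}
$$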
Thus, in computing the other outcomes of the random variable D in a similar manner we
tabulate our probability distribution as follows;
d    P(D = d)
0    (5/6)^3
1    3 · (1/6)(5/6)^2
2    3 · (1/6)^2(5/6)
3    (1/6)^3
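b) Our expected value can then be computed directly from the definition as follows;
$$
\begin{aligned}
E(D) &= \sum_{d} d \cdot P(D = d) \\
&= (0)\left(\frac{5}{6}\right)^3 + (1)\left(3 \cdot \left(\frac{1}{6}\right)\left(\frac{5}{6}\right)^2\right) + (2)\left(3 \cdot \left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)\right) + (3)\left(\frac{1}{6}\right)^3 \\
&= 0.5
\end{aligned}
$$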
⬛
Patterning in the Binomial Distribution
While computing each instance of P (D = d) above, we notice that for any outcome the branches on our
tree diagram which yield the desired value of d all have identical probabilities. Thus, it simply
remains to determine how many of these branches yield this desired outcome.
To answer this letʼs revisit the above example where we calculated P (D = 2). In this case, we took the
varying cases whereby we could roll two doubles, and consequently roll one “non-double” as shown
again below;
𝑃(𝐷 = 2) = 𝑃(𝐷𝐷𝐷') + 𝑃(𝐷𝐷'𝐷) + 𝑃(𝐷'𝐷𝐷)
Recognizing that each of the branches yields the same probability, namely (1/6)^2 (5/6), we then just have
to count the number of arrangements of the three “letters” making up DDDʼ. Of course, we should recall
this is one of our classic combinatorial problems, that of arrangements with identical elements, thus the
number of branches that yield two doubles and one non-double turns out to be;
$$\frac{3!}{2!} \quad \text{or} \quad {}_{3}C_{2} \text{ ways,}$$
giving us the equivalent expression for P (D = 2), namely;
$$P(D = 2) = {}_{3}C_{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)$$
Letʼs now extend our problem a little further.
Example 2.8.2
A pair of dice are rolled 7 times. Let D represent the number of doubles that are rolled. Determine a
formula which evaluates P (D = d).
Solution:
To start off, we can see that drawing another tree diagram representing 7 rolls would be highly tedious,
thus weʼll avoid this task if at all possible. Letʼs instead observe the case where D= 2 once again.
Since we are rolling two doubles, this means that at some point we wonʼt roll doubles on five occasions.
Hence, any branch on this massive tree would yield a likelihood of;
$$\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^5$$
and so weʼre again left with the task of determining how many branches yield this probability. But this
too is tackled in a similar manner as we must simply consider how many arrangements of DDDʼDʼDʼDʼDʼ
are possible. Thus we get;
$$\frac{7!}{2!\,5!} \quad \text{or} \quad {}_{7}C_{2} \text{ ways.}$$
We can combinatorially think of the last expression as us selecting the two rolls where doubles are
attained from the seven total, leaving the remaining five as unsuccessful attempts.
Generalizing this calculation we get that;
$$P(D = d) = {}_{7}C_{d}\left(\frac{1}{6}\right)^{d}\left(\frac{5}{6}\right)^{7-d}$$
Generalizing Binomial Distributions
Letʼs not stop generalizing here. If we have any situation whereby we have:
● repeated independent trials with the chance of success equalling p and
● the random variable X counts the number of successes resulting from these trials
we get that;
$$P(X = x) = {}_{n}C_{x}\, p^{x} q^{n-x}$$
where n represents the number of trials performed, p is the probability of success, and 𝑞 = (1 − 𝑝) is
the probability of failure. Such a random variable is said to exhibit a binomial distribution.
It turns out that the expected value of a binomially distributed random variable is given by;
𝐸(𝑋) = 𝑛𝑝
which is a shockingly simple expression!! If we think about it, this should not be so surprising. Given
that each trial has the same chance of success p and we perform n such trials, it would be
reasonable to guess that we would on average obtain np successes. Of course this is not a rigorous proof of
this formula. The proof is somewhat tricky, but it is within the scope of this course and so you are asked to
tackle it in the repertoire portion of the practice.
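While a numerical check is no substitute for the proof, it can build confidence in the formula 𝐸(𝑋) = 𝑛𝑝. The Python sketch below assumes the standard binomial formula, P(X = x) = (number of ways to choose x of the n trials) · p^x · q^(n−x), and applies it to the three rolls of Example 2.8.1 with p = 1/6. The function name `binomial_pmf` is an illustrative choice, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 3, Fraction(1, 6)   # three rolls of a pair of dice, success = doubles

distribution = {d: binomial_pmf(d, n, p) for d in range(n + 1)}
expected = sum(d * prob for d, prob in distribution.items())

for d, prob in distribution.items():
    print(d, prob)
print("E(D) from the definition:", expected)
print("n * p:", n * p)
```

Changing `n` and `p` and rerunning gives further evidence for the pattern, though only the proof requested in the practice settles it for every case.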
Example 2.8.3
An automated process designed to produce ball point pens has an average defect rate of 1%. Quality
control selects a sample of 50 pens from a batch of 10 000 to determine if the batch is spoiled. If there
are more than two defective pens then the batch is sold as damaged at a greatly reduced price (i.e. this
is where your dollar store pens come from).
a) What is the probability that a batch of pens will be deemed defective?
b) What is the expected number of defective pens that would be found from a batch of 10 000
pens?
Solution:
a) (Indirect Approach)
Let X represent the number of defective pens found in the sample. Clearly, X is binomial as we
have a fixed number of trials, 50, with each pen being examined having a 1% chance of being
defective. Thus we get;
𝑃(𝑋 > 2) = 1 − 𝑃(𝑋 ≤ 2)
= 1 − [𝑃(𝑋 = 0) + 𝑃(𝑋 = 1) + 𝑃(𝑋 = 2)]
≃ 0. 0138
Thus, there is about a 1.4% chance of the batch being declared defective.
b) Let Y represent the number of defective pens in the entire batch. Again, Y is also binomial as we
have a fixed number of trials, 10 000, with each pen having a 1% chance of being defective. This
gives us;
𝐸(𝑌) = (10 000)(0. 01) = 100
Thus, on average, each batch will yield 100 defective pens.

⬛
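The indirect calculation in part a) of Example 2.8.3 involves three binomial terms, which are tedious by hand but quick in Python. The following sketch assumes, as the solution does, that the number of defective pens in the sample is binomial with n = 50 and p = 0.01; the function names are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_more_than(k, n, p):
    """Indirect approach: P(X > k) = 1 - [P(X = 0) + P(X = 1) + ... + P(X = k)]."""
    return 1 - sum(binomial_pmf(x, n, p) for x in range(k + 1))

print(round(prob_more_than(2, 50, 0.01), 4))   # part a): more than two defective pens
print(10_000 * 0.01)                           # part b): expected defects, n * p
```

The indirect approach matters here because summing the cases X = 3, 4, ..., 50 directly would take 48 terms instead of 3.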
Practice 2.8
Technique
1. A multiple choice test contains ten questions with each question having five possible responses,
what is the probability that:
a) Jeremie will score at least seven correct on the test if he merely guesses at each response?
b) Chloe will score at least seven correct if she feels that the chance of getting any problem
correct is 0.75?
2. In a manufacturing process, it is estimated that only 2 percent of the bolts produced are
declared defective. In a package of 50 bolts, what is the probability that there is at least one
defective bolt?
3. 85% of students in Ontario participating in the grade 9 EQAO academic mathematics test have
successfully passed in the past five years. If a school had 30 students participating in the EQAO
test this year, determine
a) the probability that this schoolʼs success rate would exceed 90%?
b) the expected number of students who will pass the EQAO.
Studies
4. (Multinomial Distributions)
a) A roulette wheel has 38 spaces, of which 18 are red, 18 are black, and 2 are green. After
20 spins, determine the probability that 12 resulted in red, 5 in black, and 3 in green.
b) Generalize this situation by having X count the number of each type of outcome in n
independent trials with each trial having one of three possible outcomes. Determine a
formula for 𝑃(𝑋 = 𝑎, 𝑏, 𝑐).
c) Generalize further by accounting for n independent trials with r different types of
outcomes.
5. For each positive integer n, what is the largest value of p such that zero is the most likely
number of successes in n independent trials with success probability p?
6. A department store promotion involves scratching four boxes on a card to reveal randomly
printed letters from A to F. The discount is 10% for each A revealed, 5% for each B revealed, and
1% for the other four letters. What is the expected discount for this promotion?
7. An airline has determined that 5% of its customers do not show up for their flights. If a
passenger is bumped off a flight because of overbooking, the airline pays the customer $200.
What is the expected payout by the airline, if it overbooks a 240 seat airplane by 5 seats (i.e. they
sell 245 tickets)?
Repertoire
8. (Bell Shaped Distribution) By simulating on differing values of n and p, determine conditions
whereby the graph of a binomially distributed random variable will yield a “bell shaped”
distribution.
9. Prove that if X is binomially distributed, then E(X) = np.
2.9 Geometric Distributions
Waiting for a Success
We now turn to a slightly different type of problem, that of “waiting” for a certain outcome to occur. As
with the previous section letʼs observe an example to motivate this type of distribution.
Example 2.9.1
Suppose that two dice are repeatedly rolled until doubles are attained. Let R represent the number of
rolls before the first occurrence of doubles. Determine the probability distribution for the random
variable R.
Solution:
As weʼre simply waiting for the first occurrence of doubles, we realize that the random variable R has an
infinite set of outcomes associated with it, namely;
𝑅 = {0, 1, 2, 3, 4, ...}
For example, if R = 0, that would mean we rolled doubles on the first roll (i.e. no rolls occurred before
obtaining doubles).
Observing the outcome R = 2, we know that the first two rolls failed to be doubles and then the third
roll was a successful attempt at obtaining doubles. As the rolls are successive and independent events
we get that,
P(R = 2) = (5/6)(5/6)(1/6) = (5/6)^2 (1/6)
Using similar reasoning, we get that for the case where R = 5 we get;
P(R = 5) = (5/6)^5 (1/6)
and so generalizing itʼs easy to see that for any outcome of the random variable R we get that,
P(R = r) = (5/6)^r (1/6)
⬛
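If you would like to see this distribution emerge from actual rolls, a short simulation can help. The sketch below is only an illustration: the function name rolls_before_doubles, the seed, and the number of trials are arbitrary choices of ours. It compares the simulated frequencies against (5/6)^r (1/6) for the first few values of r.

```python
import random

def rolls_before_doubles(rng):
    """Roll two dice until doubles appear; return the number of rolls before that success."""
    failures = 0
    while rng.randint(1, 6) != rng.randint(1, 6):
        failures += 1
    return failures

rng = random.Random(2023)   # fixed seed so the run is repeatable
trials = 100_000
results = [rolls_before_doubles(rng) for _ in range(trials)]

for r in range(4):
    empirical = results.count(r) / trials
    theoretical = (5 / 6) ** r * (1 / 6)
    print(r, round(empirical, 3), round(theoretical, 3))
```

The two columns should agree to roughly two decimal places; increasing the number of trials tightens the match.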
At this stage you may have noticed some similarities between the above example and the example from
the previous section where the random variable D was counting the number of doubles rolled after three
rolls of two dice. Specifically, both types of random variables considered repeated independent events.
These types of events are commonly referred to as independent and identically distributed events or
“i.i.d.” for short. Moreover, when i.i.d. events have only two possible outcomes they are referred to as
Bernoulli trials. The fundamental difference, however, is that when a random variable is binomial we are
counting the number of successes from a fixed number of Bernoulli trials, while with our random variable
R we are counting how many trials (rolls) occur before the first success.
The General Probability for a Geometric Distribution
In general, we say that a random variable X which counts the number of trials before the first attained
success from a series of Bernoulli trials is geometrically distributed. The probability that x trials occur
before the first success is given by;
P(X = x) = q^x p,
with p representing the probability of a success and q = 1 − p representing the likelihood of a failure.
It turns out that the expected value of a geometrically distributed random variable is given by the
ratio;
E(X) = q/p
The proof of this is left as an exercise to the reader in the practice problems.
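Before attempting a proof, it can be reassuring to check the claim E(X) = q/p numerically. The sketch below is not a proof; it simply adds up the first thousand terms of the infinite sums for the dice example (p = 1/6). The function name geometric_pmf and the cut-off of 1 000 terms are our own assumptions.

```python
def geometric_pmf(x, p):
    """P(X = x): x failures, each with probability q = 1 - p, followed by one success."""
    q = 1 - p
    return q**x * p

p = 1 / 6   # chance of doubles on any roll of two dice
q = 1 - p

# Truncate the infinite sums at a large cut-off; the neglected tail is negligible.
cutoff = 1_000
total_probability = sum(geometric_pmf(x, p) for x in range(cutoff))
approx_expectation = sum(x * geometric_pmf(x, p) for x in range(cutoff))

print(total_probability)    # very close to 1
print(approx_expectation)   # very close to q / p = 5
```

So, on average, we would expect about five failed rolls before the first doubles appear.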
Practice 2.9
Technique
1. If you repeatedly cut a deck of cards, what is the probability that you cut an ace in fewer than
five cuts?
2. What is the expected number of people you would have to sample to find someone with the
same birthday as yours? (Assume ʻleap yearsʼ are not taken into account.)
3. It is estimated that on a daily basis 15% of the customers at a supermarket use the express
checkout. What is the probability that one of the first five customers on a given day uses the
express checkout?
Studies
4. (Alternative Definition for Geometric Random Variables) An alternate way of considering a
geometric random variable is to let X count the number of total trials required to gain the first
success.
a) Derive a formula for P(X = x) for this alternate definition.
b) Derive the formula for E(X) given this version.
5. Suppose you repeatedly roll two dice until doubles are achieved for the second time. Let R
count the number of rolls before this is achieved. Derive a formula for P(R = r).
6. Let X count the number of trials before the nth success in a process of identical independent
trials (iid trials). Derive a formula for P(X = x).
7. Determine the expected number of rolls it would take to see all six faces of a single die. (Hint:
Use the result from Studies problem 4b in this section repeatedly.)
Repertoire
8. If a random variable X is geometrically distributed, prove that E(X) = q/p.
2.10 Hyper-Geometric Distributions
You may have noticed that both Binomial and Geometric distributions are based on a series of identical
and independent events (iidʼs). Of course, probability distributions do not have to be confined to such
limitations. Thus we now turn to a distribution that is based upon dependent events. Letʼs once again
begin with an example.
Example 2.10.1
Suppose that three cards are drawn from a standard deck. Let H represent the number of hearts
selected.
a) Determine the probability distribution for the random variable H.
b) Determine the expected number of hearts that would be selected.
Weʼll take two distinct approaches to solving this task. Both will hopefully be somewhat intuitive,
however, the second method should prove to be a more efficient approach.
Solution: (considering order of selections)

a) If you couldnʼt tell earlier, the problem has only three cards selected because this will make for a
much smaller tree diagram, as they are surely becoming somewhat of a tedious yet effective
description for successively random events. Anyway, in considering the order in which we can
obtain cards that are hearts or not, we have a highly similar situation to that of the Binomial
distribution as you will perhaps notice in the diagram below.
Let H denote the selection of a heart and Hʼ mean that a heart was not selected.
With the random variable H counting the number of hearts obtained, this gives us, for example;
P(H = 2) = P(HHH') + P(HH'H) + P(H'HH)
= (13/52)(12/51)(39/50) + (13/52)(39/51)(12/50) + (39/52)(13/51)(12/50)
= (3!/2!) [(13/52)(12/51)(39/50)]
The last rearrangement is not an overly obvious “simplification”. Rather, letʼs recognize that this
expression leads us to a generalization to solve this type of process as follows;
● since each branch that contained two hearts shared the same probability we then can count the
number of branches which yield the selection of two hearts and one “non-heart”. This simply
becomes the combinatorial problem of arranging the symbols HHHʼ, or put more concisely we
are selecting the position where the two hearts were selected with the remaining being forced
to be a “non-heart”. Thus we get the expression 3𝐶2
● each branch which yielded two hearts can be thought of as selecting or arranging the two hearts
selected from the 13 total while we must also select one of the 39 other cards, while in total we
are drawing three cards from the 52 overall.
So now we can generate our probability distribution for the random variable H as tabulated
below:
h | P(H = h)
0 | 0.414
1 | 0.436
2 | 0.138
3 | 0.013
b) Reverting back to fundamentals, we determine the expected value to be;
E(H) = Σ h · P(H = h), summed over all values of h
= (0)(0.414) + (1)(0.436) + (2)(0.138) + (3)(0.013) ≈ 0.75
⬛
Solution 2: (discounting order of selection)
a) Weʼll again refer to the number of hearts selected as H, but in this case weʼll only recognize that
in general we are selecting a total of three cards from the 52 total. If we discount the order of
selection the total number of ways this can occur is given by;
C(52, 3) = 22 100, where C(n, r) denotes nCr, the number of ways to choose r objects from n.
Note: We are able to discount order or consider it as we have a series of dependent events.
Moreover, since we are only concerned with the objects selected (and not the order in which
theyʼre taken) we have that the number of ways of getting h hearts is given by,
C(13, h) · C(39, 3 − h)
Thus, we get the probability of selecting h hearts as;
P(H = h) = C(13, h) · C(39, 3 − h) / C(52, 3)
So, we now can determine the probability distribution:
h | P(H = h)
0 | 9139/22100 ≈ 0.414
1 | 9633/22100 ≈ 0.436
2 | 3042/22100 ≈ 0.138
3 | 286/22100 ≈ 0.013
b) Since the distribution is equivalent, weʼll get that 𝐸(𝐻) = 0. 75
⬛
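The "discounting order" approach translates almost directly into code. The following is a minimal Python sketch for the three-card hearts example; the variable names are ours, and math.comb plays the role of nCr.

```python
from math import comb

total_hands = comb(52, 3)   # ways to choose 3 cards when order is ignored

# P(H = h): choose h of the 13 hearts and 3 - h of the 39 other cards
distribution = {h: comb(13, h) * comb(39, 3 - h) / total_hands for h in range(4)}

expected_hearts = sum(h * prob for h, prob in distribution.items())

for h, prob in distribution.items():
    print(h, round(prob, 3))
print(round(expected_hearts, 2))
```

Because the probabilities are computed without rounding along the way, the expected value comes out to 0.75 with no rounding discrepancy.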
Generalizing Hyper-Geometric Distributions
The previous example suggests two valid methods for generalizing the probability of a particular
outcome of a hyper-geometric random variable.
Hyper-Geometric (Version 1: Considering Order)

Given a collection of;
● n total objects,
● g of which are considered desirable or “good”,
● b of them are considered non-desirable or “bad”.
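Selecting r of these objects without replacement and letting the random variable X count the
number of “good” objects selected, we get that;
P(X = x) = C(r, x) · [P(g, x) · P(b, r − x)] / P(n, r),
where C(n, k) = nCk counts selections and P(n, k) = nPk counts arrangements of k objects from n.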
Hyper-Geometric (Version 2: Discounting Order)

Under the same circumstances as above, if we discount order the probability of a certain outcome of X
yields;
P(X = x) = [C(g, x) · C(b, r − x)] / C(n, r), where C(n, k) denotes nCk.
Despite this formula being somewhat easier to comprehend, it is of greater importance to simply
understand it rather than memorize it! Rewriting this in “words” as follows;
P(X = x) = [(# of ways to select "good" objects) · (# of ways to select "bad" objects)] / (# of ways to select the objects overall)
Expected Value of a Hyper-Geometric Distribution
The name hyper-geometric, as with the distributions discussed earlier, is derived from the calculation
to determine the expected value of this type of random variable which turns out to be;
E(X) = r · (g/n)
where r, g, n represent the number of selections, “good” objects, and total objects respectively.
One might notice that this is similar to the formula for expectation of a binomial random variable given
by np, and in fact they are highly similar. The hyper-geometric expectation when written in words can
be thought of as;
𝐸(𝑋) = (# 𝑡𝑟𝑖𝑎𝑙𝑠)(𝑖𝑛𝑖𝑡𝑖𝑎𝑙 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑠𝑢𝑐𝑐𝑒𝑠𝑠)

with the only distinction from the binomial case being that we must consider the initial chance of
success only as the events in this situation are dependent in nature.
In the problems that ensue, you will also consider situations where there are more than two classes of
objects to consider (e.g. desirable versus undesirable objects). In such circumstances, try using the
second method discussed in this section in order to determine the probability of a certain number of
each type of object selected.
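To convince yourself that considering order and discounting order really do lead to the same probabilities, here is a hedged Python sketch. It assumes the ordered version takes the form rCx · (gPx)(bP(r−x)) / nPr, which is our reading of Example 2.10.1, and the unordered version takes the form (gCx)(bC(r−x)) / nCr. The function names are hypothetical.

```python
from math import comb, perm

def hypergeometric_ordered(x, g, b, r):
    """Version 1 (order considered): arrangements of x good and r - x bad objects."""
    n = g + b
    return comb(r, x) * perm(g, x) * perm(b, r - x) / perm(n, r)

def hypergeometric_unordered(x, g, b, r):
    """Version 2 (order discounted): selections of x good and r - x bad objects."""
    n = g + b
    return comb(g, x) * comb(b, r - x) / comb(n, r)

g, b, r = 13, 39, 3   # hearts, non-hearts, cards drawn
n = g + b

for x in range(r + 1):
    v1 = hypergeometric_ordered(x, g, b, r)
    v2 = hypergeometric_unordered(x, g, b, r)
    print(x, round(v1, 4), round(v2, 4))

expected = sum(x * hypergeometric_unordered(x, g, b, r) for x in range(r + 1))
print(expected, r * g / n)
```

The two columns match for every x, and the summed expectation agrees with r · (g/n). Try other values of g, b, and r to see that this is not a coincidence of the card example.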
Practice 2.10
Technique
1. Of 20 apples in a basket eight are rotten. If you select 5 apples from this basket determine:
a) the probability that one of them is rotten.
b) the probability that at least one of them is rotten.
c) the expected number of rotten apples selected.
2. At an office party, names were drawn from a hat to pick teams for a game of charades. There
were 10 females and six males at the party. What is the probability that the first team of four
selected was all female?
3. The quality control inspector at a Popsicle Factory estimates that 2% of the Popsicles shipped
by the factory are broken. In a box containing 20 popsicles determine the probability that there
are two broken popsicles. Also, determine the expected number of broken popsicles.
4. In the card game bridge, each player is dealt 13 cards. The number of cards in each suit is
important in determining the type of hand that should be played. Determine the probability
that a player will be dealt 4 diamonds, 3 clubs, 3 hearts, and 3 spades.
Studies
5. A lot of 50 items are inspected by the following two-stage plan.
I. A first sample of 5 items is drawn without replacement. If all of the objects are good the
lot is passed; if two or more are bad the lot is rejected.
II. If the first sample contains just one bad item, a second sample of 10 more items is
drawn without replacement (from the remaining 45 items) and the lot is rejected if two
or more of these are bad. Otherwise the lot is accepted.
Suppose there are 10 bad items in the lot. What is the probability that the lot is accepted?
6. (Chase the Ace) A game is played whereby you are continually dealt cards until an Ace is
drawn. For each card that is selected that is not an Ace the player receives $2. The cost to play
the game is $20.
a) Determine the probability that the player will win $10 (thus losing $10 overall) when
playing this game.
b) Determine the probability that the player will win $2n where n is the number of
“non-Ace” cards selected.
c) Determine the long-term expected winnings for this game. Is this game fair?
7. Suppose a population of N elements consists of G “good”, B “bad”, and I “indifferent” elements,
with B + G + I = N. If a random sample of size n is drawn without replacement from the
population, determine the probability that there are g good elements, b bad elements, and i
indifferent elements. Generalize this formula to where there are m classes of elements (i.e.
𝑃(𝑋 = [𝑏, 𝑔, 𝑖])).
Repertoire
8. a) Prove the identity
b) Use this identity to prove the formula for the expectation of a hyper-geometric random
variable.
2.11 Hypothesis Testing: An Introduction
Control Studies
A common type of experimental design is that of a control study. In such a study two testing groups
are set up in the following manner;
Control Group
The subjects in this group are effectively left “untouched” and are run through the experiment
to determine a base measurement.
Experimental Group
The subjects in this group have a differing attribute (typically only one thing is changed or
tampered with) and are run through the same experiment to determine if this attribute has any
significant effects as compared to the control subjects.
The question then becomes, at what level is the experimental result actually demonstrating a
significant enough result to state that the change yields a distinctive result from the norm?
The answer to this question is derived from a process known as hypothesis testing which is reliant on
probability theory to estimate relative likelihood of the experimental result. In general, if the result has
a reasonably low probability under normal circumstances then we can conclude that the changed
attribute has a significant impact. As usual weʼll let an example guide us through the technique at
hand.
Example 2.11.1
Matt the Miraculous is believed by many to be a mind reader. To test this claim he is subjected to an
experiment with the following procedure:
1. Matt is to sit on one side of a table, while the experimenter sits opposite to him.
2. A barrier is placed between the two people so that Matt cannot see nor hear the experimenter.
3. The experimenter then draws one of five cards each with a distinct symbol on it; a star, wavy
lines, triangle, circle, or square.
4. The experimenter then focuses their entire attention on the symbol selected.
5. When ready, Matt then attempts to guess which of the symbols has been selected.
6. The card is replaced and the deck is shuffled.
7. Steps 3 - 6 are repeated nine times for a total of ten trials.
Matt was able to successfully guess 7 of the 10 symbols. Do the results of this experiment suggest that
Matt may in fact be a mind-reader?
Solution:
Starting off, you may recognize that when all is “normal” each guess made has an equal likelihood of
being correct, thus the number of correct guesses C would follow the binomial distribution.
For this solution, weʼll go through the process of a hypothesis test, then summarize the strategy
afterwards.
I. State the Null Hypothesis H0:
Weʼll begin by stating what these “normal” conditions are, this is called the null hypothesis and
is usually denoted by H0. In this case, since the number of correct guesses, C, is binomial, the
key feature weʼll use to compare with is the likelihood of success p. Thus we have that,
H0: p = 1/5
II. State the Alternative Hypothesis Ha:

Secondly, our own experiment suggests that Matt may in fact have a higher chance of success
on each turn (possibly due to some extrasensory ability). This gives rise to our alternate
hypothesis, denoted by Ha, given as;
Ha: p > 1/5
Note: We donʼt use the actual experimental result, in this case p = 0.7, rather weʼre just
acknowledging that the result seemed to improve upon “normal” circumstances.
III. Determine the Significance Level (𝞪)

Next we have to decide on a significance level, denoted by the greek letter 𝞪 . This level acts as
a decision marker in the following manner;
- if the probability of our experimental result (calculated assuming the null hypothesis
is true) is less than 𝞪, then weʼll conclude that the result was significant enough to
reject the null hypothesis and accept the alternative hypothesis.
- if that probability is greater than 𝞪, then weʼll conclude that the
result is plausible under the conditions of the null hypothesis, and so the null
hypothesis stands.
For this experiment weʼll set 𝞪 = 0.01 (or 1%), meaning that we wonʼt count Mattʼs result as
significant if the probability of getting 7 or more correct guesses out of 10 is higher than this
level.
IV. Test the Likelihood of the Experimental Result (Obtaining the “p-value”)
First off, weʼll assume that the “null hypothesis” is still valid, thus in this case we are assuming
that p = 0.2.
Now we test the likelihood of the experimental result, called the p-value:
P(C ≥ 7) = P(C = 7) + P(C = 8) + P(C = 9) + P(C = 10)
= ₁₀C₇ (0.2)^7 (0.8)^3 + ₁₀C₈ (0.2)^8 (0.8)^2 + ₁₀C₉ (0.2)^9 (0.8) + (0.2)^10
= 0.000864
Note: You may have noticed that we didnʼt just take the specific result of getting 7 guesses
correct, but instead observed 7 and above. Youʼll be asked to justify why this was done in the
practice...it wasnʼt in error!
V. Conclude
Since the likelihood of Mattʼs result is 0.09% which is far less than 𝞪, we can conclude that this
result was, in fact, significant!
⬛
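Steps IV and V of the test are easy to reproduce with a few lines of code. The sketch below is one possible way to do it in Python; the function name upper_tail_p_value is our own, and it simply adds up the binomial probabilities from the observed count up to n under the null hypothesis.

```python
from math import comb

def upper_tail_p_value(observed, n, p_null):
    """P(C >= observed) for C binomial with n trials and success probability p_null."""
    return sum(comb(n, k) * p_null**k * (1 - p_null)**(n - k)
               for k in range(observed, n + 1))

alpha = 0.01
p_value = upper_tail_p_value(observed=7, n=10, p_null=0.2)

print(round(p_value, 6))    # 0.000864
print(p_value < alpha)      # True
```

Since the p-value falls below 𝞪, the code reaches the same conclusion as the worked solution: the result is significant at the 1% level.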
Now, read through this example again paying specific attention to the details and reasons why the
process is structured the way it is.
Now, read it again! Iʼm serious, this is a fairly non-intuitive process…
A Useful Metaphor for Understanding Hypothesis Testing
A good metaphor to help comprehend the logic behind a hypothesis test is that of a criminal trial. The
table below will exhibit the parallels:
Hypothesis Test | Jury Trial
Null Hypothesis (H0) | The accused is presumed innocent.
Alternate Hypothesis (Ha) | The prosecutor suspects the accused is guilty!
Significance Level (𝞪) | “Reasonable Doubt”.
Testing the Likelihood of the Experiment (p-value) | The prosecutor will exhibit evidence that demonstrates why the accused is guilty.
Conclusion | If the evidence establishes that the accused committed the crime beyond any reasonable doubt, then they will be found guilty.
And now, one more example to run through the process once again.
Example 2.11.2
Each year the grade 10 secondary students in the province of Ontario participate in the Ontario
Secondary School Literacy Test (OSSLT). This test measures the provincial, school board, as well as
individual schoolʼs literacy rates. Suppose our school has had an average pass rate of 88% over the past
three years. This year however, our school had 172 of 200 students pass the test; an 86% success rate.
The principal is alarmed at this drop in success. Determine whether or not this result is in fact
significant to a significance level of 5%.
Solution:
I. Null Hypothesis: H0: p = 0.88
II. Alternate Hypothesis: Ha: p < 0.88, since our result indicates a lower performance.
III. Significance Level: α = 0.05
IV. p-Value: P(S ≤ 172)
= P(S = 172) + P(S = 171) + ... + P(S = 1) + P(S = 0)
= ₂₀₀C₁₇₂ (0.88)^172 (0.12)^28 + ... + (0.12)^200
= 0.2196
V. Conclusion:
Thus, given the historical pass rate on this test there is nearly a 22% likelihood that this yearʼs
class would achieve at this level or lower. Therefore, the result is not significant based on the
historical pass rates. Though it should be mentioned that a school community should always strive for
improved results especially when it comes to literacy rates.
(Note: The above calculation was performed on spreadsheet software which is highly capable of
performing these types of repeated calculations with great speed and accuracy.)
⬛
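If you do not have a spreadsheet handy, the 173-term sum in step IV can also be handled by a short program. The Python sketch below is one hedged way to do this for a lower-tail test; the function name lower_tail_p_value and the verdict strings are our own choices.

```python
from math import comb

def lower_tail_p_value(observed, n, p_null):
    """P(S <= observed) for S binomial with n trials and success probability p_null."""
    return sum(comb(n, k) * p_null**k * (1 - p_null)**(n - k)
               for k in range(0, observed + 1))

alpha = 0.05
p_value = lower_tail_p_value(observed=172, n=200, p_null=0.88)

verdict = "significant" if p_value < alpha else "not significant"
print(round(p_value, 4), verdict)
```

Notice that the only change from an upper-tail test is the range of outcomes being added up, which mirrors the direction of the alternate hypothesis.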
Practice 2.11
Technique
1. Why is it important to consider a range of outcomes when testing the probability of an
experimental result in a hypothesis test?
2. Rollin Richard believes he has the power to control his dice rolls. To prove this claim he rolls a
set of three dice ten times and achieves triples on 3 occasions. Does this exhibition suggest that
Richard actually has the ability to roll with a greater precision than a normal person? Test the
result to a significance level of 1%.
3. Dwight Howard is a famous basketball star known to be a dominant scorer. He does have one
glaring weakness when it comes to free throws. His career success rate is 56.78 %. This past
off-season, Dwight elicited the services of a shooting coach, Dr. Yu, to help him with this
deficiency in his game. After working with Dr. Yu, his first game back saw Dwight succeed on 12
of 15 free throw attempts. Test this result to a significance level of 5%.
Studies
It is recommended that spreadsheet software be used for the following tests.
4. A major car manufacturer is interested in determining whether the company has made a
significant improvement in its market share after a year-long advertising campaign. The
company has enjoyed a 24% share of the market historically. In a random sample of 500 vehicle
registrations, 152 belonged to the manufacturer. Test this result to a significance level of 5%.
5. In a recent political poll of 1200 people, 252 indicated that they would support the Green Party
in the next federal election. In 2011 the Green party was able to garner 18% of the popular vote.
Does this suggest a significant improvement in the partyʼs support? Test to a significance level
of 1%.
2.12 Designing an Arcade Game of Chance
The Premise
You (including up to two others) are tasked to develop a game which satisfies the following objectives:
1. The objective of the game is to earn points.

2. Each move that is made in the game must have a chance element to it.
3. The game should not be passive, that is, decision making should occur on each move.
4. The game should be easy to learn and have the ability to be completed within about a 10
minute span.
Analysis of the Game
Your team will be asked a probability based question per team member. Since the game is points
based, one of the questions will require you to determine the expectation of the number of points
earned based on a certain situation. The questions will be tailored to your game.
You may solve the questions either by theoretical or simulated means.
Unit 3: Statistical Reasoning
There are three kinds of lies: lies, damned lies, and statistics.
Mark Twain
Surfing the “Bell Curve”

A ceiling tile at the UofT Physics Lounge
3.1 Statistically Designed Experiments
When one refers to the word ʻstatisticsʼ, or more simply stats, we naturally just think of numbers: for
example, an individual's physical attributes such as height and weight, the famous 2.2 children per
family in North America, or the number of goals/points a professional athlete collects throughout his or
her career. Increasingly, we are becoming a culture obsessed with such quantifications and
classifications to the point where a concept or ideal that is not backed up by “hard evidence” or
“data driven dialogue” (which is something of a slang phrase for summary statistics) will often not be
taken very seriously. The unfortunate reality of this prevailing culture is that, in general, there is
somewhat of a lack of understanding of both the process of statistics and what the
quoted results actually imply. To this end, we often have highly skewed reporting of statistical
information, which leads to poor decision making that very often affects many of us at some level or
another. Consequently, it has been proposed by some that statistical literacy be emphasized in our
education system to a greater degree than is currently practiced.
In viewing statistics as a process as opposed to a reporting of numerical information we are referring,
more or less, to an elaboration of the scientific method. This process can be summarized by the
following stages:
STEP 1: The Purpose

Clearly there must be some type of purpose in attempting to design an experiment in the first place.
This purpose is usually posed in the form of a question of some type and it will be your experiment that
attempts to answer this inquiry by collecting and analyzing data that is relevant to the issues
surrounding this question. Moreover, you may wish to state your own presumption as to what the
answer will be to this question; this initial guess is, of course, referred to as the hypothesis.
STEP 2: Declaration of Variables

We now must decide, “how we are going to measure the issues in question." Initially, it is good practice
to consider any variable that may have an impact on the question at hand. As you progress and
eventually design a procedure (see below) youʼll probably have to prune some or many of these
variables of concern in order to avoid bias in your conclusions. At this stage it is also critically important
to understand and determine how these variables will be measured. Some may be as easy as a “yes/no"
response/result, whereas others will have to be carefully decided upon (sometimes even requiring
separate experiments to weed out a reliable measurement process).
Variable Types:
Qualitative:
These types of variables focus on some type of quality. Examples are colour, socio-economic status,
location, gender, etc. Many studies will compare across two or more qualitative groups in order to
determine if there is a significant difference across such groupings.
Quantitative:
These are variables that are based on some type of quantity. We can think of these as “measured” or
“numerical” type variables. Examples of quantitative variables are: mass, distance, time, etc.
Indices:
At times, we may have to devise our own type of measure that takes the form of a function. Such
variables are called indices and often act to factor in many variables in pre-defined proportions to
estimate some type of . Some common examples of indices include:
- BMI (Body Mass Index), which is used to gauge oneʼs obesity level. Itʼs defined by the function;
BMI = m / h^2, where h is the height in metres and m is oneʼs mass in kilograms.
This leads to the following classifications of oneʼs weight;
BMI | Classification
< 18.5 | Underweight
18.5 - 24.9 | Healthy Range
> 25 | Overweight
- CPI (Consumer Price Index) This index is used to determine an estimate of the changes of the
prices of common consumer goods. It is usually calculated by summing over a “basket of
goods” that include things like food, shelter, household operations, clothing, transportation,
and so on…
- WAR (Wins Above Replacement; Baseball) This measures a baseball player's
effectiveness of play, factoring in their offensive and defensive metrics. It attempts to estimate
how many “wins a team will earn above a common replacement player from a lower league”.
So if a player earned a WAR of 7, this means that the team would win 7 more games than if the
player was replaced by a player from a lower level in the organization.
The last two examples are quite complex multivariable functions which means that they incorporate
and combine many attributes. Devising indices is somewhat of an art. If done well, they can be highly
useful in compressing a complex system into an easily read metric; this is often where mathematical
creativity comes into play and also where one can be rewarded handsomely if such a metric yields
highly predictive analysis.
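The simplest of these indices, BMI, makes a nice illustration of an index as a function. The Python sketch below is only an illustration: the function names and the sample person are hypothetical, and since the classification table leaves values between 24.9 and 25 unassigned, the sketch assumes anything below 25 counts as the healthy range and anything from 25 upward counts as overweight.

```python
def bmi(mass_kg, height_m):
    """Body Mass Index: mass in kilograms divided by the square of height in metres."""
    return mass_kg / height_m**2

def classify_bmi(value):
    """Classify a BMI value; the cut-off handling at exactly 25 is an assumption."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Healthy Range"
    return "Overweight"

example = bmi(mass_kg=70, height_m=1.75)   # hypothetical person
print(round(example, 1), classify_bmi(example))   # 22.9 Healthy Range
```

Notice how two measured variables (mass and height) are compressed into a single, easily read number, which is precisely the purpose of an index.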
STEP 3: The Procedure

This is the stage whereby you will collect raw data to be analyzed later in your study. In this
stage, it is critical to fully understand how each variable of interest is going to be measured.
Effective procedures usually have the following characteristics:
1. A repeatable process.
Ideally every observation should be obtained in as close to an identical manner as possible to

eliminate unwarranted effects on your results.
2. Measurements are clearly and objectively defined.
Any measurement you make that must be scored on your own judgement may be subject to
preconceived bias that you may not even be aware of.
3. Avoid overly complex methods.
If at all possible, try to streamline the procedure so that it's easy to set up and run. Very often
this will mean that you may have to hold constant some of the variables youʼve
considered. In doing this we will probably satisfy the first condition fairly easily and more
effectively address the purpose of the study.
STEP 4: Analysis
In the upcoming sections of this book weʼll look into various methods of analysis by which
we can examine the data that weʼve collected. If you continue with future coursework in
statistics youʼll be immersed into even more sophisticated (and hopefully effective) methods
to analyze your raw data.
STEP 5: Conclusions
Once weʼve analyzed the data collected, it is our job to translate the mathematical results in
context. The methods of analysis we have at our disposal are based on mathematical theory
and consequently are valid in an abstract sense only. It is then incumbent upon the
experimenter to understand the consequences that these abstract results have in the
situation being observed. Whatʼs more, we must always take care to rely only on the
analytical results when determining these conclusions. If we add any conclusions beyond
those readily available from the analysis we once again may encounter bias in our
experiment. Of course, once conclusions are derived we may conjecture as to the meaning or
implications of the results. Such conjectures, however, are simply the process of statistics
coming full circle and so to address these hypotheses adequately a new experiment would
be born.
STEP 6: Presentation of Findings

Once our experiment is complete, we often must report or present our findings to some type
of audience. It is understandable that one would take this phase of an experiment lightly, but
be warned that if the presentation of the findings is poorly designed the results
may not be considered or even noticed! As such, we should be cognisant of the audience to
whom weʼre presenting. When designing your presentation consider the following:
1. The level of statistical/mathematical expertise of the audience.
2. Will the audience be questioning your procedure?
3. What information is your audience interested in: the procedure, analysis,

conclusions, possible consequences, or all of the aforementioned issues?
As far as this course goes, you can assume that your audience has the same background as you do (i.e.
up to the level of this course), and we will be interested in all phases of the experiment.
Example 3.1.1 A study was conducted that wished to examine the prevalence and extensiveness of
exterior Christmas house decorations as it related to the property value of the dwelling. Determine
variables of consideration and a procedure to collect this information. (Caroline, Fall 2010)
Proposed Solution:
A. Potential Variables of Concern
- Variety of Decoration Types (V) (e.g. Lights on Gutter, Window Images, Inflatable
Characters, Wreaths, etc…): This variable would simply count the number of distinctive
types of decoration that are present. For example, if a household displayed 5 wreaths,
decorated the outside tree with lights, put a large inflatable “Frosty the Snowman” on
the front yard, and outlined the doors, garage, and gutters with lights, then 𝑉 = 6,
where “lights” on differing types of objects will count as distinct, but the same type of
object (e.g. wreath) will only count once.
- Proportion of Visible Decoration Coverage (C): Each feature of the property (each
door, window, tree, porch, garage door, yard, etc…) will be observed as to whether or
not a decoration is placed on it. For example, if the property had 15 features and
5 of these features had decorations on them, then \(C = \frac{5}{15} = \frac{1}{3}\).
B. Proposed Decoration Index (DI)

Weʼll define DI by simply taking the product of the two factors stated above, thus
𝐷𝐼 = 𝑉 · 𝐶
For example, consider the image below of a decorated house:
𝑉 = 10; roof lighting, wreath, stars in windows, star on tree, lighting on tree/bushes,
garland on railings, inflatable snowman, “presents” on lawn, statues on porch, jingle
bells on the door.
\(C = \frac{11}{13}\); where the features considered are: roofing trim, 4 windows, front door,
decorative window above the door, railings, 2 bushes, front yard, porch, and tree.
Thus, \(DI = 10 \cdot \frac{11}{13} = \frac{110}{13} \approx 8.5\)
A higher DI score would indicate more extensive and/or elaborate decorations.
C. Procedure for Collecting Individual Data

1. Count the number of property features.
2. Count the number of distinctive decorations.
3. Count the number of “undecorated” property features.
4. Record the address of the property.
Notice that our procedure doesnʼt have us determining DI on the spot, rather, when we are in the field
collecting information, we simply have to write in the attributes that are necessary to complete the
calculations. We can then enter this information into a spreadsheet and complete our analysis later.
The address will allow us to look up the property value on real-estate pages later in order to examine
the main topic question.
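As a rough illustration of completing the calculation later from the recorded counts, here is a minimal Python sketch. The function name decoration_index and its argument names are our own labels (they are not part of the original procedure), and the count of 2 undecorated features is simply what C = 11/13 implies for the pictured house.

```python
from fractions import Fraction

def decoration_index(variety, features, undecorated):
    """Return DI = V * C from the three counts recorded in the field.

    variety     -- number of distinct decoration types (V)
    features    -- total number of property features counted
    undecorated -- number of features with no decoration
    """
    if features <= 0:
        raise ValueError("a property needs at least one feature")
    coverage = Fraction(features - undecorated, features)  # this is C
    return variety * coverage

# The pictured house: V = 10, 13 features, 2 of them undecorated (C = 11/13).
di = decoration_index(10, 13, 2)
print(di, round(float(di), 2))   # 110/13 8.46
```

Keeping C as an exact fraction until the last step avoids rounding twice, which matters little here but is a good habit once many properties are compared.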
Also note that the defined index may have some unforeseen issues. It is wise to complete some
“dry-runs” of a procedure just to get a sense of the practicality of the experiment and whether or not
these variables are sufficient. The better we “clean up” the design, the better the results of our analysis
will be.
⬛
3.1 Practice
Studies
For the following case studies,
A. Declare variables that will address the issue.
B. Combine these into a proposed index measure that allows us to grapple with the issue with a
single variable.
C. Determine a practical and easily repeatable procedure to collect the defined variables.
1. (Stall Parking) Stall parking is a parking situation found in most parking lots where each “space”
is referred to as a “stall”. This study wishes to examine how well a driver executes a “stall park”
as compared to differing qualitative variables such as gender, age, experience, etc… Focus here
on just how we can measure the quality of the parking job. (Karthika, Spring 2009)
2. (Two-Bite Brownies) How many bites does it truly take to eat a “two-bite brownie”? (Sarah,
Spring 2013)
3. (Social Distance Awareness) How aware is one of their surroundings in differing social contexts?
This study should examine “awareness” under comparative qualitative variables such as
distraction factors (looking at a cell phone or not; listening to music; reading, etc..) Focus on
how to measure “awareness”. (Mustafa, Spring 2014)
4. (Student Engagement) How engaged are students in a class activity? The study here wishes to
examine the effectiveness of a lesson activity to improve upon teaching practice. Focus on how
one could measure student engagement. (Joel, Spring 2009)
Repertoire
5. (Creating Surveys) Experiments usually take one of two modes: Observational or Survey
Based. In an observational study, the respondents (be it people or objects) are observed and
measured according to the defined procedure. For survey based experiments, much of the
procedure (or all of it) will rely on respondents answering predefined questions found on the
survey. As such, designing surveys can be an art all of its own. Itʼs very easy to design bad
surveys which will yield inconclusive data points. The link below addresses some
considerations in building effective survey questions...enjoy!
Elon University Poll: 7 Tips for Good Survey Questions
3.2 Sampling Methods
A major component of designing a procedure for a statistically based experiment is to decide upon a
strategy for collecting your samples. The critical motive here is to try and ensure that the results
collected will yield authentic trends that reflect the true nature of your population. Consequently, our
decisions for selecting respondents should:
1. remain as objective as possible, and
2. ensure that the sample is large enough to avoid over- or underestimating the results that will be
analyzed.
If we achieve this objective we will have obtained what is referred to as a representative sample.
Suppose that we are to sample from the population of people depicted below.
We can devise our sample strategy using the following methods:
Convenience Sampling
This strategy is aptly named as it simply refers to going out into the population and selecting
respondents that are conveniently found. Of course, this strategy jeopardizes our likelihood of obtaining
a representative sample since there may be some type of group/characteristic that is overrepresented.
Whenever one relies on convenience to obtain their results, unintentional (or
even possibly intentional) bias is more likely to creep in. Overall, convenience sampling is generally employed
when the conclusions derived from the experiment are of little consequence, when available
resources are limited, or when the study is just exploratory in the sense that one is determining whether or not to
commission a larger study.
Random Sampling
Moving away from convenience sampling, one would prefer to select respondents in a
manner that is completely objective. For our population of 40 people in the diagram
above, we can allow more mathematically based methods to determine our selections. There are many
algorithms that allow us to select elements/numbers “randomly”. Even a basic scientific calculator
usually comes equipped with some type of random number generator. Thus, assuming that these
algorithms are randomized, we can collect our sample based on their outputs.
Here are two simple ways to run such a random sample. Suppose we wish to collect a sample of 10
individuals from our population of 40. We could perform a:
1. Systematic Sample
With a list of our total population in hand, we can label the people from 1 to 40 (say in alphabetical
order) and then take every 4th person to obtain our entire sample.
2. Simple-Random Sample
In this instance, still assuming weʼve obtained some type of numbered listing of our population,
in this case from 1 to 40, we use a random number generator to select our 10 respondents. For
example, in Google Sheets we can simply enter the command RANDBETWEEN(1, 40) to generate
a randomly selected integer between 1 and 40 inclusive. Copying this initial cell down for nine
more rows repeats the process 10 times and yields the desired sample (if the same number
appears twice, simply generate a replacement).
Overall, both of these strategies help curb any inherent bias in collecting responses. It is possible that
one group may be under or over represented, but our hope is that this is tempered by the
randomness of the selection process. The difficulty with this approach is that we will not always be able
to generate a list of all members of a given population, and so in such cases we have to employ other
strategies.
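Both selections can also be sketched in a few lines of Python. This is only an illustration under our own assumptions: the labels 1 to 40 stand in for the numbered population list, and the fixed seed is there purely so that a run can be repeated.

```python
import random

population = list(range(1, 41))   # people labelled 1 to 40
sample_size = 10

# Systematic sample: every 4th person on the list (4, 8, ..., 40).
step = len(population) // sample_size
systematic = population[step - 1::step]

# Simple random sample: 10 distinct labels, each equally likely.
# random.sample never repeats a label, whereas ten independent
# RANDBETWEEN(1, 40) cells can occasionally produce duplicates.
rng = random.Random(2023)         # fixed seed only so the run is repeatable
simple_random = sorted(rng.sample(population, sample_size))

print(systematic)
print(simple_random)
```

Drawing without repeats is a design choice: it guarantees 10 different respondents in a single pass.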
Stratified Sampling
It is often the case that we wish to ensure certain groups are represented within our sample. If these
groupings are categorized by some type of characteristic then these categories are referred to as
strata.
For example, suppose that our population of 40 people comprises 20 biological males and 20 biological
females. In order to ensure that each of these groups or strata is fairly represented, we would
select 5 males and 5 females. The sample from each of these strata could be drawn via any of
the other methods (e.g. convenience, systematic, simple-random, etc.).
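A stratified selection might be sketched as follows. The labelling of people 1 to 20 as one stratum and 21 to 40 as the other is a hypothetical choice made only for the illustration, and a simple random sample is used within each stratum.

```python
import random

rng = random.Random(1)  # fixed seed only so the run is repeatable

# Hypothetical labelling: people 1-20 form one stratum, 21-40 the other.
strata = {
    "male": list(range(1, 21)),
    "female": list(range(21, 41)),
}
sample_size = 10
population_size = sum(len(members) for members in strata.values())

stratified = {}
for name, members in strata.items():
    # Each stratum contributes in proportion to its share of the population.
    quota = sample_size * len(members) // population_size
    stratified[name] = sorted(rng.sample(members, quota))

print(stratified)
```

With two strata of 20 people each, the proportional quota works out to 5 from each group, matching the selection described above.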
Cluster Sampling
At times it is unfeasible to reach your entire population, usually because of travel constraints. To
alleviate this issue it is, at times, advisable to randomly select “neighbourhoods” or physical regions to
draw our samples from. Such geographic neighbourhoods are referred to as clusters. For example, suppose our
population of 40 people is located in 5 such clusters labelled A, B, C, D, and E. We could, at random,
select 2 of these clusters, then further select a representative sample from each cluster to obtain our
overall sample. This would allow us to avoid “travelling” to the other neighbourhoods and so save us
time, and possibly money, in the process.
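The two steps just described can be sketched in Python. The cluster sizes (8 people in each of A to E) and the choice of 5 people per selected cluster are assumptions made for the illustration; the text does not specify either.

```python
import random

rng = random.Random(7)  # fixed seed only so the run is repeatable

# Hypothetical assignment of the 40 people to clusters A-E, 8 people each.
clusters = {
    name: list(range(8 * i + 1, 8 * i + 9))
    for i, name in enumerate("ABCDE")
}

# Step 1: pick 2 of the 5 clusters at random.
chosen = sorted(rng.sample(sorted(clusters), 2))

# Step 2: take 5 people from each chosen cluster (10 in total).
cluster_sample = {name: sorted(rng.sample(clusters[name], 5)) for name in chosen}

print(chosen)
print(cluster_sample)
```

Only the two chosen neighbourhoods ever need to be visited, which is exactly the saving that motivates the method.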
A great example of cluster sampling would be with the Nielsen-Ratings System. This company
continually monitors television viewing habits. Their conclusions have a great impact on advertising
revenue as shows that garner higher viewership, called “ratings”, can demand greater pricing on their
advertising spots. As sampling the entirety of the United States of America would be a costly venture,
the company instead selects a sampling of clusters based on zip-codes (analogous to our postal code
system) to identify neighbourhoods. They then ask potential respondents whether or not they would
allow their viewing habits to be monitored (convenience sampling).
Multi-Stage Sampling
As alluded to earlier, we often combine sampling strategies to partition our population and then
sample from the varying partitions. When this is done, we refer to this as multi-stage sampling. The
example about viewer ratings is such a case. The staged breakdown would go as follows:
Stage 1: Randomly select 20 Zip-Codes (Cluster Sampling)
Stage 2: Ask all viewers in a selected zip-code region for permission to capture viewing habits
Example 3.2.1 Our student council wishes to sample a representative group of 60 students on their
opinions regarding social events for the upcoming school year. The student body consists of
approximately 1100 students total and is fairly evenly distributed across grade/age level.
Solution: Weʼll employ a multi-staged approach to our selection of students in order to prevent bias in
opinion.
I. Stratify by grade level, selecting 15 students from each grade. (Stratified Sampling)
II. Select two English courses for each grade to sample from. (Cluster Sampling)
III. Systematically select either 7 or 8 students from each class list to participate.
(Systematic/Random Sampling)
⬛
3.2 Practice
Studies
The following case studies are based on former thesis projects run in this course. For each question our
population of concern will be indicated. For each study:
A. declare the variables of concern (including how you would measure these variables),
B. design a detailed procedure aimed at collecting raw data,
C. determine a sampling strategy that acts to minimize selection or
response bias. Assume this would be your own thesis project, so this strategy should remain
aware of our constraints of an extremely limited budget (i.e. out of pocket expenses only!!) AND
a time period of approximately 2 weeks maximum.
1. To what extent do people simply “click accept” when asked to read over the conditions and
terms of some type of downloaded software or internet site sign-up such as Facebook or
iTunes? (RiveRa er, Fall 2012 )
POPULATION: Canada
2. Does taking grade 10 Civics improve a studentʼs knowledge of the Canadian political system?
(Alice, Spring 2011 )
POPULATION: Woodlands Students
3. Does a personʼs attractiveness rely on oneʼs facial measurements being closer to the golden
ratio? (Emad, Fall 2006 )
POPULATION: Woodlands Students
4. Are the letter distributions of tiles in Scrabble and Words with Friends accurate reflections of the
letter distribution in todayʼs English language? (Arvind, Spring 2018)
POPULATION: English Language (Note: A population doesnʼt have to consist of people as is
demonstrated here)
Repertoire
5. THESIS PROPOSAL: Propose your own thesis topic and outline your proposed experiment
including the variables of concern, procedure, and sampling strategy.
(Note: If your proposal focuses on a survey based experiment, then be sure to watch the video
introduced in the previous section around question design as this will be a major source of how
your procedure is evaluated.)
3.3 Measures of Central Tendency
When examining a set of data, one often wishes to understand where the centre of the data lies. In a
sense, the centre of the data is the most fundamental characteristic that can be concluded about the
population in question. However, as with many concepts in mathematics, the centre of a data set has
many interpretations. In this book weʼll examine three in particular.
The Arithmetic Mean
The mean of a set of data is often referred to as the “average”. As a student, you will already know this
measurement of the centre of a data set. The formula used to determine the mean given a
sample of data \(\{x_1, x_2, x_3, \dots, x_n\}\) is given by:
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_i x_i}{n} \]
(Mean Taken from a Sample)
If we instead obtained a census (collect data from every member of the population of size N), then the
notation differs slightly for the set \(\{x_1, x_2, \dots, x_N\}\):
\[ \mu = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum_i x_i}{N} \]
(Mean of an Entire Population)
Notice that these two formulae are essentially the same! The semantics of labelling the sample mean
(\(\bar{x}\)) versus a population mean (\(\mu\)) will become important later on. Remember, the goal of our
experiments is to estimate the population mean, and so this concept carries significance.
Also, the symbol ∑ simply is a shorthand notation meaning to “sum over all” of the numbers indicated
beside it; if you continue studying mathematics beyond this year, this symbol will pop up more often.
Example 3.3.1 (Shoe Ownership of the Woodlands Senior)

A recent poll of 15 identified girls and 15 identified boys at The Woodlands School asked the
respondents how many pairs of shoes they owned. A “pair of shoes” constituted any type of shoe
including boots, sandals, slippers, etc... The responses were organized by gender for comparison with
the data shown in the table below:
Boys 7 4 10 9 7 8 5 11 8 8 10 6 4 8 10
Girls 15 18 12 9 16 15 24 10 5 19 18 15 11 8 10
Compare the sample means of both sets of data. What conclusions can be inferred from these results?
Solution: Since weʼve only taken a sample of the population (senior students of The Woodlands
School), weʼll notate our results accordingly.
Boys: \(\bar{x}_B = \frac{7 + 4 + 10 + \dots + 8 + 10}{15} \approx 7.7\)
Girls: \(\bar{x}_G = \frac{15 + 18 + 12 + \dots + 8 + 10}{15} \approx 13.7\)
This can be done much more easily on a spreadsheet, especially when our data sets become quite large.
Copying our table into Google Sheets we can simply use the “AVERAGE” function to calculate our mean.
The spreadsheet file is linked below:
Google Sheets - Shoes
We can tentatively conclude from this limited evidence that senior girls tend to own more pairs of shoes
than do the boys. Weʼll come back to this set later to run a more thorough analysis.
⬛
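The two sample means from Example 3.3.1 can also be checked outside a spreadsheet. The short Python sketch below uses the standard library mean function in place of AVERAGE; the variable names are our own.

```python
from statistics import mean

boys = [7, 4, 10, 9, 7, 8, 5, 11, 8, 8, 10, 6, 4, 8, 10]
girls = [15, 18, 12, 9, 16, 15, 24, 10, 5, 19, 18, 15, 11, 8, 10]

mean_boys = mean(boys)     # 115 / 15
mean_girls = mean(girls)   # 205 / 15

print(round(mean_boys, 1), round(mean_girls, 1))   # 7.7 13.7
```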
It is very often the case that the raw data we have to work with is “grouped” into regular intervals.
Consider the following example:
Example 3.3.2 (Push-Up Capabilities of the Suburban Ontario Teenager)

A recent study examined the number of push-ups students at a local secondary school could complete.
The data recorded is shown below:
# Push-Ups   Frequency
1 - 5        10
6 - 10       14
11 - 15      26
16 - 20      48
21 - 25      40
26 - 30      18
31 - 35      11
36 - 40      6
Determine the mean number of push-ups completed by the students surveyed.
Solution:
So the challenge here is that we have to understand the nature of grouped data. In any of the
groupings, for example the “1 to 5 push-ups” group, we donʼt really know the exact number of push-ups
achieved by each of the 10 respondents in that interval. Consequently, our best estimate will assume
the middle value in the range is representative of the interval.
Thus, we can modify our table to include these representative midpoint values as shown below:
# Push-Ups Midpoint Frequency
1-5 3 10
6 - 10 8 14
11 - 15 13 26
16 - 20 18 48
21 - 25 23 40
26 - 30 28 18
31 - 35 33 11
36 - 40 38 6
To calculate the mean, we can think of this as a massive single-row list in which each midpoint is
repeated as many times as its frequency.
Thus, the mean can be calculated as follows:
\[ \bar{x} = \frac{(10)(3) + (14)(8) + (26)(13) + \dots + (6)(38)}{10 + 14 + 26 + \dots + 6} = \frac{3359}{173} \approx 19.42 \]
So the study suggests that suburban Ontario teenagers can complete an average of 19.42 push-ups.
⬛
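The arithmetic in Example 3.3.2 is tedious by hand, so here is a minimal Python sketch of the same midpoint calculation. The tuple layout (lower bound, upper bound, frequency) is simply our own way of typing in the table.

```python
# (lower bound, upper bound, frequency) for each interval in the table
groups = [
    (1, 5, 10), (6, 10, 14), (11, 15, 26), (16, 20, 48),
    (21, 25, 40), (26, 30, 18), (31, 35, 11), (36, 40, 6),
]

total = 0   # running sum of midpoint * frequency
count = 0   # running number of respondents
for low, high, freq in groups:
    midpoint = (low + high) / 2
    total += midpoint * freq
    count += freq

grouped_mean = total / count
print(count, round(grouped_mean, 2))   # 173 19.42
```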
Being the kind soul that your dear writer is, the derivation of a formula for the grouped mean of data will
be left as an exercise for the reader; thatʼs you!
A Note on Grouping Data

There are several rules of etiquette when grouping data to make the process of organizing and
ultimately presenting the data more effective. The rules are as follows:
Rule I. There should only be between 5 and 20 intervals constructed in total. (Fewer than five gives
too crude a breakdown and more than twenty gives too fine a breakdown.)
Rule II. The widths of each interval should be equal in size. If need be, construct an open-ended final
interval for results that exceed the scope of most of the data.
Rule III. The intervals should be constructed so that no data lies in two separate groupings.
The Median of a Set of Data
A second way of measuring the centre of a set of data is by considering the “positional” centre of the
data set when it is arranged in increasing order. The middle datum is referred to as the median of the
data set. It is defined as follows:
\[ \text{Median} = \begin{cases} \text{middle datum}, & \text{if } n \text{ is odd} \\ \text{average of the middle two data}, & \text{if } n \text{ is even} \end{cases} \]
where n represents the number of data in the set.
Example 3.3.3 (Lengths of words)

Reading through the words in the above paragraph (including the equation) we have that their lengths
(measured in number of letters) are as shown in the table below:
3 6 2 1 2 4 1 6 3 2 9 3 6 2 1 3 2 4 2 2
11 3 10 6 2 3 4 3 4 2 2 8 2 10 5 3 6 6 2 8
2 2 3 6 2 3 4 3 2 2 7 2 7 6 6 5 2 4 3 2
6 3 4 2 5 5 1 10 3 6 2 4 2 3 3
Determine the median word length of the paragraph above.
Solution:
Remembering that to compute the median we have to sort the data in increasing order, we arrange the
table in the spreadsheet linked here by sorting the column the data was entered in: Sorted Word List
In total there are 75 words, thus
Median = middle datum (38th datum in the list) = 3
Alternatively, in most spreadsheets there is a MEDIAN function which will compute the median of a set
of data without needing the step of sorting.
Thus the median word length of the paragraph preceding this example was 3 letters.
⬛
Alternate Solution:
We could also approach this task by grouping the data. In this case we do not need to devise
intervals; instead weʼll just use the specific word length as a particular group:
Word Length   Frequency   Cumulative Frequency
1             4           4
2             23          27
3             16          43
4             8           51
5             4           55
6             11          66
7             2           68
8             2           70
9             1           71
10            3           74
11            1           75
This table also includes the cumulative frequency, which counts the number of data at or below the
indicated group. Since weʼre looking for the 38th datum, we scan the cumulative frequency list to see
where the 38th datum falls. The cumulative count first reaches 38 in the group that has 3 letters
(27 data lie at or below 2 letters, 43 at or below 3).
Thus, the median word length of the paragraph is 3 letters.
⬛
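Both routes to the median in Example 3.3.3 can be reproduced with a short Python sketch. The standard library median function plays the role of the spreadsheet MEDIAN function, and the loop mimics scanning the cumulative frequency column; the variable names are our own.

```python
from collections import Counter
from statistics import median

lengths = [
    3, 6, 2, 1, 2, 4, 1, 6, 3, 2, 9, 3, 6, 2, 1, 3, 2, 4, 2, 2,
    11, 3, 10, 6, 2, 3, 4, 3, 4, 2, 2, 8, 2, 10, 5, 3, 6, 6, 2, 8,
    2, 2, 3, 6, 2, 3, 4, 3, 2, 2, 7, 2, 7, 6, 6, 5, 2, 4, 3, 2,
    6, 3, 4, 2, 5, 5, 1, 10, 3, 6, 2, 4, 2, 3, 3,
]

# Direct route: the library sorts the data internally.
direct = median(lengths)

# Cumulative-frequency route: find the group holding the middle datum.
frequency = Counter(lengths)
target = (len(lengths) + 1) // 2   # position of the middle datum (n is odd)
running = 0
for length in sorted(frequency):
    running += frequency[length]
    if running >= target:
        by_table = length
        break

print(len(lengths), target, direct, by_table)   # 75 38 3 3
```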
The Mode of a Set of Data:
A third standard measure of centre, concerned entirely with the frequency of a particular datum or interval
(with grouped data), is called the mode or modal interval (for grouped data).
Example 3.3.4 (Word Lengths again...)

Determine the mode of the word length data set from the previous example.
Solution:
Utilizing the frequency table generated in the previous example we easily can see that two letter words
are the most frequently used in the paragraph analyzed. Thus the modal word length of the paragraph
is 2.
⬛
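For larger tables the mode can be read off by machine as well. The sketch below rebuilds the 75 word lengths from the frequency table of Example 3.3.3 and uses multimode from the Python standard library (available from Python 3.8 onward).

```python
from statistics import multimode

# Word-length frequency table from Example 3.3.3: {length: frequency}
frequency = {1: 4, 2: 23, 3: 16, 4: 8, 5: 4, 6: 11, 7: 2, 8: 2, 9: 1, 10: 3, 11: 1}

# Expand the table back into a list of 75 individual word lengths.
lengths = [length for length, freq in frequency.items() for _ in range(freq)]

# multimode returns every value tied for the highest frequency,
# so a bi-modal data set would give a list with two entries.
modes = multimode(lengths)
print(modes)   # [2]
```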
Choosing an Appropriate Measure of Central Tendency
A potentially unnerving aspect of statistical reasoning is that there are often many options of analytical
strategies to take. As such, we must remain mindful of the strengths and pitfalls of each measure of
central tendency when making our selection. Here are some considerations when making this
determination:
Measure Type: Mean
Strength:
- extremely useful for concluding on large scale data sets (as weʼll see later in this text)
Weakness:
- highly influenced by extreme data (very large or very small) which can affect conclusions

Measure Type: Median
Strength:
- is less affected by extreme values as itʼs based on the middle position of the data set only
Weakness:
- limited analytical tools to draw conclusions from

Measure Type: Mode
Strength:
- useful when concerned with “popular” opinions/notions; ordering most common items, etc…
- may yield more than one value (bi-modal, tri-modal, etc.) if there is a “tie” in the largest frequency
Weakness:
- very crude measure with no other associated analytical measures that can aid in determining conclusions
- USE SPARINGLY!!
Practice 3.3
Technique
1. Determine the mean, median, and mode of the set of data exhibited below. Clicking on the link
will provide access to the google sheet: Arbitrary Data Set
48 15 1 1 43 61 74 2 68 95
86 80 39 45 4 14 53 6 2 39
38 44 80 70 87 9 52 32 12 72
75 28 58 89 43 8 69 46 53 68
83 60 12 73 98 11 29 99 40 34
79 36 4 56 42 22 50 59 89 32
21 2 67 13 79 13 73 86 97 93
54 27 48 52 61 28 48 44 30 86
62 27 46 55 33 97 90 80 18 87
66 92 81 52 17 84 78 42 54 34
2. Referring to the data set above, group the set into intervals, being sure to follow the rules in “A Note on
Grouping Data” outlined in the readings. Then compute the mean, median, and modal
interval of your grouped data set.
3. (Texting Habits of GTA Millennials) A study examining the number of text messages sent by a
group of 100 GTA residents was tabulated yielding the following results:
# Texts Frequency
0 - 99 23
100 - 199 32
200 - 299 12
300 - 399 15
400 - 499 7
≥ 500 11
Determine the mean, median, and modal interval of this sample.
4. The Ontario secondary school report card shows a studentʼs percentage grade in a particular
course, the course median at the school, and assessments of the learning
skills. Explain why the course median is used as the measure of centre of the achievement as
opposed to the mean or mode.
3.4 Measuring Spread: Means
So far in our explorations we have only considered differing methods for determining the “centre” of a
set of data. Interestingly, you may have noticed that there were many paradigms by which the centre
could be defined, for example, based on an arithmetic balance (mean), positional centre (median), or
most frequent occurrence (mode). Contrary to popular belief there is great freedom in mathematics to
define anything that we wish! Consequently we are able to devise rules and elaborate and expand upon
these definitions. Thus, mathematics is not a static subject devoid of creativity as many would think or
lead you to believe. Rather, it is the practicality (or at times just continued curiosity) of a defined model
or system that promotes the concept to be developed further. Weʼll see in this section a perfect
example of this philosophy.
As stated earlier, simply measuring the centre of a set of data is very limiting in the type of information
that can be extracted and inferred. Letʼs work through an example that illustrates the lack of
conclusiveness that arises.
Example 3.4.1 A class of students files a grievance with their student union citing unfair grading practices
by their professor; weʼll call him Professor X. In the subsequent investigation by the school, Professor X
states that his course mean is comparable to that of Professor Yʼs class, which has incurred no such grievance,
and so he feels there is not adequate cause for complaint. Below are the final marks for each class listed
in increasing order:
Prof X 38 48 52 55 57 65 65 69 72 85 89 96 96 99 99
Prof Y 62 65 65 68 70 70 72 73 76 78 83 85 87 90 94
You have been placed in the role of the ombudsman (arbitrator) for this case. Do the students have a
cause for concern? Justify your position.
Solution:
At times weʼll find that real-life case studies will yield arguments that can be framed in such a way that it
is difficult to determine a clear conclusion. When framing your argument/perspective itʼs imperative
that we stay objective with our conclusions and avoid any preconceived biases that may enter into our
judgements based on personal or related experiences. For example, being a student yourself, you may
feel that some instructors have been unnecessarily punitive or unfair in their grading practices, and so,
you may be “searching” for evidence to frame the issue in the studentsʼ favour. However, taking the role
of the ombudsman in this case weʼll examine the evidence as presented. As such, without any
anecdotal evidence presented to us, our only recourse is to examine the final grades themselves.
At this point, we only have measures of central tendency, so weʼll examine these first:
Mean: \(\bar{x}_{\text{Prof X}} = 72.3\%\), \(\bar{x}_{\text{Prof Y}} = 75.9\%\)
Median: \(\text{Median}_{\text{Prof X}} = 69\%\), \(\text{Median}_{\text{Prof Y}} = 73\%\)
For both measures we see that the students in Prof Yʼs class are a little higher, but the claim that the
averages are comparable is apt. Thus, letʼs examine a secondary measure to dive a little deeper into the
issue. The dot-plot below shows the final standing of each student in the course.
The diagram somewhat illuminates the reason for the studentsʼ cause for concern. We can see that the
results from Professor Xʼs class are much more varied, with some students achieving really poorly while
others exceedingly well. Conversely, Professor Yʼs students seem to achieve at a much more consistent
rate, though there werenʼt any students achieving at the extreme levels that Professor Xʼs class had
yielded.
Having no other contextual information to go by, we are left only to conclude that Professor Xʼs results
are more varied. This may be a result of many factors which may or may not be an issue with the
professorʼs teaching methods. As such, a more involved inquiry into the practices of each
instructor will be needed.
⬛
The above discussion highlighted a second way of observing a set of data, namely that of measuring the
spread or variation. Intuitively, we are measuring how consistently the individual data adhere to the
chosen measure of centre. We set off now to exhibit differing measures of spread that are commonly
employed.
The Mean Deviation
Letʼs suppose that we have a set of data shown once again as a dot plot in the diagram below.
The arithmetic distance that an individual datum strays from the mean is termed a deviation. The
mean deviation is simply the average distance that the individual data stray from the mean. It is given
by the following relationship:
\[ \text{Mean Deviation} = \frac{\sum_i |x_i - \bar{x}|}{n} \]
You may have noticed that the absolute values of the deviations are employed here. This is due to the
property that the arithmetic mean, \(\bar{x}\), has the characteristic that the total of all deviations is equal to
zero! Thus, the total deviation to the right of \(\bar{x}\) is equal to the total deviation to the left of \(\bar{x}\). To avoid this
issue, when determining the mean deviation we simply consider the absolute distance so that all
measurements are non-negative.
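For the curious, the claim that the deviations total zero follows in one line from the definition of the sample mean, since multiplying the mean by n recovers the sum of the data:

```latex
\sum_i (x_i - \bar{x}) \;=\; \sum_i x_i - n\bar{x} \;=\; n\bar{x} - n\bar{x} \;=\; 0,
\qquad \text{because } \bar{x} = \frac{\sum_i x_i}{n}.
```

This is why averaging the raw (signed) deviations would always give zero, no matter how spread out the data are.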
Example 3.4.2 A TV host runs a quick convenience sample of 10 GO train riders and 10 subway riders
comparing the amounts of actual cash carried on their person.
The results of the sample are exhibited below:
GO 40.00 26.34 0.00 100.00 13.72 20.00 80.00 0.00 27.25 112.56
Subway 12.15 22.44 17.00 28.00 0.00 85.65 17.00 0.00 42.65 12.50
Compare the amounts of money that the two types of transit riders carry.
Solution:
Once again we have to be aware that we can only really compare the measure of centre for each set, in
this case weʼll employ the mean, and how spread out each set is, using the mean deviation in this case.
Beyond that we can only conjecture about possible explanations for the results.
GO Riders: \(\bar{x}_G = \$41.99\)
Subway Riders: \(\bar{x}_S = \$23.74\)
In order to determine the mean deviation we can check if our spreadsheet software includes this
function. Most donʼt (including Google Sheets) as this isnʼt a commonly used measure. So, in this case
we have to devise a slightly more “manual” input into a spreadsheet or our calculator. Our calculation
looks like the expressions that follow:
\[ \text{Mean Deviation}_G = \frac{|40.00 - 41.99| + |26.34 - 41.99| + \dots + |112.56 - 41.99|}{10} = \$33.32 \]
\[ \text{Mean Deviation}_S = \frac{|12.15 - 23.74| + |22.44 - 23.74| + \dots + |12.50 - 23.74|}{10} = \$17.02 \]
On a spreadsheet, we can create a new column/row that sits beside the data set in order to calculate
each deviation, then determine the mean of the deviation column to get this measure. Such a table is
shown below:
GO        Deviations                       Subway    Deviations
40        1.99   “=abs(A2 - 41.99)”        12.15     11.59
26.34     15.65                            22.44     1.3
0         41.99                            17        6.74
100       58.01                            28        4.26
13.72     28.27                            0         23.74
20        21.99                            85.65     61.91
80        38.01                            17        6.74
0         41.99                            0         23.74
27.25     14.74                            42.65     18.91
112.56    70.57                            12.5      11.24
Mean Deviation   33.321   “=average(B2:B11)”         17.017
The quoted entries indicate the instructions used in the spreadsheet.
Thus we can infer from these results that the mean amount of money that subway riders carry on their
person is lower (nearly half that of GO train riders) and that they are less varied (more consistent) in
the amounts that they carry as well.
⬛
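Since most spreadsheets lack a built-in mean deviation, it may help to see the same calculation from Example 3.4.2 as a small Python sketch. The helper name mean_deviation is our own; note that it uses the unrounded mean, so the last decimal places can differ slightly from the table.

```python
from statistics import mean

go = [40.00, 26.34, 0.00, 100.00, 13.72, 20.00, 80.00, 0.00, 27.25, 112.56]
subway = [12.15, 22.44, 17.00, 28.00, 0.00, 85.65, 17.00, 0.00, 42.65, 12.50]

def mean_deviation(data):
    """Average absolute distance of the data from their mean."""
    centre = mean(data)
    return mean([abs(x - centre) for x in data])

print(round(mean_deviation(go), 2), round(mean_deviation(subway), 2))   # 33.32 17.02
```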
Variance and The Standard Deviation
A second way to measure spread using the mean as the measure of central tendency is the concept of
variance. Once again, we consider the deviations from the mean \(\bar{x}\), but this time we avoid the fact that the total
deviation is zero by squaring each individual deviation, thus forcing a non-negative result. Taking the
average of the squared deviations gives what is termed the variance of the set of data, whose
formula is shown below:
\[ \text{Variance} = \sigma^2 = \frac{\sum_i (x_i - \bar{x})^2}{n} \]
Example 3.4.3 (GO Riders Revisited) Determine the variance in the amount of money that GO train
riders carry with them while riding as compared to the sample of Subway riders.
Solution:
Using the data set from example 3.4.2 we once again use a spreadsheet to help us calculate the variance
of the cash on hand of GO train riders. The main difference here is that we will take the square of the
deviations instead of the absolute value, and then average these squared values.
GO        Deviations Squared                  Subway    Deviations Squared
40        3.9601   “=(A2 - 41.99)^2”          12.15     134.3281
26.34     244.9225                            22.44     1.69
0         1763.1601                           17        45.4276
100       3365.1601                           28        18.1476
13.72     799.1929                            0         563.5876
20        483.5601                            85.65     3832.8481
80        1444.7601                           17        45.4276
0         1763.1601                           0         563.5876
27.25     217.2676                            42.65     357.5881
112.56    4980.1249                           12.5      126.3376
Variance  1506.52685   “=average(B2:B11)”               568.89699
Once again, the results indicate that the amount of cash on hand for subway riders is more consistent
(less varied) than those that ride on the GO train.
Observing this measure, you may realize that the results will still act as a decent comparative tool when
trying to determine if a set of data is more or less spread out than another. The downside of the
variance measure is that it doesnʼt give the same sense of an “average” distance an individual datum
strays from its mean. We can somewhat rectify this by taking the square root of the variance; this
measure is called the standard deviation and is given by the following formulae:
(Population)
\[ \text{Standard Deviation} = \sigma = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{N}} \]
(Sample)*
\[ \text{Standard Deviation} = s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} \]
*(The “n - 1" for the standard deviation for a sample is a correction due to consistent underestimation of
the standard deviation when a sample is taken instead of the full population. This correction, known as
Besselʼs correction, will be investigated in the Repertoire problems for those that are interested.)
If we look at the (sample) standard deviations of the GO-train riders and subway riders we get:
GO-riders s_G = 40.913
Subway-riders s_S = 25.1417
These results line up somewhat better with the mean deviations calculated earlier, which were
determined to be $33.32 and $17.02 respectively, but they are still noticeably different from the
“average distance" the data stray from the mean.
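As a quick check of these values, the sketch below uses Python's standard `statistics` module, where `stdev` divides by n − 1 (sample) and `pstdev` divides by n (population). The variable names are our own.

```python
from statistics import pstdev, stdev

go = [40, 26.34, 0, 100, 13.72, 20, 80, 0, 27.25, 112.56]
subway = [0, 0, 12.15, 12.5, 17, 17, 22.44, 28, 42.65, 85.65]

# Sample standard deviations (divide by n - 1), as quoted in the text.
s_go = stdev(go)
s_subway = stdev(subway)

# Population standard deviation (divide by n) for comparison.
sigma_go = pstdev(go)

print(round(s_go, 3), round(s_subway, 4))
print(round(sigma_go, 3))
```

The sample values come out near 40.91 and 25.14. The population version for the GO data is somewhat smaller (roughly 38.8), which shows how much the n − 1 correction matters for a sample of only ten values.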
Surprisingly, this is the measure used much more consistently in practice! The reason for this is a
remarkable result known as Chebyshevʼs Inequality which weʼll paraphrase here:
Theorem 3.4.1: (Chebyshevʼs Inequality) Given any sample of data with mean x̄ and standard
deviation s, then at least
$$1 - \frac{1}{k^2}, \quad k > 1$$
of the data will lie within k standard deviations of the mean.
Proof: (Unfortunately, the proof of this result must be left alone for a few years as we require methods
and techniques learned in Calculus to understand the argument.)
Letʼs observe this result “in action". Once again, if we use our GO train data we have that the average
money carried by a rider was $41.99, with a standard deviation of $40.91. Thus, Chebyshevʼs Inequality
predicts that at least 75% of the data set will fall within 2 standard deviations of the mean (taking
k = 2 gives 1 − 1/2² = 0.75). This range is given by
41.99 ± 2 · 40.91
yielding the interval (−39.83, 123.81).
Now, this may not seem all that interesting, since all of the data fell within this interval. However,
Chebyshevʼs Inequality guaranteed in advance that at least 75% of the data would fall within this
range! As such, this result turns out to be one of the underpinnings of an extremely powerful
predictive tool used in statistical reasoning, and consequently the standard deviation has become the
favoured measure of spread for the arithmetic mean.
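The comparison between the guaranteed and the observed fraction can be sketched in Python as follows. The helper name `chebyshev_check` is hypothetical, and the sample standard deviation is assumed, as in the theorem statement.

```python
from statistics import mean, stdev

go = [40, 26.34, 0, 100, 13.72, 20, 80, 0, 27.25, 112.56]


def chebyshev_check(data, k):
    """Return (guaranteed fraction, observed fraction) within k standard deviations."""
    centre = mean(data)
    s = stdev(data)
    low, high = centre - k * s, centre + k * s
    inside = [x for x in data if low <= x <= high]
    guaranteed = 1 - 1 / k ** 2          # Chebyshev's lower bound, valid for k > 1
    observed = len(inside) / len(data)   # what the data actually did
    return guaranteed, observed


guaranteed, observed = chebyshev_check(go, 2)
print(guaranteed, observed)
```

For k = 2 the bound is 0.75 while the observed fraction is 1.0, so the data comfortably satisfy the inequality. The bound is a guaranteed minimum, not an estimate of the actual fraction.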
Practice 3.4
Technique
1. Determine the mean deviation, variance, and standard deviation of the set of data exhibited
below. Clicking on the link will provide access to the google sheet:
48 15 1 1 43 61 74 2 68 95
86 80 39 45 4 14 53 6 2 39
38 44 80 70 87 9 52 32 12 72
75 28 58 89 43 8 69 46 53 68
83 60 12 73 98 11 29 99 40 34
79 36 4 56 42 22 50 59 89 32
21 2 67 13 79 13 73 86 97 93
54 27 48 52 61 28 48 44 30 86
62 27 46 55 33 97 90 80 18 87
66 92 81 52 17 84 78 42 54 34
2. Develop a spreadsheet method for determining the mean deviation, variance, and standard
deviation of a grouped set of data. Employ your strategy on the data set given below:
Age # of Staff
20 - 30 634
30 - 40 1852
40 - 50 1720
50 - 60 415
60+ 38
3. Thirty students in an experimental psychology class use various techniques to train a rat to
move through a maze. At the end of the course, each studentʼs rat is timed as it negotiates the
maze. The results (in minutes) are listed below. Determine the fraction of the 30 measurements
that fall within:
1 standard deviation of the mean,
2 standard deviations of the mean,
3 standard deviations of the mean.
1.97 0.60 4.02 3.20 1.15 6.06 4.44 2.02 3.37 3.65
1.74 2.75 3.81 9.70 8.29 5.63 5.21 4.55 7.60 3.16
3.77 5.36 1.06 1.71 2.47 4.25 1.93 5.15 2.06 1.65
Do the results agree with Chebyshevʼs Inequality?
4. An individualʼs commuting time to work has a mean of 73 minutes and a standard deviation of
12 minutes. How late could the person leave home to be sure of arriving to work on time at least
80 percent of the time?
5. Clinical observations suggest that specifically language-impaired children have great difficulty
with the proper use of pronouns. This phenomenon was investigated and reported in the
Journal of Communication Disorders (March 1995). Thirty children, all from low-income
families, participated in the study. Ten were five-year-old children who were identified as
specifically language impaired (SLI), ten were younger and normally developing (YND),
and ten were five-year-old normally developing children (OND). The table below contains the gender,
intelligence quotient (IQ), and percentage of pronoun errors observed for each of the 20 SLI
and YND subjects:
Subject Gender Group IQ Pronoun Errors (%)
1 F YND 110 94.4
2 F YND 92 19.05
3 F YND 92 62.5
4 M YND 100 18.75
5 F YND 86 0
6 F YND 105 55
7 F YND 90 100
8 M YND 96 86.67
9 M YND 90 32.43
10 F YND 92 0
11 F SLI 86 60
12 M SLI 86 40
13 M SLI 94 31.58
14 M SLI 98 66.67
15 F SLI 89 42.86
16 F SLI 84 27.27
17 M SLI 110 33.33
18 F SLI 107 0
19 F SLI 87 0
20 M SLI 95 0
a. Determine the mean and standard deviation of the percentage of pronoun errors for the
children who are specifically language impaired (SLI).
b. Compare the results from part (a) with the mean and standard deviation of the
percentage of pronoun errors for children who are younger and normally developing
(YND). What conclusions can be inferred from the results?
Studies
6. Consider the data set: 4 3 7 3 2
a. Determine the standard deviation of this set of data.
b. Add five to each member of the data set and then compute the standard deviation once
again.
c. Now add seven to each member of the data set and compute the standard deviation.
d. Use the results when comparing the three sets of standard deviations to conjecture the
effect of adding a constant k to each member of a set of data.
e. Attempt to prove this conjecture for any data set.
7. The z-score of a datum is defined as the number of standard deviations the datum strays away
from the mean, x̄. It is given by the formula,
$$z = \frac{x - \bar{x}}{s}$$
where x represents the datum in question, and x̄ and s are the mean and standard deviation of the
data set respectively.
Consider the data set given in problem 1 from this section.

a. Compute the z-score for each datum in this set. This will create a sort of “parallel” data
set of z-scores that is related to the original set.
b. Determine the mean and standard deviation of the set of z-scores. What do you
notice?
c. Explain, or prove, why the mean and standard deviation will always end up being
the result that you got in part (b).
3.5 Measuring Spread: Medians
Quartiles
The notion of the spread of a set of data is not simply constrained to using the mean as the measure of
central tendency. If we choose instead to employ the median as our measure, a different tack must be
employed. To begin with letʼs remind ourselves that the median measures the positional centre of the
data set. Thus, when devising a measure of spread using medians weʼll continue to employ a positional
approach. Thinking in this manner we can furthermore recognize that in order to determine the
positional difference of a datum in our set relative to the median, we simply take the median of the
lower half (First Quartile) and median of the upper half of the data set (3rd Quartile) to gauge this
spread. Letʼs observe an example to highlight how this is done.
Example 3.5.1 The following set of data represents the gas consumption (in litres per 100 km) of a new
model car by Peel Motors, The Woodlander, measured under specific test conditions. Report a reliable
range of the gas consumption that this car yields.
12.3 11.7 11.9 12.1 13.2 12.9 11.5 12.3 12.2

L/100 km of The Woodlander, Peel Motors
Solution: Not wishing any extraneous results to sway our estimations we decide to use the median as
our measure of central tendency. Using the sort feature in Google Sheets, we arrange the data from
lowest to highest resulting in the following set:
11.5 11.7 11.9 12.1 12.2 12.3 12.3 12.9 13.2
Lower half of the data set (includes median): 11.5 11.7 11.9 12.1 12.2
Upper half of the data set (includes median): 12.2 12.3 12.3 12.9 13.2
As there are only nine test results we can quickly determine that our median result is given by:
Median = 12.2 L/100 km
To get our range, we observe that the median of the lower half of the data set is
Lower Median = Q1 = 11.9 L/100 km (1st Quartile)
and the median of the upper half of the data set is
Higher Median = Q3 = 12.3 L/100 km (3rd Quartile)
By the very nature of how the quartiles are measured, we know that at least 50% of the data resides
within the bounds of the 1st quartile to the 3rd quartile. This gives rise to the range that weʼre hunting
for, specifically that this vehicle reliably scores a fuel consumption rating between 11.9 and 12.3 L/100
km.
⬛
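The median-of-halves procedure used in this example can be sketched in Python. The function name `quartiles` is our own. It assumes the convention of the example, where the median is included in both halves when the number of data is odd. Spreadsheet functions may use a different convention and give slightly different quartiles.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.

    For an odd number of values the median belongs to both halves;
    for an even number the sorted data split evenly in two.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2              # number of values in each half
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(ordered), median(upper)


consumption = [12.3, 11.7, 11.9, 12.1, 13.2, 12.9, 11.5, 12.3, 12.2]
q1, med, q3 = quartiles(consumption)
print(q1, med, q3)   # 11.9 12.2 12.3
```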
The Interquartile Range (IQR)
Building on this, we now develop a measure of spread for a data set whose median is chosen as the
measure of central tendency. We wish to get a sense of the size of the spread. To do this, we employ a
relatively simple measure, the interquartile range (IQR). This measure simply takes the “distance”
between the first and third quartiles:
$$IQR = Q_3 - Q_1$$
This quantity measures the width of the range that contains the middle half of the data.
To get a sense of the “typical” distance the data stray from the median, we can use the
semi-interquartile range, which is simply found by dividing the IQR by two.
For the example shown above, we get
$$IQR = 12.3 - 11.9 = 0.4 \text{ L/100 km}$$
and the semi-interquartile range
$$\text{semi-IQR} = \frac{IQR}{2} = 0.2 \text{ L/100 km}$$
As mentioned in the previous section, these measures arenʼt used frequently for statistical analysis, but
they do have some value especially in terms of visualizing spread and detection of outliers. This can be
helpful in “pre-analysis” of collected data from an experiment.
Box-Whisker Plots (Visualizing Spread)
The IQR is the accepted measure of spread when utilizing medians as the measure of central tendency,
however it is rare that we make conclusions of the nature described in the example above. Instead, we
generally display the results graphically using what is called a box and whisker plot. Letʼs once again
use an example to illuminate these techniques.
Example 3.5.2 (The GO-train riders strike again…) In example 3.4.2 we examined the amount of cash
that GO train riders and subway commuters carried on their person. Compare the data once again, this
time using medians as the measure of central tendency.
Solution: Since weʼre using medians for our analysis, letʼs display the data sets in order:
GO 0.00 0.00 13.72 20.00 26.34 27.25 40.00 80.00 100.00 112.56
Subway 0.00 0.00 12.15 12.50 17.00 17.00 22.44 28.00 42.65 85.65
We summarize the measures as follows:
GO Subway
Lowest Datum 0.00 0.00
Q1 13.72 12.15
Median 26.80 17.00
Q3 80.00 28.00
Highest Datum 112.56 85.65

*NOTE: Google Sheets seems to be calculating the quartiles of even sets of numbers improperly.
We now move to creating comparative box plots to visualize the spreads of each data set. In Google
Sheets this chart type is called a candlestick chart; the resulting plot is shown below:
We can now visualize and conclude upon the differences in how the riderships of each type of transit
tend to carry cash. Specifically, GO-Riders tend to carry larger amounts and more varied amounts of
cash.
Conversely, we could conclude that Subway riders tend to consistently carry less cash on hand.
⬛
These plots are structured as shown, with
the base of the “stem/whisker” indicating the lowest datum,
the base of the “box” indicating the 1st Quartile,
the top of the “box” indicating the 3rd Quartile, and
the top of the “stem/whisker” indicating the highest datum.
NOTE: Box plots can also be displayed with a horizontal orientation; this largely depends on the
software or aesthetics that you choose to present with.
We can also indicate the median by placing a horizontal line within the “box” to show where the median lies
within the middle 50% of the data. To do this, we would have to manually draw in this line onto the chart.
The YouTube video linked below demonstrates how one can create a
box-plot on Google Sheets:
Amy Hickey - How to Make a Boxplot on Google Sheets
Detection of Outliers
When performing a statistical experiment, we at times encounter what seem to be highly unusual
results that stray from the observed trends, no matter which measure of centre is chosen. Such results
are called outliers. In general, outliers usually stem from one of the following causes:
1. The measurement is observed, recorded, or entered incorrectly.
2. The measurement comes from a different population than the one desired in the study.
3. The measurement is correct, but represents a rare event.
The question then becomes, “how do we objectively determine whether a measurement is an outlier or
not?" In the world of statistics a commonly accepted practice is to use the IQR as a guide to create a
boundary of acceptable data. The boundaries are referred to as outlier fences. The outlier fences are
given by the following formulae;
$$\text{Lower Outlier Fence} = Q_1 - (1.5 \times IQR)$$
$$\text{Upper Outlier Fence} = Q_3 + (1.5 \times IQR)$$
Graphically, these regions are depicted as follows:
NOTE: This is an example of a horizontal box-plot, the same principles will hold with vertically designed plots.
If outliers are detected it is good practice to run through the following process:
STEP 1: Attempt to determine the nature of the outlier (measurement error, outside of population,
extraneous event).
STEP 2: Decide whether or not to remove the outlier. For example, if the datum is based on a
measurement error while the others arenʼt, then state your findings in the report and proceed
by analyzing without the outlier.
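To see the outlier fences at work, the sketch below applies them to the GO and subway data from Example 3.5.2. The function names are hypothetical, and the quartiles are computed with the median-of-halves convention used in this section. A different quartile convention could shift the fences slightly.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2
    return median(ordered[:half]), median(ordered), median(ordered[n - half:])


def outlier_fences(data):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data):
    """List every datum lying outside the outlier fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]


go = [0.00, 0.00, 13.72, 20.00, 26.34, 27.25, 40.00, 80.00, 100.00, 112.56]
subway = [0.00, 0.00, 12.15, 12.50, 17.00, 17.00, 22.44, 28.00, 42.65, 85.65]

print(find_outliers(go))       # []
print(find_outliers(subway))   # [85.65]
```

Under these assumptions the subway rider carrying $85.65 lies above the upper fence, while no GO rider is flagged. Whether that datum should be removed is then a matter for STEP 1 and STEP 2 above.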
Practice 3.5
Technique
1. A Ph.D. student in psychology conducted a stimulus reaction experiment as a part of their
dissertation research. They subjected 20 students to a threatening stimulus and 20 to a
non-threatening stimulus. The reaction times of all 40 students, recorded to the nearest tenth
of a second, are listed below.
Non-Threatening Stimulus
2 1.8 2.3 2.1 2 2.2 2.1 2.2 2.1 2.1
2 2 1.8 1.9 2.2 2 2.2 2.4 2.1 2
Threatening Stimulus
1.8 1.7 1.4 2.1 1.3 1.5 1.6 1.8 1.5 1.4
1.4 2 1.5 1.8 1.4 1.7 1.7 1.7 1.4 2.5
Compare the two groups using the median as your measure of central tendency. Include a box-plot with
your analysis (indicating outliers, if present).
2. Facing accusations of discriminatory pay scales between male and female employees, a survey
by the Ministry of Labour was commissioned. Given the following data, compare the pay scales
by gender (including non-identified) using the median as the measure of central tendency.
Would your conclusions be different if the mean were used instead? Can we adequately
conclude for the non-identified category given the data shown?
Salary (in thousands) Male Employees Female Employees Non-Identified
4.5 - 9.5 3 5 0
9.5 - 14.5 8 7 1
14.5 - 19.5 7 10 0
19.5 - 24.5 5 3 0
24.5 - 29.5 4 2 0
29.5 - 34.5 6 1 2
34.5 - 39.5 4 0 0
39.5 - 44.5 3 0 0
3.6 Scatterplots and Correlation
Up until now weʼve delved into many ways of analyzing and interpreting the results of a single set of
data whereby we are estimating the trends in the measurement being taken. If we are considering the
relationship between two variables, a second type of question that often arises could be generally
stated as follows:
Does a change in Variable X affect changes in Variable Y?
To begin answering this question, we typically run an experiment whereby we take observations of both
variables and plot each observation (X, Y) as points on a scatterplot.
Now, youʼve been investigating this question in one manner or another for most of your secondary
studies in mathematics. Such relationships are geometrically studied in the branch of mathematics
known as analysis. Thus, all of the function work you have done has, in part, prepared you to properly
analyze these types of relationships.
The variable, X, in our primary question above is referred to as the “independent" variable, while Y is
termed the “dependent" variable. In a broad sense there are only three possible conclusions which can
be drawn when asking about the dependency of one variable upon another: the correlation is strong,
moderate, or weak.
One can think of it this way: the more “cloudy” the data appear, the less correlated the variables
are. The conclusion (the answer to the question above) that is typically drawn under each condition is
as follows:
Strength of Correlation    Conclusions
Strong                     - Variable Y is mathematically affected by changes in Variable X
Moderate                   - Variable Y is somewhat affected by changes in Variable X;
                           - there are likely other factors that affect both variables
Weak                       - Variable Y is unaffected by changes in Variable X
Up until now you most likely have determined the strength of the correlation between two variables by
simply examining the scatterplot. But how can we be sure which of these three categories our
relationship falls under? The scatterplot alone cannot be sufficient, for if we manipulate the scales of
the plot we could easily make a strong correlation appear weak and vice versa. Thus, our task is to
devise a more objective measurement of the strength of correlation between two variables.
Covariance: Measuring Spread in 2-Dimensions
Considering the possible dependence between two variables, or lack thereof, you may have come to the
realization that weʼve effectively “jumped" a dimension. In this light, letʼs attempt to see how we
can extend the concepts of a measure of centre and spread so that they can be redefined in a
two-dimensional paradigm. Consider the scatterplot shown to the right:
Considering the x and y coordinates separately we can find the respective means of both sets of data,
namely x̄ and ȳ. The location of (x̄, ȳ) can then be thought of as the two-dimensional centre of this
set of data.
To observe a measure of spread we must realize that in two dimensions weʼll be considering areas as
opposed to distances. Thus, to see how far a single point on our scatterplot “deviates" from the centre,
weʼll consider the area of the rectangle formed by tracing horizontally and vertically from our centre
point (x̄, ȳ) to the data point in question (x, y) as shown:
So, to algebraically develop our 2D measure of spread, we express the (signed) area of this rectangle as
$$\text{2D deviation} = (x - \bar{x})(y - \bar{y})$$
Summing all such “deviations” and dividing by n − 1, we come to the idea of covariance defined by:
$$s_{XY} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
You may notice that we can achieve a negative “deviation" and consequently a negative covariance if
many of our points fall in the 2nd or 4th quadrants relative to the statistical centre (x̄, ȳ). This will
prove useful in determining if our relationship is negatively correlated (i.e. a negatively sloped trend) as
opposed to positively correlated (positively sloped trend).
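The sign behaviour of the covariance is easy to see numerically. The following sketch uses two small hypothetical data sets (they do not come from the text) that share the same x-values, one trending upward and one trending downward. The function name `covariance` is our own.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical data, chosen only to illustrate the sign of the covariance.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]     # points mostly in the 1st and 3rd quadrants about the centre
falling = [5, 4, 5, 4, 2]    # points mostly in the 2nd and 4th quadrants about the centre

print(covariance(xs, rising))    # 1.5
print(covariance(xs, falling))   # -1.5
```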
The Correlation Coefficient (r)
We now return to our original problem of objectively measuring how strongly a dependent variable Y is
affected by changes in an independent variable X. One answer turns out to be the ratio of the
covariance (2D measure of spread) to the product of the standard deviations of x and y
respectively. This measurement is known as Pearsonʼs Correlation Coefficient and is given by the
relationship
$$r = \frac{s_{XY}}{s_X \cdot s_Y}$$
where s_X and s_Y are the standard deviations of X and Y respectively.
This measure is highly practical as it always lies within a range between -1 and 1, with the two extremes
representing a perfect correlation between the two variables (i.e. the scatter plot would result in a
perfect mathematical line). Conversely, if r = 0 then there is no linear correlation at all between the
two variables. In determining the distinctions between strong, moderate, and weak correlations we can
use the scale shown below:
Example 3.6.1 A study comparing the number of arrests in relation to the duration of a “sit-in protest"
was performed. Determine the strength of the correlation given the data shown below:
University Duration (days) Arrests
Wisconsin 4 54
Albany 1 11
Oregon 3 14
Iowa 4 16
Kentucky 1 12
Solution:
Weʼll make the fairly safe assumption that our independent variable is the duration of the sit-in. Thus,
plotting the scatter plot on a spreadsheet we get the following graph;
This plot appears to show little to no correlation between these two variables; however, to check this
determination we calculate the correlation coefficient:
r = 0.601
Surprisingly, this leads to the conclusion that the number of arrests shows a moderate positive
correlation when compared to the duration of the protest. This suggests that there are likely other
factors that are affecting both variables.
⬛
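The value of r in this example can be checked directly from the definition. The sketch below is one possible implementation, with `pearson_r` a name of our own choosing. It uses the fact that the n − 1 divisors in the covariance and in the two standard deviations cancel.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r = s_XY / (s_X * s_Y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    # Each of s_XY, s_X^2 and s_Y^2 would be divided by (n - 1); those factors cancel.
    return s_xy / sqrt(s_xx * s_yy)


duration = [4, 1, 3, 4, 1]        # days: Wisconsin, Albany, Oregon, Iowa, Kentucky
arrests = [54, 11, 14, 16, 12]

r = pearson_r(duration, arrests)
print(round(r, 3))
```

The result is approximately 0.601, as quoted in the solution. With only five data points, much of this value is driven by the single Wisconsin observation, which is worth keeping in mind when interpreting it.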
Practice 3.6
Technique
1. In the state of Florida, elementary school performance is based on the average score obtained
by students on a standardized exam, called the Florida Comprehensive Assessment Test (FCAT).
An analysis of the link between FCAT scores and sociodemographic factors was published in the
Journal of Educational and Behavioural Statistics (Spring 2004). Data on average math and
reading FCAT scores of third graders, as well as the percentage of students below the poverty
level for 22 elementary schools are listed below (FCAT Scores):
FCAT-Math FCAT-Reading % Below Poverty
166.4 165 91.7
159.6 157.2 90.2
159.1 164.4 86
155.5 162.4 83.9
164.3 162.5 80.4
169.8 164.9 76.5

155.7 162 76
165.2 165 75.8
175.4 173.7 75.6

178.1 171 75
167.1 169.4 74.7

177 172.9 63.2
174.2 172.7 52.9

175.6 174.9 48.5
170.8 174.8 39.1

175.1 170.1 38.4
182.8 181.4 34.3

180.3 180.6 30.3
178.8 178 30.3
181.4 175.9 29.6
182.8 181.6 26.5

186.1 183.8 13.8
a. Draw scatterplots of the (math scores) versus (% Poverty), and (reading scores) versus (%
Poverty).
b. Determine whether or not the schoolʼs overall achievement in mathematics is affected by the
proportion of the school population below the poverty line.
c. Determine whether or not the schoolʼs overall achievement in reading is affected by the
proportion of the school population below the poverty line.
2. Neuroscientists at University College of London investigated the relationship between brain
activity and pain-related empathy in persons who watch others in pain. Sixteen couples
participated in the experiment. One partner (observer) watched while painful stimulation was
applied to the finger of the other partner. Two variables were measured for each observer, the
brain activity (measured on a scale ranging from -2 to 2) and a score on the Empathic Concern
Scale (0 to 25 points). The data is listed below. Does this hypothesized relationship exist?
Determine the strength of the correlation.
Empathic Concern
Brain Activity Empathic Concern
0.5 12
-0.3 13
0.12 14
0.2 16
0.35 16
0 17
0.26 17
0.5 18
0.2 18
0.21 18
0.45 19
0.3 20
0.2 21
0.22 22
0.76 23
0.35 24
Studies
3. Show why the correlation coefficient, r, is bounded by -1 and 1.
4. a. Create a set of data which yields the same y-value every time. Determine the strength of the
correlation of this data set.
b. Use the result you got in part (a) and generalize why this result makes sense, statistically
speaking.
3.7 Linear Regression
In the previous section, we determined a method to measure the strength of the correlation between
two variables. Once this strength is determined we usually progress in two ways:
1. Use our conclusions of a strong, moderate, or weak correlation and act accordingly to the
possible consequences of this observation.
2. Produce a model of the relationship, usually a functional model, and use this model for
predictive purposes.
The process for devising a functional model to describe the relationship between two variables is
known as regression. In this section, the only functional model weʼll consider is a linear model, hence
the phrase linear regression. Letʼs examine how this process evolves.
The Least-Squares Line
When constructing our “line of best fit" we have to keep in mind that our objective is to predict the
OUTPUT of the dependent variable Y given a particular INPUT of the independent variable X. As such,
when we draw in our trendline we are hoping to minimize the vertical distances between the
y-coordinates of the raw data to the corresponding point on the line itself with the same x-coordinate
(see the figure below).
The issue we have with this approach is that in developing an algebraic method to minimize these
distances weʼll have to work with absolute values to eliminate “negative distances". As many of you who
have worked with absolute value functions will realize, these functions are not overly easy to
analyze. Thus, to circumvent this problem, we instead look to minimize the sum of the squares of the
vertical distances (as shown below).
The analytical methods to determine this least-squares line are somewhat beyond the scope of this
course and so for those interested you may wish to read up on the derivation. This line turns out to
have the form;
$$Y = aX + b$$
where
$$a = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
and
$$b = \bar{y} - a\bar{x}$$
Donʼt fret, you wonʼt have to know these equations at all. All that you need to understand at this point
is that these coefficients for the slope and y-intercept act to minimize the squares of the deviations of
the data points from the regression line.
Example 3.7.1 Legalized gambling is available on several riverboat casinos operated by a city in
Mississippi. The mayor of the city wants to know the correlation between the number of casino
employees and the yearly crime rate. The records for the years starting in 1998 through 2007 were
examined.
Year    Casino Employees (thousands)    Crime Rate (incidents/1000 people)
1998 15 1.35
1999 18 1.63
2000 24 2.33
2001 22 2.41
2002 25 2.63
2003 29 2.93
2004 30 3.41
2005 32 3.26
2006 35 3.63
2007 38 4.15
A new bill is being proposed that would make it much easier to start up a riverboat casino. It is
estimated that the number of employees working at casinos would double. Predict the crime rate in
this city if the bill were to pass.
Solution:
We enter our data into a spreadsheet to examine the strength of the correlation and potentially use
regression analysis to predict the crime rate. It should be noted that the year (or time regression)
wonʼt be of concern for this analysis, as weʼll use the crime rate as the dependent variable and the
number of casino employees as the independent variable. We get a correlation coefficient of
r = 0.987, indicating a strong correlation. The line of best fit yields the equation
C = 0.118 · E − 0.398
found using the “trendline” option when developing the scatter plot on Google Sheets.
Thus, if the number of employees doubles due to this proposal to E = 76 (based on the 2007 result of
38), we predict that
C = 0.118(76) − 0.398 = 8.57
indicating a predicted crime rate of approximately 8 to 9 incidents per 1000 members of the population,
which is roughly double the rate quoted for 2007.
⬛
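For readers curious about how the trendline coefficients arise, the sketch below evaluates the least-squares formulas for the slope a and intercept b on the casino data. The function name `least_squares_line` is our own, and the prediction assumes that doubling the 2007 figure of 38 thousand employees gives E = 76.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = aX + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * (sum_x / n)      # b = y_bar - a * x_bar
    return a, b


employees = [15, 18, 24, 22, 25, 29, 30, 32, 35, 38]                     # thousands
crime_rate = [1.35, 1.63, 2.33, 2.41, 2.63, 2.93, 3.41, 3.26, 3.63, 4.15]

a, b = least_squares_line(employees, crime_rate)
prediction = a * 76 + b

print(round(a, 3), round(b, 3))
print(round(prediction, 2))
```

The slope and intercept round to 0.118 and −0.398, matching the trendline. Using the unrounded coefficients, the prediction comes out near 8.59 rather than 8.57, a small reminder that rounding the coefficients before predicting shifts the answer slightly.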
NOTE: It is not always prudent to simply rely on what are seemingly strong mathematical conclusions.
For example, in the solution presented above you may have noticed that if there were no casino
employees our “strongly correlated" least-squares line would predict a negative crime rate! Of course
this is absurd. Moreover, a strong correlation coefficient does not by itself imply that a genuine
(causal) relationship exists between the two variables being examined; it only shows that they are
mathematically correlated, which is a statement removed from any context. Thus, we always have to be
mindful of the validity of our experimental procedure as well as any unintended biases that we may
bring to the analysis of the topic at hand.
Practice 3.7
Technique
1. The fertility rate of a country is defined as the number of children a female citizen bears, on
average, in her lifetime. Scientific American (Dec. 1993) reported on the declining fertility rate in
developing countries. The study measured the fertility rate as it related to the contraceptive
prevalence (measured as the percentage of married women who use contraception). The data is
shown below: Fertility Rates of Developing Countries
Country Contraceptive Prevalence Fertility Rate

Mauritius 76 2.2
Thailand 69 2.3
Colombia 66 2.9
Costa Rica 71 3.5
Sri Lanka 63 2.7
Turkey 62 3.4
Peru 60 3.5
Mexico 55 4
Jamaica 55 2.9
Indonesia 50 3.1
Tunisia 51 4.3
El Salvador 48 4.5
Morocco 42 4
Zimbabwe 46 5.4
Egypt 40 4.5
Bangladesh 40 5.5
Botswana 35 4.8
Jordan 35 5.5
Kenya 28 6.5
Guatemala 24 5.5
Cameroon 16 5.8
Ghana 14 6
Pakistan 13 5
Senegal 13 6.5
Sudan 10 4.8
Yemen 9 7
Nigeria 7 5.7
a. Explain why the contraceptive rates were examined for married females only. Does this
choice affect the reliability of the conclusions from our analysis?
b. Determine the strength of the correlation between the contraceptive prevalence in
married women and their respective fertility rates.
c. Determine the least squares line for this data and use it to predict the fertility rate of
women in Canada whose contraceptive prevalence has been measured at 87%. Is this
prediction reliable? Explain why or why not.
2. At temperatures approaching absolute zero (−273 °C or 0 kelvin), helium exhibits traits that
seem to defy many laws of Newtonian physics. An experiment has been conducted with helium
in solid form at various temperatures near absolute zero. The solid helium is placed in a dilution
refrigerator along with a solid impure substance, and the fraction (in weight) of the impurity
passing through the solid helium is recorded. (This phenomenon of solids passing directly
through solids is known as quantum tunneling.) The data is given in the table below:
Quantum Tunnelling
Temperature (°C) Proportion of Impurity
-262 0.315
-265 0.202
-256 0.204
-267 0.62
-270 0.715
-272 0.935
-272.4 0.957
-272.7 0.906
-272.8 0.985
-272.9 0.987
a. Determine the strength of the correlation between these two variables.

b. Determine the least-squares estimate of the slope and intercept. What do these
measures imply for this phenomenon?
3. (Global Population) The following data set compares the overall U.S. population (in
millions of people) to the calendar year.
Year Population (millions of people) Year Population (millions of people)

1990 249.5 1995 262.8
1991 252.2 1996 265.2
1992 255.0 1997 267.8
1993 257.8 1998 270.2
1994 260.3 1999 272.7
a. Classify the strength of the correlation between these two variables.

b. Sketch a line of best fit.
c. Predict the U.S. population in the year 2020 using this trendline. If the current population
sits at 331 million people, does this show that using a linear model was effective for
prediction 20 years in the future? If not, why do you think it failed?
3.8 Non-Linear Regression
In running a two variable experiment, there are times when our observations do not conform to a linear
model. When this occurs, we can also apply the same reasoning to fit non-linear models that would
act as a predictive tool for the phenomenon being observed.
To begin with letʼs look at some of the behaviours of functions that are often useful in developing such
models:
Model Type                     General Equation                                         Shape of Graph
Linear                         $Y = aX + b$                                             [graph]
Polynomial (e.g. Quadratic)    $Y = a_n X^n + a_{n-1} X^{n-1} + \dots + a_1 X + a_0$    [graph]
                               *Avoid degrees more than 3 for this type.
Exponential                    $Y = a \cdot b^X$                                        [graph]
Logarithmic                    $Y = a \cdot \ln(X) + b$                                 [graph]
These are but four types of functional models, and as you become more aware of the options you can
adopt the best possible shape for your experiment. There are no definitive types that are better than
others. However, we can move through a decision making process to hopefully choose the best model.
This is what weʼll work through in this section.
Measuring the Strength of a Modelʼs Fit
Remaining mindful that we wish to predict values of our dependent variable (Y), we can begin to
measure how well a particular function “fits” the given set of data for purposes of prediction. Consider
the scatter plot shown below, where weʼll define:
Explained Variation:
The difference between the predicted outcome of the functional model (weʼll label this as y_m) and the
mean of the dependent data set, ȳ.
$$\text{Explained Variation} = y_m - \bar{y}$$
Unexplained Variation:
The difference between the observed data point y_i and the predicted outcome from the model, y_m.
$$\text{Unexplained Variation} = y_i - y_m$$
Total Variation:
The difference between the observed data point y_i and the mean ȳ.
$$\text{Total Variation} = y_i - \bar{y} = (\text{Explained Variation}) + (\text{Unexplained Variation})$$

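In typical situations (with non-typical being too far beyond the scope of this course), we define the
Coefficient of Determination ($R^2$) as follows:
$$R^2 = 1 - \frac{\text{Sum of Squares of Unexplained Variation}}{\text{Sum of Squares of Total Variation}} = 1 - \frac{\sum_i (y_i - y_m)^2}{\sum_i (y_i - \bar{y})^2}$$
Here $y_m$ is the modelʼs predicted outcome for the same input as the observed data point $y_i$.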
Effectively, $R^2$ gives us a score between 0 (model is completely insufficient for prediction) and 1 (all
observed outcomes are predicted by the model). Thus, the closer to 1 that the measure reads, the
better the function models the observed data set, mathematically speaking.
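To make the definition concrete, here is a minimal Python sketch of the calculation. The function name r_squared and the small data set are our own illustrative assumptions (they do not come from the text); the arithmetic simply follows the ratio of the two sums of squares.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (unexplained SS) / (total SS)."""
    mean_y = sum(observed) / len(observed)
    # Sum of squares of unexplained variation: observed minus predicted.
    unexplained = sum((y_i - y_m) ** 2 for y_i, y_m in zip(observed, predicted))
    # Sum of squares of total variation: observed minus the mean.
    total = sum((y_i - mean_y) ** 2 for y_i in observed)
    return 1 - unexplained / total


# Hypothetical observations and the predictions of a model y = 2x at x = 1, 2, 3, 4.
observed = [2.0, 4.1, 5.9, 8.2]
predicted = [2.0, 4.0, 6.0, 8.0]

print(round(r_squared(observed, predicted), 4))
```

A model that predicts every observation exactly scores 1, while a model that always predicts the mean scores 0, which matches the two extremes described above.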
Example 3.8.1 The following data examines the life expectancy of American citizens between the years
1900 - 2010.
Year 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010
Life Expectancy (yrs) 47.3 50.0 54.1 59.7 62.9 68.2 69.7 70.8 73.7 75.4 76.8 78.7
Create a scatter plot and determine a suitable model for prediction of oneʼs life expectancy. Use this
model to predict the life expectancy of an American citizen in the year 2020.
Solution: We begin by relabelling the variable for Year as Years after 1900, so for example the year 1950
would translate to x = 50. This is not necessary, but is a choice for easing the numerical coefficients of
our regression models.
i. Selecting Suitable Candidate Models

We first input our data to make a scatter plot to observe the possible models that would be suitable:
At this stage we should note that two prominent candidates are possible models for this data:
1. Linear Model: The data seems like it “curves”, but this may be minimal, and so a linear model may be
sufficient.
2. Logarithmic Model: Since the observed data seems to “flatten” as time progresses, a logarithmic
model may explain this behaviour effectively.
ii. Testing the Candidate Models

Weʼll determine both the equation (with E representing oneʼs life expectancy and t representing the time
in years after 1900) and the Coefficient of Determination for each model:
Model Type     Equation                     $R^2$
Linear         $E = 0.306t + 49.1$          0.96
Logarithmic    $E = 19.1 + 12.3\ln(t)$      0.976
iii. Deciding on the Model

For both models, the coefficient of determination is quite high, so mathematically speaking there is
little difference here. So moving beyond the mathematical fit, we can see that the logarithmic model
seems to be following the trends of the observed data a little more reasonably than its linear cousin; as
the data does seem to be “flattening” and a linear model wonʼt account for this.
Thus, weʼll choose the logarithmic model for our predictions.
iv. Predicting
We substitute the value t = 120 into our model to predict for the year 2020:
$$E = 19.1 + 12.3\ln(120) \approx 78.0$$
Based on our chosen model, the life expectancy of an American citizen in the year 2020 is approximately
78.0 years. Note that the linear model would have predicted $0.306(120) + 49.1 \approx 85.8$ years, a
noticeably higher figure, since a straight line cannot account for the flattening trend.
⬛
Guidelines for Selecting Suitable Functional Models
The previous example can guide us along the process for determining a suitable model for a given set of
data and its associated scatter plot.
STEP 1:
Generate the scatter plot and select 2 - 3 candidate models based on the functional behaviour/shape.
STEP 2:
Determine the coefficient of determination ($R^2$) for each of the candidate models.
STEP 3:
Decide on the best model according to:
I. Higher $R^2$ value,
II. If the $R^2$ values are comparable (within 0.1 of each other), choose based on which model would
predict beyond the range of the observed data better,
III. If the above considerations are comparable, then choose simplicity over complexity.
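The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the data set is hypothetical (it is not one of the tables in this book), the helper names fit_line and r_squared are our own, and the logarithmic fit is obtained by running an ordinary least-squares line through the points (ln x, y), which requires every x to be positive.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (unexplained SS) / (total SS)."""
    mean_y = sum(observed) / len(observed)
    unexplained = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mean_y) ** 2 for o in observed)
    return 1 - unexplained / total


# Hypothetical measurements that rise quickly and then flatten.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.0, 4.4, 5.2, 5.8, 6.2, 6.6, 6.9, 7.2]

# STEP 1: candidate models. Linear: y = a*x + b. Logarithmic: y = a*ln(x) + b.
a_lin, b_lin = fit_line(xs, ys)
a_log, b_log = fit_line([math.log(x) for x in xs], ys)

# STEP 2: coefficient of determination for each candidate.
r2_lin = r_squared(ys, [a_lin * x + b_lin for x in xs])
r2_log = r_squared(ys, [a_log * math.log(x) + b_log for x in xs])

# STEP 3: compare the scores, then look at how each model behaves beyond the data.
print(f"linear:      R^2 = {r2_lin:.3f}, prediction at x = 20: {a_lin * 20 + b_lin:.1f}")
print(f"logarithmic: R^2 = {r2_log:.3f}, prediction at x = 20: {a_log * math.log(20) + b_log:.1f}")
```

Printing a prediction well beyond the observed range alongside each score mirrors criterion II: two models with similar scores can still disagree sharply once we extrapolate.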
Practice 3.8
Technique
1. Sales of a video game released in the year 2000 took off at first, but then steadily slowed as time
moved on. The table below shows the number of games sold, in thousands, from the years
2000–2010.
Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Number Sold (000s) 142 149 154 155 159 161 163 164 164 166 167
a. Sketch a scatter plot of this data and determine at least two candidate models.
b. Decide, with justification, which model is most appropriate.
c. Use your model to predict how many sales of this game would occur in 2020.
2. In 2007, a university study was published investigating the crash risk of alcohol impaired
driving. Data from 2,871 crashes were used to measure the association of a personʼs blood
alcohol level (BAC) with the risk of being in an accident. The table below shows results from the
study. The relative risk is a measure of how many times more likely a person is to crash. So, for
example, a person with a BAC of 0.09 is 3.54 times as likely to crash as a person who has not
been drinking alcohol.
BAC 0 0.01 0.03 0.05 0.07 0.09 0.11 0.13 0.15 0.17 0.19 0.21
Relative Risk 1 1.03 1.06 1.38 2.09 3.54 6.41 12.6 22.1 39.1 65.3 99.8
c. Use your model to predict the Relative Risk of an accident if one reaches the legal blood
alcohol limit in Ontario of 0.08.
3. The following set of data compares the Start Up Costs (in thousands of dollars) to the Franchise
Fee (in thousands of dollars).
Data Set: Chapter 3 Practice: Pizza Franchise
c. Use your model to predict the Start Up Cost when asked to pay a Franchise Fee of $1.5
million.
3.9 Continuous Random Variables
Now that weʼve completed our brief introduction to the wonderful world of statistics, we can begin
bridging the gap between our study of probability and statistics. The remainder of the book will allow
us to delve into some of the primary techniques employed in real-world statistical analysis. The hope is
that youʼll begin to understand some of the subtleties in media reporting that largely go unnoticed by
us.
Uniform Distributions: A Working Example
Consider a rod measuring 100 cm in length. A single cut is made on the rod at a random point. Let C be
the random variable which represents the location of the cut.
Pause for Reflection...in case youʼve forgotten!

1. A random variable denotes the set of all possible outcomes of a randomized process.
2. A probability distribution is the set of probabilities associated with each outcome of a random
variable.
SCENARIO A: Determine the probability that C = 40 cm.
Solution:
Reflecting back onto our studies concerning probability distributions you may have noticed that we are
wishing to determine P (C = 40). The clear problem with this question however is in the assumption
that, “the cutting device has a width of one point." Thus if we were to make our cut at the 40.1 cm mark,
40.001 cm mark, or even the 40.000001 cm mark we would not be achieving the desired result. So, in
effect, we need absolute precision with this cut! So using our current understanding of probability
theory we get,
$$P(C = 40) = \frac{1}{\infty} \to 0$$
We should make some remarks about this intriguing result. For one, we cannot really divide by ∞; this is
plainly mathematically incorrect, as infinity is not a number; rather, it is a concept representing
unboundedness. However, there are an infinite (and uncountable) number of outcomes to select from.
Thus, weʼll adopt the notation for conceptual understanding at this time.
Secondly, the right arrow “→” notation signifies the approaching of the result 0. There is, of course, an
infinitesimally small chance of actually obtaining the desired cut, but this result is as close to
impossibility as one could imagine.
SCENARIO B: Determine the probability that the cut occurs somewhere in between the 39.5 to 40.5 cm
marks.
Solution:
We are still encumbered with our infinitely large outcome space, yet in this instance you may recognize
that we can extract a proportion of desired results to the total possibilities. Consequently, we can
determine a measurable probability:
$$P(39.5 \le C \le 40.5) = \frac{1 \text{ cm}}{100 \text{ cm}} = 0.01$$
So here is our conundrum:

If each specific outcome yields a probability that is effectively zero, how do we then get a non-zero
probability for a range of results?
To address this question, weʼll have to step sideways for a moment and define (review) some concepts.
Discrete v. Continuous Random Variables
The random variable that measured the precise location of a cut yielded what was described as an
infinite outcome space. We saw with the geometric distribution that a random variable could take on an
infinite set of possibilities (e.g. waiting for a certain outcome to occur from a set of repeated and
independent trials), so our random cut variable C having an infinitely large set of outcomes should not,
by itself, be of concern. The difference here is that we could “count” the outcomes of a geometric
random variable; for example, the set {0, 1, 2, 3, ...} is a countable set (meaning there is a process
that could identify every possible outcome). If you attempted to create a method to list all of
the outcomes of the random variable C (which includes all possible decimals to the most extreme
precision that lie between 0 and 100), you would realize that this set is uncountable.
Definition 3.9.1
I. Random variables that have a countable outcome set are called discrete random variables.
II. Random variables that have an uncountable outcome set are called continuous random
variables.
A possibly more intuitive way of thinking about the distinction between discrete and continuous
random variables is to think of discrete random variables as having isolated outcomes with gaps in
between, while continuous random variables form a continuum of outcomes with no measurable gaps.
In practice, continuous random variables usually consist of a measured quantity such as time, distance,
volume, mass, and so on. Of course as humans we are always limited by the accuracy of the device we
are measuring with, thus it wouldnʼt be incorrect to presume that we can never achieve a truly
continuous random variable. However, for the sake of theoretical development weʼll consider such
random variables as continuous in nature and thereby apply any developed techniques accordingly.
Probability Properties of Continuous Random Variables
If our random variable X turns out to be continuous in nature (or at least theoretically continuous), then
we will always face the issue of dealing with an uncountably infinite outcome space and so we have
three critical properties to be mindful of:
1. The probability of a specific outcome always will approach zero.
𝑃(𝑋 = 𝑥) → 0
For ease of calculations later, weʼll just state that;
𝑃(𝑋 = 𝑥) = 0
2. The probability of all outcomes taken together must equal 1.
3. Non-zero probabilities have to be taken from “ranges of outcomes”, not specifics.
Continuous Probability Distributions and Probability Density Functions
You may have noticed that our example could be thought of as uniformly distributed as all of the
outcomes were equally likely. But how do we deal with a continuous random variable that has
outcomes that are more likely than others? Letʼs examine our first example once again.
Example 3.9.1 Consider a rod measuring 100 cm in length. A single cut is made on the rod at a random
point. Let C be the random variable which represents the location of the cut. Determine the
probability distribution of the random variable.
Solution:
We know that the probability of any specific outcome P (C = c) is equal as the location of the cut is
chosen at random. Yet we also know that each outcome yields a probability which approaches zero.
Thus, if we were to try to depict this distribution graphically we would get a picture as shown below:
Well, this seems pretty useless, though technically the correct graph. Also, itʼs unclear at this point
how all of the possible outcomes of C will sum up to 1 (satisfying our second condition). Lastly, this
graph doesnʼt show how we extract non-zero probabilities from ranges of outcomes. We have to do better,
and we will!
To circumvent this issue, letʼs start by recognizing that the graph should show that all of the outcomes
are equally likely, thus constant, but we wonʼt make our dependent variable the actual probabilities.
Instead, weʼll say our function represents the probability density which will refer to the relative
likelihood of a specific outcome rather than the actual likelihood. Such a graph would look like the one
shown below.
Now we come to the real magic! In order to satisfy the results established in the previous section
concerning random variables, weʼll use area to our advantage. Specifically, weʼll use the area below our
probability density function to determine probabilities. Letʼs see if we can satisfy the three
conditions set out at the start of this section concerning continuous random variables.
1. The probability of each specific outcome must be zero.
If we observe the example of P(C = 40), then we get the scenario below:
Since the region below the graph for the single outcome c = 40 has no width, it has no area, and so we
can state that
$$P(C = 40) = 0.$$
In fact, if we took any single value x = c, then the area below this graph would be zero; thus our first
condition is satisfied.
2. The probability of all outcomes taken together is equal to 1:
This one is satisfied by the way that our density function was defined as the area below the
graph would equal;
𝐴𝑟𝑒𝑎 𝑏𝑒𝑙𝑜𝑤 𝑓(𝑥) = 100 · 0. 01 = 1
3. Non-zero probabilities are found by taking ranges of outcomes:
Letʼs examine the probability that our cut falls between the 30 cm mark and the 50 cm mark.
Intuitively we know that this result should be $\frac{1}{5}$. Viewing the area below our density
function, this turns out to be true as well:
$$P(30 \le C \le 50) = 20 \cdot 0.01 = 0.2$$
Thus, our density function has managed to satisfy the requirements to accommodate for the
continuous nature of this random variable.
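The three checks above amount to computing areas of rectangles, which can be sketched in a few lines of Python. The function name uniform_probability and its default arguments are our own illustrative choices; the rod from the working example corresponds to a = 0 and b = 100.

```python
def uniform_probability(low, high, a=0.0, b=100.0):
    """P(low <= C <= high) for C uniformly distributed on [a, b]."""
    density = 1 / (b - a)  # constant height of the density function
    # Width of the part of [low, high] that lies on the rod.
    width = max(0.0, min(high, b) - max(low, a))
    return width * density  # area of the rectangle


print(uniform_probability(40, 40))      # a single point has no width
print(uniform_probability(0, 100))      # the whole rod
print(uniform_probability(30, 50))      # the range from 30 cm to 50 cm
print(uniform_probability(39.5, 40.5))  # Scenario B
```

Because the probability is width times height, a range of zero width always yields zero, no matter how tall the density function happens to be.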
It would appear that we have the makings of a decent approach to work with continuous random
variables in general. In fact, even if the outcomes of the random variable are not uniformly distributed
we can still employ the idea of a density function and the areas below these functions to extract
probabilities. So it would seem there is no longer any need to delay, letʼs define this special class of
functions.
Definition 3.9.2 A probability density function is a function whose range (“y”-values) measures the
relative likelihood of an outcome from a continuous random variable X. Such functions are
characterized by two other key properties:
I. $f(x) > 0$ for all outcomes $x \in X$.
II. The total area below the graph of $f(x)$ is equal to one.
The beauty of these types of functions is that they allow us to determine probabilities in the exact same
way that it worked out with our working example of cutting a rod at a random location. Even better, we
can work with situations that are not evenly distributed. Our density function can have any shape
provided it follows the two conditions defined above.
For the most part, weʼre going to hit a technique wall, in that obtaining areas below a function is a topic
examined in first year Calculus. This is also the reason why most non-introductory courses in
probability that move beyond the topics covered in this course will not be encountered until your
second year of study. However, we can still work with some of the simpler distributions. The uniform
distribution, where all outcomes are equally likely, has already been discussed at length so letʼs
examine a different one to conclude this section.
Exponential Distributions
Exponential functions very often model real life situations well; for example, population growth/decay
models are often represented with such functions. Another circumstance involves “waiting times”. The
following situation is an example of such a model.
Example 3.9.2 The Woodlands Transit Authority (WTA) commissioned a study concerning how long
passengers were waiting after transferring routes at major intersections throughout the city with the
aim of improving the quality of bus scheduling for better ridership experience. Data was collected from
1000 randomly selected passengers at the 20 most frequented intersections during business hours only
for one week. If a patron who exited one route was able to directly enter the joining route they were
determined to have waited 0 minutes. Otherwise the timer was set to count immediately commencing
their arrival at the joining stop until they entered the front door of the bus they were waiting for.
The results of the study indicated an average wait time at transferring stops to be 8 minutes. The actual
wait times, W, were modelled with an exponential function as shown to the left below:
This function is, in fact, a probability density function and is given by the relationship:
$$f(w) = \frac{e^{-w/8}}{8},$$
where w > 0 represents the waiting time in minutes and e ≈ 2.718 is Eulerʼs number.
(Youʼll have to trust us that the given function indeed satisfies the requirement of the total area
below the graph equalling 1.)
Determine the following:
a. the probability that a randomly selected passenger will have to wait less than 10 minutes at
their transfer stop.
b. the probability that a randomly selected passenger will have to wait longer than 20 minutes.
c. the probability that a randomly selected passenger will wait between 6 to 8 minutes.
Solution:
As we have been given a probability density function to model the waiting times of passengers, we then
can work to determine these probabilities by calculating the areas below the density function.
However, unlike our “working example" before, the values of f(w) are not constant and so we will not be
able to just take areas of rectangles formed. Instead, weʼll have to use Calculus to figure out these
areas. As mentioned before, this is where we hit our mathematical wall so to speak in this course but it
does turn out that in using the technique of integration this function is relatively easy to work with.
Using these methods we can determine the areas below our density function up to a certain wait time
with the following relationship,
$$P(W \le w) = 1 - e^{-w/8}; \quad w > 0$$
This type of function is referred to as a Cumulative Distribution Function as it measures the likelihood
up to a certain outcome. Using this formula we can now work out the desired results.
a. (Direct)
Since weʼre concerned with the probability of waiting less than 10 minutes, we can use the
cumulative distribution function directly:
$$P(W \le 10) = 1 - e^{-10/8} = 1 - e^{-1.25} = 0.7135$$
Thus, we can conclude that about 71% of passengers will wait less than 10 minutes at their transfer
stops.
(Note: You may have noticed that the problem desired a wait time of “less than 10 minutes", yet
we included the outcome of 10 minutes in our calculation. If one recalls the property that any
specific outcome of a continuous random variable yields a probability of zero we can effectively
conclude that 𝑃(𝑊 ≤ 𝑤) = 𝑃(𝑊 < 𝑤). If we were working with a discrete random variable
such an observation would be invalid.)
b. (Indirect)
Our cumulative distribution function measures the probability up to a certain outcome, thus
weʼll be able to determine the probability of the “undesirable” outcomes. This means that the
indirect approach can be invoked as shown.
$$P(W > 20) = 1 - P(W \le 20) = 1 - \left[1 - e^{-20/8}\right] = e^{-2.5} = 0.0821$$
Thus, about 8% of patrons will wait longer than 20 minutes when they transfer routes.
c. Here we are again encumbered by the limitation that our cumulative distribution function only
outputs probabilities up to a certain outcome. Thus, weʼll circumvent this situation by
taking the probability up to 8 minutes and subtracting the area up to 6 minutes, as shown on the
graph below. This will leave us with the desired region.
$$P(6 \le W \le 8) = P(W \le 8) - P(W \le 6) = \left[1 - e^{-8/8}\right] - \left[1 - e^{-6/8}\right] = e^{-0.75} - e^{-1} = 0.1045$$
Thus, about 10% of patrons will wait between 6 and 8 minutes for their transfer route.
⬛
In general, exponentially distributed random variables can be modelled with the density function,
$$f(x) = \frac{e^{-x/\mu}}{\mu}; \text{ where } x > 0, \text{ and } \mu \text{ represents the population mean,}$$
and associated cumulative distribution function,
$$P(X \le x) = 1 - e^{-x/\mu}$$
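For readers who would like to check such calculations numerically, the cumulative distribution function translates directly into a short Python sketch. The function name exponential_cdf is our own; the mean of 8 minutes is taken from Example 3.9.2, and the three quantities correspond to parts a, b and c of that example.

```python
import math


def exponential_cdf(x, mean):
    """P(X <= x) for an exponentially distributed X with the given mean."""
    return 1 - math.exp(-x / mean) if x > 0 else 0.0


mean_wait = 8  # minutes, as in Example 3.9.2

# a. Direct: area up to 10 minutes.
p_less_than_10 = exponential_cdf(10, mean_wait)
# b. Indirect: everything except the area up to 20 minutes.
p_more_than_20 = 1 - exponential_cdf(20, mean_wait)
# c. Difference of two areas: up to 8 minutes minus up to 6 minutes.
p_between_6_and_8 = exponential_cdf(8, mean_wait) - exponential_cdf(6, mean_wait)

print(f"{p_less_than_10:.4f}")
print(f"{p_more_than_20:.4f}")
print(f"{p_between_6_and_8:.4f}")
```

Changing mean_wait is all that is needed to reuse the sketch for another exponentially distributed waiting time.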
There are, of course, a host of other continuous distributions that are available to us. However in the
next section we will work with one of the most important distributions used in statistical practice; The
Normal Distribution.
Practice 3.9
Technique
1. The width of a standard bowling lane is 60 inches (including the gutters). To have a decent
chance at throwing a “strike" (knocking down all of the pins) the bowler must hit the centre pin
approximately head on. Thus, the thrower has about a 1.5 inch range to work with in order to
optimize their chances of throwing a strike. Maaz is a toddler who throws the ball without
aiming. The ball has an equal likelihood of reaching the end of the lane at any point along its
width.
a. Sketch the probability density function for this situation.
b. What is the probability that Maazʼs throw reaches somewhere between the 29.25 to
30.75 inch zone, thereby allowing for a decent chance of throwing a strike?
c. What is the probability that Maazʼs throw reaches the gutter? The gutters comprise 6
inches on the most extreme ends of the lane and will yield no pins knocked down.
2. Upon arrival at a check-out line in a supermarket a study determined that the wait time for a
customer to be served was exponentially distributed with a mean waiting time of 7 minutes.
Determine the probability that:
a. a customer will have to wait longer than 15 minutes to be served.
b. a customer will wait between 5 - 10 minutes.
Studies
3. Given that a random variable X is exponentially distributed with mean µ, determine the
outcome x such that $P(X \le x) = \frac{1}{2}$. Does this result surprise you? Why?
Repertoire
NOTE: These problems will require an understanding of Calculus techniques.
4. Prove that the density function for an exponentially distributed random variable X,
$$f(x) = \frac{e^{-x/\mu}}{\mu}; \text{ where } x \ge 0, \text{ and } \mu \text{ represents the population mean,}$$
satisfies the requirements of a density function.
5. Given that X is an exponentially distributed random variable with population mean µ, derive
the cumulative distribution function from the density function defined above.
3.10 Normal Distributions
Now that weʼve built up the necessary framework to work with continuous random variables, weʼll set
to work on a distribution that arises frequently in life. When a set of data is collected and a mean is
calculated with its associated standard deviation, the distribution that one often imagines is the set of
data generally clustered about the mean and spreading out evenly on either side of the mean.
Graphically such a distribution would look something like the diagram below:
This type of distribution is symmetrical in nature and has a characteristic “bell shape" to it. If our data
set exhibits such a behaviour we can impose a density function which models this type of distribution.
This type of distribution is referred to as the Normal Distribution and it is characterized by the
probability density function,
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}; \text{ where } \mu, \sigma \text{ represent the mean and standard deviation.}$$
Once again, weʼll have to ask that you trust that this function satisfies the required properties of a
probability density function; namely, given any mean and standard deviation, the area below this
function will equal 1.
The key will be to understand that the most likely outcome relative to all other outcomes would be
the mean µ; this is found at the top point of the bell curve.
Characteristics of Normal Distributions
When a random variable is normally distributed we often write this symbolically as $X \sim N(\mu, \sigma)$
due to its reliance on the mean and standard deviation. Thus given a random variable
$X \sim N(\mu, \sigma)$ there are some critical properties which characterize this very special distribution.
[Figure: normal curve with the horizontal axis marked at µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ]
1. All normally distributed random variables are symmetrical about the mean µ.
2. The range of outcomes falling within one standard deviation of the mean, µ, yields a
probability of approximately 68%.
3. The range of outcomes falling within two standard deviations of the mean, µ, yields a
probability of approximately 95%.
What is most remarkable, is that these properties hold no matter what the mean and standard deviation
of the population are and so we can work relative to these measures when determining probabilities.
This is a big factor as to why the standard deviation is the preferred measure of spread when using the
mean as the measure of central tendency.
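We can check these properties numerically with a short Python sketch. It relies on the NormalDist class from Pythonʼs standard statistics module; the helper name probability_within and the two populations tried are our own hypothetical choices.

```python
from statistics import NormalDist


def probability_within(k, mean, sd):
    """P(mean - k*sd <= X <= mean + k*sd) for X ~ N(mean, sd)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


# Two unrelated, hypothetical populations: the proportions should agree.
for mean, sd in [(0, 1), (50, 10)]:
    within_one = probability_within(1, mean, sd)
    within_two = probability_within(2, mean, sd)
    print(f"N({mean}, {sd}): {within_one:.4f} within 1 sd, {within_two:.4f} within 2 sd")
```

Both populations report the same two proportions, which is the sense in which we can work relative to the mean and standard deviation rather than with the raw measurements.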
Determining Probabilities with Normal Distributions
Example 3.10.1 Studies have shown that the ages at which infants begin crawling are normally
distributed with a mean of 32.7 weeks and standard deviation of 3.1 weeks. Determine:
a. the proportion of children who begin crawling earlier than 28 weeks.
b. the proportion of children who begin crawling after 40 weeks of age.
c. a range of ages, centred about the mean, in which 90% of children begin crawling.
Solution: Because weʼre immediately notified that the crawling age is normal, we can immediately
work to calculate areas below the normal density function. Unfortunately, the cumulative distribution
function is absolutely non-user friendly! So we have to resort to one of two methods, which weʼll
discuss separately.
Approach 1: (Using the Z-score table)

You may have noticed after the B.O.B. section of the book there is a 2-page Z-Score Table. This table
gives us the cumulative distribution function for the z-scores of a Normal Distribution. Thus, if we
convert the measurements weʼre interested in into z-scores, we can use this table to determine areas
below the normal curve up to that measurement. Hereʼs how:
a. We want the proportion of children who crawl at an age earlier than 28 weeks. Thus, we require
the area below the density curve up to C = 28 (shown to the left below):
$$P(C < 28) = P\left(Z_{28} < \frac{28 - 32.7}{3.1}\right) \quad \text{(here, weʼre converting the measurement into a z-score)}$$
$$= P\left(Z_{28} < -1.52\right)$$
$$= 0.0643$$
Thus, approximately 6.4% of children will begin crawling before 28 weeks of age.
b. Working in a similar vein, we need to determine the area below the density curve that lies
above 40 weeks of age (shown to the right above). Thus, weʼll convert 40 weeks into a z-score
and use the chart to locate our probability.
$$P(C > 40) = P\left(Z_{40} > \frac{40 - 32.7}{3.1}\right)$$
$$= P\left(Z_{40} > 2.35\right)$$
$$= 1 - P\left(Z_{40} \le 2.35\right) \quad \text{(the z-score chart only gives areas up to a specified z-score)}$$
$$= 1 - 0.9906$$
$$= 0.0094$$
Therefore, only about 1% of children will take longer than 40 weeks to begin crawling.
c. Finally, to obtain a range of ages whereby 90% of children will begin to crawl, we want to
determine which values will yield an area of 90% under the density curve shown below:
To determine the lower and upper bounds of this range, we recognize that:
the lower bound will capture 5% of the area, and
the upper bound will capture 95% of the area below the density curve.
So using the z-score chart, we “look up” which z-scores will yield probabilities of 5% and 95%
respectively.
Lower-Bound: $z_{0.05} = -1.645$
Upper-Bound: $z_{0.95} = 1.645$
Notice that the symmetry of the Normal Distribution will allow us to conclude the upper-bound
without having to “look it up”.
Now, to finally get our range we solve for which c-value will yield the z-scores in question:
Lower-Bound:
$$z_{0.05} = \frac{c - 32.7}{3.1}$$
$$-1.645 = \frac{c - 32.7}{3.1}$$
$$c = (-1.645)(3.1) + 32.7$$
$$c = 27.6$$
Upper-Bound:
$$z_{0.95} = \frac{c - 32.7}{3.1}$$
$$1.645 = \frac{c - 32.7}{3.1}$$
$$c = (1.645)(3.1) + 32.7$$
$$c = 37.8$$
Therefore, 90% of children will typically begin crawling 28 to 38 weeks after birth.

OR
90% of children begin crawling 32.7 ± 5.1 weeks after birth.
⬛
Approach 2: (Using Spreadsheet Software)
The Normal Distribution along with its density function and cumulative distribution function
are common enough that most spreadsheet software employs them as built-in functions. In
particular two such functions will prove useful:
1. The NORMDIST function whose Google Sheets parameters are
=NORMDIST(outcome, mean, standard deviation, cumulative dist?)
If we wish to activate the cumulative distribution (which we often do), we enter “true”
for this parameter.
2. The NORMINV function whose Google Sheets parameters are
=NORMINV(desired probability, mean, standard deviation)
Using these functions we can quickly calculate the desired outcomes.
a. P(C < 28) = 0.0647 inputting “=NORMDIST(28, 32.7, 3.1, true)”
b. P(C > 40) = 1 − 0.9907 = 0.0093
Remember that the cumulative distribution function yields values up to the given outcome.
c. Lower-Bound = 27.6 inputting “=NORMINV(0.05, 32.7, 3.1)”
Upper-Bound = 37.8 inputting “=NORMINV(0.95, 32.7, 3.1)”
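If a spreadsheet is not at hand, the same quantities can be obtained with a few lines of Python. This is an alternative sketch rather than part of the spreadsheet approach: it uses the NormalDist class from Pythonʼs standard statistics module, whose cdf method plays the role of the cumulative NORMDIST and whose inv_cdf method plays the role of NORMINV. The variable names are our own.

```python
from statistics import NormalDist

crawling_age = NormalDist(mu=32.7, sigma=3.1)  # weeks, as in Example 3.10.1

# a. Area below the density curve up to 28 weeks.
p_before_28 = crawling_age.cdf(28)
# b. The cdf only gives areas "up to" an outcome, so subtract from 1.
p_after_40 = 1 - crawling_age.cdf(40)
# c. Outcomes that capture 5% and 95% of the area.
lower = crawling_age.inv_cdf(0.05)
upper = crawling_age.inv_cdf(0.95)

# The z-score used for the table lookup in Approach 1, before rounding.
z_28 = (28 - crawling_age.mean) / crawling_age.stdev

print(f"P(C < 28) = {p_before_28:.4f}")
print(f"P(C > 40) = {p_after_40:.4f}")
print(f"90% range: {lower:.1f} to {upper:.1f} weeks")
print(f"z-score for 28 weeks: {z_28:.2f}")
```

Any small differences from the table values in Approach 1 arise because the table requires the z-score to be rounded to two decimal places first.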
Practice 3.10
Technique
1. Determine the following probabilities given the random variables shown using the Z-Scores
Chart. This link is a video explaining how one uses the chart: Using Z-Score Charts
a. 𝐴 ∼ 𝑁(23, 7); 𝑃(𝐴 ≤ 29)
b. 𝐵 ∼ 𝑁(0, 2. 3); 𝑃(𝐵 > 3. 4)
c. 𝐶 ∼ 𝑁(4091, 1200); 𝑃(1000 ≤ 𝐶 ≤ 5225)
d. 𝐷 ∼ 𝑁(− 31. 5, 4. 4); Determine a range of outcomes, centred about the mean, which
contains 90% of the population.
2. A large study examining driving habits on 400 series highways shows that driving speeds during
non-congested times are normally distributed with a mean of 107.4 km/h and standard
deviation of 5.7 km/h.
a. The legal speed limit on 400 series highways is 100 km/h. Determine the proportion of
drivers who technically exceed this limit.
b. Speeding tickets are usually only given to drivers who exceed 120 km/h or drive slower
than 80 km/h. Determine the proportion of drivers who drive in these speed ranges.
c. Determine a range, centred about the mean, within which 80% of driversʼ speeds fall.
3. A prominent confectionary company produces 500 gram bags of potato chips. The factory
process which produces these bags yields bags with masses that are normally distributed with
a mean of 500 g and a standard deviation of 30 grams. The company has guaranteed that if a
bag is found to weigh less than 400 grams they will provide the customer with a yearʼs supply of
chips worth an estimated value of $250. Previous trends have indicated that the company sells
approximately 450 000 bags of chips every year.
a. Assuming that a “light" bag would be detected by a customer any time that this occurs,
determine the expected cost to the company in fulfilling their warranty.
b. Market research has shown that a “light" bag would in fact be detected approximately
10% of the time. With this information re-evaluate the expected warranty costs to the
company.
4. Many academic institutions will employ “bell curving" techniques to ensure that their students
are evaluated based on their performance relative to their class. Assuming that a class's grades
are approximately normally distributed, marks will be adjusted to reflect grades that fall under
a normal distribution with a mean of 67% and a standard deviation of 5%. Suppose a class
writes a test whose grades are normally distributed with a mean of 40% and a standard
deviation of 15%. Adjust (or “bell-curve”) the following grades:
a. A student whose mark was 30%.
b. A student who scored 86%.
Studies
5. Given a normally distributed random variable $X \sim N(\mu, \sigma)$, determine the range of outcomes
that yield a $(1 - \alpha)$ probability centred about the mean. Note: The symbol α is often referred
to as the level of significance; this denotes the proportion that lies beyond the range being
considered. For example, if our range is 90%, then α = 10%.
3.11 Confidence Intervals: Estimating Population Means
When collecting data for a 1-variable study there is a sense of uncertainty as to whether or not the
sample mean ($\bar{x}$) thatʼs been acquired is actually representative of the population mean (µ). Of
course, in designing an experiment with unbiased sampling methods and large enough sample sizes one
would hope that our estimate would be reasonably close to the population mean.
It turns out that an astonishing result allows us to actually state, with a certain degree of confidence,
how sure we are about where the population mean lies based on our experimental results. This result is
known as the Central Limit Theorem and it is perhaps the most critical property within the discipline
of statistical reasoning.
Repeated Sampling and The Central Limit Theorem
Suppose we have a population with some type of distribution (which may take on any shape), and we
take a sample of size 10 from this population, obtaining a sample mean, x̄₁. Then we take a second
sample, also of size 10, and obtain another sample mean, x̄₂. We continue to repeat this process,
obtaining x̄₃, x̄₄, and so on, running the experiment (with different samples of the same size) an
indefinite number of times. This process is termed repeated sampling from the same
population. We can then take the collection of all of the sample means, x̄ᵢ, and observe the mean,
standard deviation, and overall distribution of the “sample mean” data set.
Clear as mud? Itʼs likely youʼll have to read this one over several times, but here is a link to a
java-application (very dated, but effective) to let you play around with the process of repeated
sampling.
Repeated Sampling Distribution App
Moreover, letʼs label the mean of this new data set µ_x̄, and its standard deviation σ_x̄. Then the following
incredible result holds:
Theorem 3.11.1 (The Central Limit Theorem) The distribution of sample means (x̄ᵢ) taken from
repeated samples, each of size n, from any type of initial population will be approximately normally
distributed (the approximation improving as n grows) with
Mean: µ_x̄ = µ, and
Standard Deviation: σ_x̄ = σ/√n,
where µ and σ are the population mean and standard deviation respectively.
Are you confused with all the verbiage? I would be upon first encountering this. Itʼs worth reading over
again if it hasnʼt sunk in. We unfortunately canʼt go over a proof of this result. This is a “limit” theorem
which implies analytical methods encountered in Calculus are required to set the argument. Weʼll have
to rely instead on viewing this in practice like in the application linked previously.
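In the same spirit as the linked application, repeated sampling can be imitated with a few lines of code. The sketch below is only an illustration and not part of the original course material: it assumes Python with its standard library, uses an exponential population (picked simply because it is clearly not bell-shaped, with µ = 1 and σ = 1), and the names draw_exponential and sample_means are invented for this example.

```python
import random
import statistics

def draw_exponential(rng):
    """One observation from a skewed population with mean 1 and standard deviation 1."""
    return rng.expovariate(1.0)

def sample_means(draw, n, repeats, seed=0):
    """Repeatedly take a sample of size n and record each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(repeats)]

n = 30  # size of every sample (an arbitrary choice for this sketch)
means = sample_means(draw_exponential, n, repeats=5000)

mean_of_means = statistics.fmean(means)   # plays the role of mu_xbar
sd_of_means = statistics.stdev(means)     # plays the role of sigma_xbar

print(f"mean of the sample means: {mean_of_means:.3f}")
print(f"standard deviation of the sample means: {sd_of_means:.3f}")
print(f"sigma / sqrt(n) for comparison: {1 / n ** 0.5:.3f}")
```

If the theorem is doing its job, the first value should land near µ = 1 and the second near σ/√n ≈ 0.183. Plotting a histogram of `means` should also show a shape far closer to a bell curve than the original population.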
Confidence Intervals: Estimating the Population Mean from a Single Sample
Having the Central Limit Theorem in our back pocket, so to speak, we can now turn to estimating a
population mean (µ) based on the sample mean (x̄) taken from the results of a single experiment.
Due to the Central Limit Theorem, we know, for example, that if we were to repeatedly sample from
any given population, then 95% of the sample means would fall within 2σ_x̄ of the population mean
(µ). Another way to phrase this is that 95% of the sample means should fall somewhere inside the
interval:
(µ − 2σ_x̄, µ + 2σ_x̄)
Conversely, if we were to take repeated samples, each with its own sample mean (x̄ᵢ), and then construct
an interval extending 2σ_x̄ on either side of each mean, we would expect 95% of these intervals to contain
the population mean (µ).
For example, in the following diagram, we see that 19 of the 20 intervals, each centred about a sample
mean (x̄ᵢ), encompass the actual population mean. Thus, given our lone sample of size n, we could state
that we are “95% confident that the population mean lies somewhere within this interval.”
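The “19 out of 20” idea can also be checked by simulation. What follows is a rough sketch under stated assumptions rather than anything from the original text: the population is taken to be normal with µ = 50 and σ = 10 (values invented purely for illustration), σ is treated as known, and the function name coverage_rate is hypothetical.

```python
import random
import statistics

def coverage_rate(mu, sigma, n, trials, z=2.0, seed=1):
    """Fraction of intervals (xbar - z*sigma_xbar, xbar + z*sigma_xbar) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5  # z times the standard deviation of the sample means
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=50, sigma=10, n=30, trials=2000)
print(f"{rate:.1%} of the intervals contained the population mean")
```

The printed percentage should sit in the neighbourhood of 95%, though it will wobble a little from one seed to the next since it is itself the result of sampling.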
Hopefully that last bit made some semblance of sense. These are not trivial concepts, however they
yield better results than just simply quoting a mean and standard deviation as we can now estimate,
with a certain level of confidence, where the actual population mean lies.
Example 3.11.1 Raluca wishes to study the volume of work put into a statistics course by her
classmates. To do this, the number of notebook pages used by each of a sample of 30 classmates was recorded.
The sample had a mean of 231.6 pages with a sample standard deviation of 56.0 pages.
Determine a 95% confidence interval to estimate the population mean (µ) for all students who take
this course.
Solution:
It should be mentioned that taking a convenience sample of classmates would, in all likelihood, yield
biased results. However, ignoring the limitations of the experiment, we can employ the Central Limit
Theorem to estimate the actual population mean (µ ).
Based on the discussion above, we know that there is a 95% likelihood that the population mean will sit
within the interval
(x̄ − 2σ_x̄, x̄ + 2σ_x̄),
where x̄ is the sample mean and σ_x̄ is the sampling standard deviation.
Weʼre given that
x̄ = 231.6 and σ_x̄ = s/√n = 56/√30 = 10.2
Moving forward, we get that our 95% confidence interval is:
(231.6 − 2 · 10.2, 231.6 + 2 · 10.2) = (211.2, 252.0)
Thus, we estimate, with 95% confidence, that the mean number of pages for students in the statistics
course falls between 211 and 252 pages.
A more common way of phrasing this is:
Students, on average, will have 231.6 pages in their notebooks; ± 20.4 pages, 19 times in 20.
Some Remarks on the Solution:
1. To construct a 95% confidence interval, we used the sample standard deviation (s) and not the
population standard deviation (σ). This does not truly reflect the criteria set in the Central
Limit Theorem; however, as the experimenter we usually have little else to go on. Thus, we
generally will use the sample standard deviation to estimate σ, and hope that our sampling and
experimental design are strong enough to provide a reasonable result in this matter.
2. The final conclusion did not really exhibit the interval range. Rather, it is common practice to
state your sample mean (x̄) as the main result to focus on. The actual confidence interval is
displayed in the statement following the mean, namely the ± value along with a reference to the
level of confidence, where “19 times in 20” tells us we are 95% sure that this range is accurate.
Example 3.11.2 A study of 25 kickoffs in the CFL showed that the mean distance kicked was 73.7 yards
with a standard deviation of 8.4 yards. Determine a 90% confidence interval which estimates the mean
distance CFL kickers can achieve.
Solution:
Since we need a range that yields 90%, we once again “reverse look-up” the appropriate z-score that
yields a 5% probability, as that will be the area covered in each tail of the normal distribution. In doing
this we get
z_{α/2} = 1.645 (Look over Example 3.10.1 to review why this is so.)
Thus, our 90% confidence interval will be of the form:
x̄ ± 1.645σ_x̄, where x̄ = 73.7 and σ_x̄ = 8.4/√25 = 1.68
Giving us,
73.7 ± 2.8
Therefore, CFL kickoffs measure an average distance of 73.7 yards; ± 2.8 yards, 9 times in 10.
⬛
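Both examples in this section follow the same recipe: x̄ ± z · s/√n. As a quick check on the arithmetic, here is a minimal sketch in Python. It is an illustration only, and the function name mean_confidence_interval is made up for this example rather than taken from any library.

```python
from math import sqrt

def mean_confidence_interval(xbar, s, n, z):
    """Return (low, high, margin) for the interval xbar ± z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin, margin

# Example 3.11.1: 30 classmates, mean 231.6 pages, s = 56.0, z approximated by 2
low1, high1, margin1 = mean_confidence_interval(231.6, 56.0, 30, z=2)
print(f"95% interval: ({low1:.1f}, {high1:.1f}), margin ± {margin1:.1f}")

# Example 3.11.2: 25 kickoffs, mean 73.7 yards, s = 8.4, z = 1.645
low2, high2, margin2 = mean_confidence_interval(73.7, 8.4, 25, z=1.645)
print(f"90% interval: ({low2:.1f}, {high2:.1f}), margin ± {margin2:.1f}")
```

The results should agree with the worked solutions up to rounding, since the solutions round σ_x̄ before multiplying by the z-score.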
Practice 3.11
Technique
1. Determine the z-scores that will yield confidence intervals of 90%, 95%, and 99%. You can use
the z-score table or the “NORMINV” function on a spreadsheet to find these.
NOTE: the z-score for 95% is not exactly 2, as was approximated earlier in this section. Try to
keep these values in mind as these are the most frequently used levels of confidence.
2. Given a sample of 100 data with a sample mean of 34 and a standard deviation of 16, determine
a 99% confidence interval which estimates the population mean.
3. Stephanie sampled 30 students throughout The Woodlands inquiring about the amount of
money students most recently spent on headphones. She obtained a mean of $47.52 with a
standard deviation of $16.15. Determine a 95% confidence interval estimating the amount
spent on headphones at our school. (Note: Use the z-score you obtained in problem 1.)
4. In investigating the number of push-ups a typical Woodlander could perform, Mehul
sampled 75 students, asking them to perform as many proper push-ups as possible. He found a
mean of 5.4 push-ups with a standard deviation of 4.4. Determine a 90% confidence interval
which estimates the number of push-ups a typical Woodlands student can perform.
Studies
5. In comparing two sets of data we often refer to the sets as being in a “statistical dead-heat” if
the confidence intervals estimating the population means overlap. If the intervals are
separate we can then conclude a “statistically significant” difference in the results.
A study examining the length of time it takes individuals to use washroom facilities at Square
One mall compared the times of 40 women and 40 men. The time was measured in seconds
from the moment the respondent entered the washroom until they exited (no “in house"
observations were made to respect privacy).
The results for the women showed a mean time of 297 seconds with a standard deviation of 93
seconds, while the men had a mean of 210 seconds with a standard deviation of 72 seconds.
Compare these results using the above criteria to a confidence level of 95%.
6. It has been shown that teenagers in Canada are “on-screen" (either watching TV, using a
computer, gaming, or using a mobile device) for 6.7 hours per day ( ±2.1 hours, 99 times in 100).
Assuming a standard deviation of 1.3 hours per day was determined, how large was the sample
size for this study? Does this seem to be a reasonable size to make such a conclusion?
7. Given a confidence interval statement with the stated mean (x̄), plus/minus range, and
confidence level, determine a formula which gives the sample size n if weʼre also aware
of the sample standard deviation (s).
Repertoire
8. Research the proof of the Central Limit Theorem. You may require some further background in
both Calculus and Probability to understand it fully, but you may get the gist of the argument of
this remarkable result.
3.12 Approximating Discrete Distributions
In this section weʼll explore another useful application of the Normal Distribution (and continuous
distributions in general). Letʼs start with a simple example.
Example 3.12.1 Sarah in her playful way responded to a 400 question multiple-choice test, each having
5 possible responses, entirely by guessing. Determine the probability that she correctly responded to at
least 100 of these questions.
Solution:
If this problem seems familiar, then it would appear that youʼve learned something during your time in
this course! This is because weʼve dealt with this type of question before. Technically speaking we can
solve this using the Binomial Distribution, where n = 400 and p = 0.2.
Let C be the random variable which represents the number of correct solutions. Then we have,
𝑃(𝐶 ≥ 100) = 𝑃(𝐶 = 100) + 𝑃(𝐶 = 101) + ... + 𝑃(𝐶 = 398) + 𝑃(𝐶 = 399) + 𝑃(𝐶 = 400)
Well this just seems plain mean and redundant, so in our ever enduring pursuit of speeding things up
weʼll attempt to observe some useful patterns. Graphing the entire probability distribution of C we get
the following shape:
Clearly this closely resembles a Normal Distribution. Thus, we can use the normal distribution
to significantly cut down on our calculation time and hopefully achieve a decent approximation of
(i.e. a value close to) the actual result. Of course, to use the normal distribution we require two key
pieces of information, namely, the mean and standard deviation.
The mean turns out to be the expected value of the binomial distribution, and recalling that the
expected value of a binomial random variable is E(C) = np we get;
µ = E(C) = (400)(0.2) = 80
The variance is given to be the “expected value of the squares of the deviations”, namely
σ² = V(C) = E[(C − E(C))²]
This may seem confusing, but really itʼs just the same logic that we used to measure spreads of any data
set (i.e. weʼre just taking the mean of the deviations squared). Anyway, this yields;
σ² = npq
⇒ σ = √(npq)
Using these parameters, we get that;
σ = √(400(0.2)(0.8)) = 8
Comparing the two distributions we can see how closely the Normal density function models the
Binomial Distribution:
Thus weʼre safe in approximating our random variable as C ∼ N(80, 8), giving rise to:
P(C ≥ 100) (Binomial)
≈ P(C ≥ 99.5) (Approximating with Normal Distribution)*
= P(Z_{99.5} ≥ 2.44)
= 1 − P(Z < 2.44)
= 1 − 0.9927
= 0.0073
Thus, there is only a 0.7% chance that Sarah would score at least 100 using this method.
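For readers who would like to see how good the approximation is, the long binomial sum can be handed to a computer and compared against the normal estimate. The sketch below is only an illustration and assumes Python 3.8 or later (for math.comb and statistics.NormalDist); the helper name binomial_at_least is invented for this example.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_at_least(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), added up one outcome at a time."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(k, n + 1))

n, p = 400, 0.2
mu = n * p                      # 80
sigma = sqrt(n * p * (1 - p))   # 8

exact = binomial_at_least(100, n, p)
approx = 1 - NormalDist(mu, sigma).cdf(99.5)  # normal tail starting at 99.5

print(f"exact binomial tail:  {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

The normal value differs very slightly from the table-based answer because the z-score is not rounded to two decimals here. Any remaining gap between the two printed numbers reflects the slight skew of the binomial distribution, which the symmetric normal curve cannot capture.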
The Continuity Correction
In the above example you may have noticed that when approximating with the normal distribution we
changed the range being considered, namely we moved from;
𝑃(𝐶 ≥ 100) ≈ 𝑃(𝐶 ≥ 99. 5)
(Binomial) (Normal Approximation)
This was no accident. As the random variable C originally follows the
binomial distribution, we know that it is “discrete” in nature. So its outcomes of interest take on the values 100,
101, 102, ..., 399, 400. But the normal distribution is continuous in nature, meaning that it accounts for
all outcomes in the interval (100, 400). Consequently, an adjustment is necessary to bridge the gap
between this discrete random variable and its continuous counterpart.
If we examine the outcome of C = 100 specifically, we know that this specific outcome would yield a
probability of P(C = 100) = 0 in the continuous case. Recalling the nature of continuous random
variables, we need to instead take a range of outcomes to ensure a non-zero probability. Thus, weʼll
represent the single outcome of “100” by the interval (99.5, 100.5). Moving to the next outcome, C = 101, we
make the same adjustment using the interval (100.5, 101.5), and so on. The diagram below illustrates
this continuity correction:
Example 3.12.2 Two dice are rolled 250 times during an exceedingly long game of backgammon.
Determine the probability that doubles were rolled fewer than 40 times.
Solution: Let D represent the number of doubles rolled. (Binomial with n = 250, p = 1/6)
Since we would have to carry out forty calculations to determine this probability, we have two choices:
1. Calculate the probability using spreadsheet software, or
2. use the Normal approximation.
Either of these approaches is valid. The spreadsheet approach would yield the exact answer, but letʼs
use the approximation:
We first require the mean and standard deviation:
µ = np = (250)(1/6) = 41.7
σ = √(npq) = √((250)(1/6)(5/6)) = 5.89
Thus,
P(D < 40) (Binomial)
≈ P(D < 39.5) (Normal Approximation with the continuity correction)
= P(Z_{39.5} < −0.37)
= 0.3557
⬛
Notice that as we desired D < 40, which does not include the outcome of 40 itself, we ended our interval
at 39.5 rather than 40.5. You have to be mindful of these subtleties when using the continuity correction.
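To see why the half-unit shift matters, one can compare the exact binomial probability with the normal estimate computed both with and without the correction. This is a hedged sketch rather than part of the original solution: it assumes Python 3.8 or later, and the variable names are chosen just for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 250, 1 / 6
q = 1 - p
mu = n * p
sigma = sqrt(n * p * q)

# Exact binomial: D < 40 means D = 0, 1, ..., 39
exact = sum(comb(n, d) * p**d * q**(n - d) for d in range(40))

normal = NormalDist(mu, sigma)
corrected = normal.cdf(39.5)    # with the continuity correction
uncorrected = normal.cdf(40)    # naive cut-off at 40, for comparison only

print(f"exact binomial:         {exact:.4f}")
print(f"normal, cut at 39.5:    {corrected:.4f}")
print(f"normal, cut at 40:      {uncorrected:.4f}")
```

One would expect the value cut at 39.5 to sit noticeably closer to the exact probability than the value cut at 40, which is the whole point of the correction.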
Practice 3.12
Technique
1. Given the ranges below for a discrete random variable, determine an appropriate interval which
corrects for continuity when using a normal approximation:
a. 𝑋 ≥ 22
b. 4500 < 𝑌 < 4550
c. 𝑍 ≤ 15
d. 𝑊 > 3
2. It has been estimated that the success rate of a “blind date" leading to a committed relationship
is about 24%. Suppose 100 pairs of men and women attended a blind date event. What is the
probability that at least 10 of the dates will result in a long term relationship?
3. A machine that produces mechanical pencils churns out approximately 3500 units per day.
About 3% of the pencils usually turn out to be defective. What is the probability that there will
be between 100 and 200 defective pencils on any given day?
4. In the game of roulette, there are 18 red sectors, 18 black sectors, and 2 green sectors. When a
player bets on a colour the payout is valued at a 2:1 ratio. For example, if the player bets $1 on
“black" and wins, he will get his dollar back plus one more dollar, if he loses then he simply
loses the dollar. If Vincent consistently bets his last $40 on “black" for 40 spins in a row, what is
the probability that he will gain money?
Studies
5. A “rule of thumb” for the normal distribution to be an adequate approximation for the binomial
distribution is for both np and nq to be greater than five. Investigate what happens to the shape
of the binomial distribution when this condition is not satisfied by sketching out graphs of
distributions on spreadsheet software. What do you notice about the shape of the distribution
when these conditions are not met?
Repertoire
6. Attempt to develop a procedure to approximate a Hyper-Geometric distribution with the
Normal distribution.
7. Attempt to develop a procedure to approximate a Geometric distribution with an Exponential
distribution.
3.13 Confidence Intervals: Estimating Population
Proportions
A frequently occurring conclusion that one will see in the media is an estimate of the proportion of the
population exhibiting some type of characteristic. For example: 54% of voters support the prime
minister on foreign policy, 47% of marriages end up in divorce, 5% of people who take a certain drug
had an allergic reaction, etc. Studies such as these are based on a success/failure type of dichotomy.
Typically a survey or observation will occur and the experimenter will simply record whether or not the
respondent exhibits the characteristic. The proportion that the experimenter determines,
denoted by (p̂), is only an estimate of the actual population proportion (p).
Recall that if we have a series of n independent events each with a chance of success (p), then the
random variable which counts the number of successes is Binomially Distributed where:
P(X = x) = ₙCₓ pˣ qⁿ⁻ˣ, and
E(X) = np
In choosing a true/false (success/failure, yes/no, etc…) type of variable to analyze in a given experiment,
we can somewhat safely assume that each member of the sample of size n determines their result
independently. Also, as was discussed in the previous section, we are able to approximate binomial
distributions (under certain circumstances) with the normal distribution with:
Mean: µ = E(X) = np, and
Standard Deviation: σ = √(npq)
Knowing this, we can now exploit the properties of the Central Limit Theorem once again to obtain a
confidence interval which estimates the location of the population mean (µ) and, by extension, the
population proportion (p).
Letʼs play this out a bit:
Recall that if we have a normally distributed random variable and wish to determine a range of
outcomes, centred about the mean, yielding a given probability, the range is given by;
(x̄ − z_{α/2} σ) < µ < (x̄ + z_{α/2} σ)
But given that weʼve run this process with a sample size of n, we can substitute the Binomial
distribution measures for the mean and standard deviation as follows:
(n p̂ − z_{α/2} √(n p̂ q̂)) < np < (n p̂ + z_{α/2} √(n p̂ q̂))
Dividing every expression in this inequality by n, we finally obtain our approximation of the population
proportion (p), namely:
(p̂ − z_{α/2} √(p̂ q̂ / n)) < p < (p̂ + z_{α/2} √(p̂ q̂ / n))
Fun stuff, eh :) Letʼs use this in an example. This is the minimal extent that youʼll need to understand in
this course.
Example 3.13.1 In a blind taste test, 25 of 30 people stated that they preferred the taste of tap water
over bottled water. Determine a 95% confidence interval which estimates the proportion of the
population who prefer tap water.
Solution:
Using the interval derived above we note that based on this sample we have;
n = 30
p̂ = 25/30 = 5/6
q̂ = 5/30 = 1/6
z_{α/2} = 1.96
Thus, we get
(5/6 − (1.96) √((5/6)(1/6)/30)) < p < (5/6 + (1.96) √((5/6)(1/6)/30))
yielding us our estimate as,
0.70 < p < 0.97
Thus, we can estimate that the proportion of people who prefer tap water over bottled water is
83% (± 13%, 19 times in 20).
⬛
A key to remember with this type of interval is that our variable is a percentage, and so the mean and
standard deviation are also percentages. This type of variable is extremely common in many polls and
studies, so maintain awareness of the difference between confidence intervals measuring the population
proportion (p) versus the type where we are measuring a population mean (µ). Letʼs work through
one final example.
Example 3.13.2 A randomized phone survey asked respondents whether or not they felt that “climate
change" was a real issue that must be dealt with. It turned out that 417 of the 1000 surveyed felt that
climate change was not a danger to our planet. Determine a 90% confidence interval estimating the
proportion of the population who feel that climate change is not a major concern.
Solution: This is really just a “plug and chug” question, so letʼs have at it.
n = 1000
p̂ = 0.417
q̂ = 0.583
z_{α/2} = 1.645
Thus our interval can be determined as
p ∈ p̂ ± z_{α/2} √(p̂ q̂ / n)
(This is simply an alternate way of expressing the confidence interval derived earlier.)
⇒ p ∈ 0.417 ± (1.645) √((0.417)(0.583)/1000)
⇒ p ∈ 0.417 ± 0.026
Thus, the proportion of people who do not feel that climate change is a major concern is estimated to
be 42% (± 2.6%, 9 times in 10).
⬛
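The proportion interval p̂ ± z · √(p̂q̂/n) is just as easy to hand over to a short script. The following is a minimal sketch, assuming Python and a function name (proportion_confidence_interval) made up for this illustration, applied to the two examples of this section.

```python
from math import sqrt

def proportion_confidence_interval(successes, n, z):
    """Return (low, high) for p_hat ± z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Example 3.13.1: 25 of 30 tasters preferred tap water, 95% confidence
low1, high1 = proportion_confidence_interval(25, 30, z=1.96)
print(f"tap water: {low1:.3f} < p < {high1:.3f}")

# Example 3.13.2: 417 of 1000 respondents, 90% confidence
low2, high2 = proportion_confidence_interval(417, 1000, z=1.645)
print(f"climate survey: {low2:.3f} < p < {high2:.3f}")
```

Notice how much narrower the second interval is. The lower confidence level helps a little, but most of the improvement comes from the far larger sample.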
Experimental Strategies for Yielding Suitable Confidence Intervals
When designing a statistical experiment, it is often necessary to determine estimates of the
population mean (µ) or population proportion (p). When this is the case, weʼll be required to use
confidence intervals to properly address the variation inherent when obtaining a sample. Thus, in order
to ensure that these intervals have a “reasonable range”, it is important to ensure that the width of the
interval is minimized within reason.
There are two approaches which we can use to shrink the width of a confidence interval:
1. Decreasing the Confidence Level
2. Increasing the Sample Size
Doing either of these, or both in combination, will assist in managing the suitability of your conclusions,
so youʼll have to decide what must be prioritized when faced with limited means (e.g. budget, time,
etc…).
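The trade-off between these two levers can be tabulated directly. Below is a rough sketch, assuming Python 3.8 or later and a hypothetical poll in which p̂ = 0.5 (a value chosen only for illustration); the function name margin_of_error is likewise invented here.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Half-width of the proportion interval at the given confidence level."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # the z-score leaving alpha/2 in each tail
    return z * sqrt(p_hat * (1 - p_hat) / n)

for confidence in (0.90, 0.95, 0.99):
    for n in (100, 400, 1600):
        m = margin_of_error(0.5, n, confidence)
        print(f"confidence {confidence:.0%}, n = {n:>4}: ± {m:.1%}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin, so shrinking an interval by collecting more data becomes expensive quickly.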
Moreover, one should always be mindful of these parameters when viewing quoted studies in the media
(or academia) as there could be some impactful distortions of the results when one examines the:
- Chosen Variables
- Procedure
- Sampling Technique
- Sample Size
- Choice of Confidence Level
For your final thesis project, all of these considerations should be examined and justified, even if this
weakens your argument! What weʼre ultimately aiming for is the most accurate representations of the
phenomenon being studied. If our experiments are limited in scope, but yield possibly impactful
results, then we can conclude that further research should be conducted to confirm said results. Never
be afraid of this! We can, at least at this point, remain close to our ideals without risk of penalty.
Practice 3.13
Technique
1. A study testing the safety of a newly developed headache medicine showed that 12 of 250
participants suffered adverse side-effects. Determine a 95% confidence interval estimating the
proportion of the population who may be susceptible to experiencing side-effects.
2. A survey of graduating students at our school indicated that 74 of 90 were influenced by their
parents when choosing their post secondary program of study. Determine a 90% confidence
interval estimating the proportion of students who based their choice of program on the
influence of their parents.
3. An experiment concerning oneʼs ability to concentrate on simple problem solving questions
while placed in a room with a loud constant noise was performed on a group of 50 college level
students. The test had 25 simple arithmetically based problems to be worked through in a 2
minute time frame. The mean score of the students was 68% with a standard deviation of 7%.
Determine a 95% confidence interval estimating the population mean. (Note: Be mindful of the
type of confidence interval asked for here.)
4. Kenzie and Stephanie are both running to be president of the Woodlands Student Council. In a
poll of 75 students, 30 stated that they supported Kenzie while 40 stated their support for
Stephanie. Use 90% confidence intervals to determine if this poll suggests a statistically
significant advantage for Stephanie. (Check the studies problems in section 3.11 if you are not
aware of what statistically significant means.)
Studies
5. A news report stated that 73% of professional hockey players have experienced some
type of head injury while playing the sport. The quote also stated this estimate is
true within a range of 3%, 19 times in 20. Determine how large of a sample was
collected in this study.
6. Generalize the above problem to determine the sample size, n, of a quoted confidence interval
with a stated experimental proportion (p̂), and confidence level (e.g. 90%).
3.14 Hypothesis Testing Revisited
p-Values
In section 2.11 we examined the idea of executing a hypothesis test to determine whether or not an
experimental result is unlikely enough to suggest a possible anomaly (perhaps in a good way or
possibly unfortunate). At the time many of the examples and problems we were working through had a
limited number of trials, as it would take a substantial amount of time to calculate the relevant
probabilities. However, now armed with the techniques encountered within this unit, we can set our
sights on larger experimental results and consequently test our hypotheses in a more realistic manner. To
begin with, letʼs introduce two commonly used terms.
Definition 3.14.1 The probability, calculated under the assumption that the null hypothesis is true, of
obtaining an experimental result at least as extreme as the one observed is called the p-value.
Definition 3.14.2 The significance level (α) of a hypothesis test is a predetermined probability threshold:
an experimental result whose p-value falls below α is considered “rare enough” to suggest that the experimental
trial has demonstrated a significant change from the conditions declared in the null hypothesis.
Our decision regarding the null hypothesis is summarized below:
Conclusion    p-value
Reject H0     p-value < α
Accept H0     p-value > α
Letʼs start with the rather serious matter of growth hormones affecting the North American population,
particularly in young females.
Example 3.14.1 In 1992, a new hormone was introduced to cattle farmers which allowed cows to
mature to a larger size and also produce more milk. Recent concerns have been raised about the effects of this
hormone in causing young girls to reach their first menstrual cycle earlier. To investigate this
concern, a survey of 100 women was conducted. Fifty of the women were born early enough to never
have had exposure to the hormone-laced milk while growing up, while the other fifty women were born
after 1995, ensuring a possible exposure to the hormone.
For the older group, the initial menstruation occurred at a mean age of 12.5 years with a standard
deviation of 1.4 years, while the younger group had a mean of 11.7 years with a standard deviation of
2.3 years. Test this result to a significance level of α = 5%.
Solution:
Our two groups (older and younger women) suggest null and alternative hypotheses of;
H0: µ = 12.5 (Since the control group, the older women, had this mean age.)
Ha: µ < 12.5 (Since our experimental group demonstrated a mean which was lower than the control group.)
Weʼll then assume that the null hypothesis is true, meaning that µ = 12.5 and the standard
deviation is σ = 1.4 years.
Given this assumption, the Central Limit Theorem dictates that we also must recognize that our sample
of 50 respondents would yield a mean which would be part of a normal distribution of sample means,
with:
µ_x̄ = µ = 12.5,
σ_x̄ = σ/√n = 1.4/√50 = 0.198
Consequently, we can evaluate the likelihood of obtaining a sample of size 50 (remembering that the
experimental group only had 50 participants) with a mean falling at 11.7 years or below:
p-value = P(A ≤ 11.7); where A ∼ N(12.5, 0.198)
= P(Z_{11.7} ≤ (11.7 − 12.5)/0.198)
= P(Z_{11.7} ≤ −4.04)
≃ 0
< α
As the probability of our sample occurring is virtually zero and far below our significance level of 5%, we
can then conclude that the null hypothesis is no longer valid for this population! Thus, we can conclude
that females are indeed menstruating at earlier ages.
⬛
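The same test can be run in a few lines of code, which also shows just how small “virtually zero” is. This is a sketch only, assuming Python 3.8 or later; the function name lower_tail_p_value is invented for this illustration and covers only the “Ha: µ is less than” case used in this example.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_p_value(sample_mean, mu0, sigma, n):
    """P(sample mean <= observed) when H0 (mean mu0, standard deviation sigma) is true."""
    sigma_xbar = sigma / sqrt(n)  # standard deviation of the sample means
    return NormalDist(mu0, sigma_xbar).cdf(sample_mean)

alpha = 0.05
p_value = lower_tail_p_value(sample_mean=11.7, mu0=12.5, sigma=1.4, n=50)

print(f"p-value = {p_value:.6f}")
print("Reject H0" if p_value < alpha else "Accept H0")
```

The decision line simply applies the summary given at the start of this section: reject the null hypothesis whenever the p-value falls below α.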
Hypothesis Test on Proportions
Example 3.14.2 In Ontario schools, each student must complete and successfully pass a Literacy Test
as one of their secondary school requirements. The Woodlands have achieved an 87% pass rate on this
test during the past five years. During this past year new school initiatives were implemented with the
aim of improving student success on this test. It turned out that 312 of 347 students that wrote the test
under this new instructional system were successful on the test. Does this result suggest a statistically
significant improvement in the schoolʼs literacy test scores? Test this result to a significance level of 5%.
Solution:
This situation is identical to the types examined in section 2.11, whereby we can consider the number of
successful students as following a Binomial Distribution with n = 347 trials. The information given to
us is that:
H0: p = 0.87 (The ʻacceptedʼ success rate before the change was implemented.)
Ha: p > 0.87 (As the current result seems to indicate a better rate of success.)
Assuming H0 is true, we test our experimental result;
p-value = P(X ≥ 312) (Binomial)
≈ P(X ≥ 311.5) (Normal Approximation with µ = np = 301.89 and σ = √(npq) = 6.26)
= P(Z_{311.5} ≥ (311.5 − 301.89)/6.26)
= 1 − P(Z_{311.5} < 1.54)
= 0.0618
> α
Thus, this result does not definitively suggest that the initiative had a significant impact on the studentsʼ
performance.
⬛
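For proportions, the calculation combines the normal approximation with the continuity correction. Here is a minimal sketch, assuming Python 3.8 or later and a made-up function name (upper_tail_p_value) that handles only the “Ha: p is greater than” case from this example.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(successes, n, p0):
    """Approximate P(X >= successes) for X ~ Binomial(n, p0), with continuity correction."""
    mu = n * p0
    sigma = sqrt(n * p0 * (1 - p0))
    return 1 - NormalDist(mu, sigma).cdf(successes - 0.5)

alpha = 0.05
p_value = upper_tail_p_value(successes=312, n=347, p0=0.87)

print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Accept H0")
```

The printed p-value differs slightly from the table-based figure because the z-score is not rounded to two decimals here, but it lands on the same side of α and so leads to the same conclusion.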
Final Thoughts on Experimental Results
It is tempting to hail improved results, like the one in the previous example, as a strong indication of
improvement, despite the hypothesis test suggesting otherwise. After all, it sometimes may seem that
our result is so close to yielding a significant result! For example, the test run in the previous example
demonstrated that a similar result would happen by chance in about 1 in 16 schools, which, though
somewhat rare, is not really that surprising. Unfortunately, many people attach their reputation to such
results and so would dismiss analysis such as this as simply “skeptical” or just “mathematical hokum”.
Employing statistical and probabilistic reasoning tempers such attachment, always ensuring a level of
objectivity when observing results; always remembering there is variability in life. By simply focussing
on a result without recognizing the grander circumstances, we can run into precarious consequences,
often leading to significant losses of time, effort, and money. It is our hope that in reading and working
through this book you are now beginning to understand how to view such experimental
results with the necessary broadened lenses to make better decisions moving forward.
p-Values v. The Population Proportion (p)
In the world of hypothesis testing one could easily confuse p-values with the population proportion.
This is an unfortunate set of conventions that must be consciously distinguished. Try to stay mindful
that:
p-values - probability calculation used in a hypothesis test, and
population proportion (p) - success rate on independent trials; used for confidence intervals.
Adieu.
Practice 3.14
Technique
1. For the following null and alternative hypotheses, test each result to the given significance level:
Sample size (n)   H0                     Ha          α     Sample Mean (x̄)
25                µ = 15, σ = 4          µ < 15      5%    11
120               µ = 42.5, σ = 6.7      µ > 42.5    10%   55.9
10                µ = 4400, σ = 1200     µ < 4400    1%    4000
2. For the following null and alternative hypotheses, test each result to the given significance level:
Sample size (n)   H0          Ha          α     Sample Result
25                p = 0.25    p > 0.25    5%    10 out of 25
120               p = 0.71    p < 0.71    1%    78 out of 120
10                p = 0.5     p > 0.5     10%   8 out of 10
3. A study of teacher perceptions of the behaviour of elementary school children was reported in
Developmental Psychology (March 2003). In this study, teachers rated the aggressive behaviour
of a sample of 11 160 New York City public school children by responding to the
statement, “The child threatens or bullies others in order to get his/her own way.” Responses
were measured on a scale ranging from 1 (never) to 5 (always). The sample revealed a mean
score of x̄ = 2.15 with a standard deviation of s = 1.05. Assuming a baseline score of µ = 3,
test this result to a significance level of 10%. (Note: Youʼll have to use the sample standard
deviation as an estimate of σ.)
4. Research published in Nature (August 27, 1998) revealed that people are more attracted to
“feminized” faces, regardless of gender. In one experiment, 50 human subjects viewed both a
Japanese female and a Caucasian male face on a computer. Using special software, each
subject could morph the faces (by making them more feminine or more masculine) until they
attained the “most attractive” face. The level of feminization x (measured as a percentage) was
determined.
a. For the Japanese female face, the study yielded a mean score of x̄ = 10.2% and
s = 31.3%. Test this result relative to the null hypothesis of a mean
feminization level of µ = 0% (indicating an equal preference of males and females) to a
significance level of α = 0.05.
b. For the Caucasian male face, a mean of x̄ = 15% and s = 25.1% was found. Test this
result relative to the null hypothesis of µ = 0% to a significance level of α = 0.05.
5. Some people believe that Canadian quarters are not fairly balanced. In fact, it is believed that
quarters favour a coin toss of “heads” overall. To test this claim, a sample of 50 quarters were
each tossed 50 times. Of the 2500 ensuing tosses, 1316 ended up showing “heads”. Does this
suggest that quarters are not equally balanced? Test this result to a significance level of
α = 0.01.
6. In 1985, a study showed that 65% of single Canadians (neither in a marital nor common law
relationship) were using contraceptives when engaging in sexual activity. In 2005, a second
study indicated that 3722 of 5000 respondents claimed to be using contraceptives when
engaging in sexual activities. Test this result to a significance level of α = 0.05.
3.15 Case Studies
The following case studies are real-life data sets from a variety of sources. Using methods of analysis
from this unit, provide three distinct breakdowns such that:
- There is at least one single-variable comparison across qualitative groupings, AND
- There is at least one two-variable analysis which attempts to fit an appropriate model to the
relationship being investigated.
Include graphs of distributions/scatter plots and definitive conclusions regarding your breakdowns,
formatted using document software.
CASE 1: HEALTH INSURANCE COSTS IN AMERICA

The following set of data is a sample of American citizens who are enrolled in health insurance. The
data is broken down according to the following variables:
Age: age of the primary beneficiary, person who holds the policy
Sex: gender of policy holder (note non-identified is not considered in this set)
BMI: body mass index of the policy holder
Children: number of children covered by the policy
Smoker: whether or not the policyholder is a smoker
Region: region where the beneficiary resides
Charges: Individual medical costs billed by health insurance
Chapter 3 Practice: Insurance
CASE 2: LIFE EXPECTANCY (WHO)

The Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps
track of the health status as well as many other related factors for all countries. The data set related to
life expectancy and health factors for 193 countries has been collected from the same WHO data repository
website, and its corresponding economic data was collected from the United Nations website. Among all
categories of health-related factors, only the most representative critical factors were chosen.
It has been observed that in the past 15 years there has been a huge development in
the health sector, resulting in improvement of human mortality rates, especially in the developing
nations, in comparison to the past 30 years. Therefore, in this project we have considered data from the
years 2000-2015 for 193 countries for further analysis. Some of the data for developing nations are
missing.
Chapter 3 Practice: Life Expectancy
CASE 3: STATE OF THE WORLD'S CHILDREN (UNICEF)

UNICEF (United Nations International Children's Emergency Fund) was created with this purpose in
mind - to work with others to overcome the obstacles that poverty, violence, disease and discrimination
place in a child's path. The Google Sheets file linked below contains “sheets” which cover differing
issues related to childrenʼs health, development, and safety.
2019 State of the World's Children
CASE 4: HISTORICAL TEMPERATURE AND PRECIPITATION TRENDS IN CANADA
The data table linked below examines the changes in annual temperature and precipitation by region
and season in Canada. Departures from normal are defined as departures from the 1961 to 1990 normal
in Celsius degrees for temperature and in percent for precipitation.
Departure of Temperature and Precipitation from Normal
CASE 5: CRIME STATISTICS IN CANADA

The following table summarizes incident-based crime statistics, by detailed violation, for Canada, its
provinces and territories, and Census Metropolitan Areas since the year 2000. Source: Statistics Canada.
2000 - Present Canadian Incident Based Crime Statistics
B.O.B. “Back of the Book”
BOB has the answers.

Senior Woodlands Student circa 2002
UNIT 1:
Practice 1.1
1. 12
2. b) 3 c) 5
3. b) 18 c) 48
4. 19 200 000
5. 28 125
6. a) 1024 b) 324
7. 960
8. a) 3136 b) 112 896
Practice 1.2
1. 236
2. a) 216 b) 120 c) 60 d) 91
3. 205 506
4. 𝑛(𝑛 − 1)
5. 479 001 599
6. 3450
7. 1536
8. 976
9. up to 6 pulses
10. 9 153 720 576 000
11. HINT: Donʼt overthink this! Reflect on what addition really means.
12. HINT: Reflect on what subtraction really means.
13. HINT: It is instructive to actually try out this game with three pegs on each side and observe
how the game works, then attempt to translate this process as a series of decisions.
Practice 1.3
1. 6
2. 3
3. Low 21; High 33
4. 11
5. HINT: Use the same reasoning that was used in developing the two set scenario; ensuring to
keep track of regions of intersection that are overcounted.
6. a) 84 b) 20
7. 98
8. HINT: Using a Venn diagram set variables for each “slice” and algebraically work with the
resulting system.
9. 6
10. Data is inconsistent.
11. HINT: A pattern can be established that follows from the 2-set and 3-set cases.
Practice 1.4
1. a) 3024 b) 220 c) 18! d) n! e) 𝑛(𝑛 − 1)
2. a) 120 b) 360
3. 840
4. a) 1440 b) 1224
5. a) 7 b) 4 c) 8
6. 480
7. HINT: Express y and z in terms of x.
8. 557
9.
Practice 1.5
1. 241 920
2. a) 3 386 880 b) 1 814 400 c) 120 960 d) 59 040 probably :)
3. 2 943 360
4. 12
Practice 1.6
1. 5040
2. 10
3. 82! / (32! · 39! · 11!)
4. 20
5. 360
6. 56
7. a) 2880 b) 211 680
8. 210
9. 2510
10. HINT: Mimic the solution of the example preceding the theoremʼs statement.
11. 630
Practice 1.7
1. 2 598 960
2. a) 369 600 b) 302 400
3. 3744
4. 330
5. 300
6. a) 1 b) Proof c) Proof (HINT: Consider selecting versus not selecting.)
7. HINT: This process overcounts; see if you can produce repetitious outcomes or outcomes that
are invalid.
8. a) 190 590 400 b) 168 168 000
9. 12
10. HINT: Focus on selecting the distinct objects, then identical.
Practice 1.8
1. 959
2. 24
3. 26
4. a) 12 b) 16 c) 19
5. 71
6. Hint: In each part use the preceding example as an approach to generalize from.
7. Hint: Consider developing equivalent collections which collapse to Theorem 1.8.2.
Alternatively, consider identifying how one could break the situation up into cases.
Practice 1.9
1. 66
2. a) 561 b) 272
3. 1 093 750
4. 15 120
5. 2400
6. 29 400
7. 12
𝐶 𝑘 − 1 · 13! · 40𝐶 𝑘
· 39!
Practice 1.10
1. 176
2. 256
3. 63
4. 252
5. HINT(S): These all stem from theorems encountered earlier in the chapter.
6. 42
2
7. 20 3
𝐿
8. HINT: Consider how the distributive property for algebra works and also consider writing out
the n brackets in expanded form.
9. HINT(S): Use the previous properties to algebraically prove these ones.
10. Conjectures and Proofs will vary.
Practice 1.11 (Challenge Task)

1. 51 240
2. HINT: if n = 18 there would be 3060 intersections; if n = 23 there would be 8855 intersections.
3. 5005
4. 39 180
5. 7143
6. 8
7. 1920
8. 683
UNIT 2:
Practice 2.1
1. a) 1/2 b) 1/6 c) 1/6
2. a) 2/25 b) 4/5
3. 0.00137
4. 0.273
5. 6/4165
6. 1/13 983 816
7. 0.4114
8. 10
9. Each die has six outcomes which can be achieved independently from each other.
Practice 2.2
1. 0.07
2. 0.924
3. 0.06
4. 5/8
5. 0.15
6. P(1) = 2/9, you figure out the rest!
Practice 2.3
1. 0.35
2. 0.0045
3. 0.002
4. 1/3960
5. 125/1296
Practice 2.4
1. a) 3/5 b) 0.64
2. b) 11/25
3. 3/8
4. 0.63
5. W = 75; V = 225
6. 67.8%
7. 0.449
8. 0.3105 (if Cheng shoots at Jim first); 0.6326 (if Cheng shoots at the ground first)
9. HINT: Consider the theme of this section.
Practice 2.5
1. 1/3
2. 1/3
3. 7/20
4. 22.5%
5. Hint: Draw out the original probability tree.
6. 2/3
7. 2/5
8. Proof
9. Box 1, 6/23; Box 2, 8/23; Box 3, 9/23
10. 9/23
11. a) 2/5 b) 6/13 c) w/(w + b)
Practice 2.6
Simulated results should fall within 0.01 of these values if the simulation is properly designed and run for enough trials.
1. 0.77
2. 0.4
3. 0.04
4. 0.167
5. 0.52
6. Switch the choice; 2/3 probability of winning.
Practice 2.7
1. a) X is the time taken in minutes. b) P(X = 1) = 10/1000; P(X = 2) = 12/100; etc. c) 4.02
2. a) P(T = 1) = 0.36; P(T = 2) = 0.28; etc. b) 2.2
3. $0.0077
4. a) -0.22 b) 22
5. At least $6.50
6. Lose $5.33
7. -$1.33
8. 1.994 m²
9. Contest rules and prizes vary from year to year.
Practice 2.8
1. a) 0.0009 b) 0.776
2. 0.64
3. a) 0.1514 b) 26
4. a) 0.00313 b) n!/(a! b! c!) · p^a q^b r^c
c) Hint: Extend part b with an undefined number of outcomes.
5. Hint: Solve for P(X = 0) = 0.5.
6. 13%
7. $1.50
8. Hint: Try to create a spreadsheet simulation that allows you to change the n and p parameters
with the graph shown.
9. Hint: Use the original definition for expectation to generate a sum in which we can factor out the
expression ʻnpʼ from each term.
Practice 2.9
1. 0.27
2. 365
3. 0.39
4. Hint: Follow the same type of reasoning used to develop the formulae for the first definition.
5. (r − 1) · q^(r−2) · p^2
6. ₓ₋₁Cₙ₋₁ · q^(x−n) · p^n
7. 14.7
8. Hint: Write out the infinite series and modify the geometric series formula.
Practice 2.10
1. a) 0.26 b) 0.948 c) 2
2. 0.12
3. 0.053
4. 0.026
5. 0.46
6. a) 0.56 b) 4 · ₄₈Pₙ₋₁ / ₅₂Pₙ c) 21.2
7. Hint: Modify version 2 of the hyper-geometric probability formula to adjust for the third type of
object.
8. Hint: You may need to research hyper-geometric series to help with this proof.
Practice 2.11
1. As the likelihood of a single outcome is usually quite small, it is more reasonable to measure the
probability of a range of outcomes to properly test the significance of the result.
2. p-value = 0.0022; reject the null hypothesis
3. p-value = 0.0563; accept the null hypothesis
4. p-value = 0.000641; reject the null hypothesis
5. p-value = 0.005; reject the null hypothesis
UNIT 3:
Practice 3.1
The case studies examined in this section will yield varied proposals for variables, indices, and
procedures. Ensure that each is thought through according to the considerations laid out in the
readings of this section and seek out feedback from your instructor regarding the validity and
practicality of the proposals.
Practice 3.2
The case studies examined in this section will yield varied sampling strategies. Ensure that each is
thought through according to the considerations laid out in the readings of this section and seek out
feedback from your instructor regarding the validity and practicality of the proposals.
Practice 3.3
1. Mean: x̄ = 50.34; Median = 51; Mode = 2, 48, 52, 80, 86
2. Answers will vary depending upon the grouping method.
3. Mean: x̄ = 234.6; Median = 149.5; Mode = 149.5
4. HINT: Look at the chart for comparing differing measures of central tendency.
Practice 3.4
1. Mean Deviation = 23.9, Variance s² = 807.5, Standard Deviation s = 28.4
2. Mean Deviation = 7.4, Variance s² = 74.5, Standard Deviation s = 8.6
3. 23 data within 1 std dev; 30 data within 2 std dev; 30 data within 3 std dev;
Yes, this confirms Chebyshevʼs Theorem
4. Approximately 100 minutes.
5. a) x̄ = 30.2, s = 24.1 b) SLI scores have less error and are more consistent than YND data
6. a) s = 1.92 b) s = 1.92 c) s = 1.92 d) Donʼt be lazy, itʼs obvious!
e) Hint: First determine the mean of the set, when each datum is increased by a constant, then
algebraically work through the formula for standard deviations.
7. b) Mean of z-scores is 0, standard deviation is 1
c) HINT: Consider the nature of the mean and why we must take the squares of deviations when
calculating spread.
Practice 3.5
1. Threatening: Median = 1.65, IQR = 0.4
Non-Threatening: Median = 2.1, IQR = 0.2
2. Male: Median = 17, IQR = 15
Female: Median = 12, IQR = 2.5
Non-Identified: Inconclusive, too few data points
Practice 3.6
1. Math v Poverty Level: r = -0.82 Strong Negative Correlation
Reading v Poverty Level: r = -0.894 Strong Negative Correlation
2. r = 0.41; Moderate Correlation, little evidence showing such a relationship exists.
3. HINT: Observe the formula and think about what happens if the relationship is perfectly linear.
4. a) r = 0; perfect randomness b) HINT: Think of the purpose of measuring the correlation.
Practice 3.7
1. a) Answers will vary; Hint: Consider social, cultural ramifications of sampling unmarried
females.
b) r = -0.865; Strong Negative Correlation
c) F = −0.0546C + 6.73; F(87) = 1.98, meaning approximately 2 children, which
overestimates the 2019 fertility rate in Canada of 1.5.
2. a) r = -0.924; Strong Negative Correlation
b) slope = -0.0528; Impurity decreases by 0.05 per degree Celsius/kelvin. Intercept = -13.5, which
means at 0 °C there would be negative 1350% impurity, so this result would be an ineffective
prediction. Itʼs perhaps better to convert into kelvins.
3. a) r = 0.9998; Strong Positive Correlation
b) P = 2.57t + 250
c) When t = 30 (year 2020) P = 333.4; meaning the model predicts a global population of 331
million; close to the actual amount of 328 million!!
Practice 3.8
1. a. Logarithmic (R² = 0.894); Polynomial (R² = 0.982); Linear (R² = 0.893)
b. Choose Logarithmic even though its R² is not the highest, as its long-term functional behaviour is better.
c. Answers will vary depending upon the model chosen. E.g. Logarithmic 9 million sales,
Polynomial shows negative sales.
2. a. Exponential (R² = 0.997); Polynomial (R² = 0.963)
b. Exponential is the best fit, both in terms of R² and predictive behaviour.
c. Predicts a relative risk of 3.67 for a blood alcohol level of 0.08.
3. a. Polynomial (R² = 0.53); all others have R² below 0.2
b. Polynomial (degree-2) is the best; could also consider eliminating the outlier, but this has
little effect.
c. Start Up Cost is predicted to be $1.77 million
Practice 3.9
1. a. b. 0.025 c. 0.2
2. a. 0.1173 b. 0.2499
3. x = −µ · ln(1/2) ≃ 0.693µ
4. HINT: Integrate the density function from 0 to ∞; if you know how this is done...integration is
the “inverse” process to differentiation.
5. HINT: Integrate the density function from 0 to x.
Practice 3.10
1. a) 0.8051 b) 0.0694 c) 0.8240 d) (-38.7, -24.3)
2. a) 0.9032 b) 0.0136 c) (100.1, 114.7)
3. a) $0.11/bag, or $48269/year (calculated on spreadsheet), $45000/year (from z-score chart)
b) $0.01/bag, or $4500/year
4. a) 64% b) 82%
5. (µ − z_(α/2) · σ, µ + z_(α/2) · σ) or µ ± z_(α/2) · σ
Practice 3.11
1.
Confidence Level | Z-Score (z_(α/2))
90% | 1.645
95% | 1.96
99% | 2.576
2. 34 (± 4.1, 99 times in 100)
3. $47.52 (± $5.78, 19 times in 20)
4. 5.4 (± 0.83, 9 times in 10)
5. Women: (268, 326); Men: (188, 232); Statistically significant difference
6. 2-3 samples were taken?? Not a very good study!
7. HINT: Convert the sampling standard deviation, σ_x̄ = σ/√n, in the confidence interval, and also
consider what the length of the interval would measure to be.
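The critical values listed in answer 1 can be reproduced without a printed table. The sketch below assumes Python 3.8 or later, whose standard library provides statistics.NormalDist; the function name z_critical is a label chosen for this example and does not come from the text.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z_(alpha/2) for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level
    # The upper critical value leaves alpha/2 of the area in the right tail.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

Rounded to three decimal places, the printed values should agree with the table in answer 1.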
Practice 3.12
1. a) X ≥ 21.5 b) 4500.5 < Y < 4549.5 c) Z ≤ 15.5 d) W > 3.5
2. 0.9997
3. 0.705
4. 0.3121
5. HINT: Observe the full shape of the “wave” when the conditions arenʼt met.
Practice 3.13
1. 4.8% ± 2.6%, 19 times in 20
2. 82.2% ± 6.6%, 9 times in 10
3. 68% ± 1.94%, 19 times in 20
4. They are still in a statistical dead-heat...just barely.
5. Approximately 841 subjects.
6. HINT: Set the desired interval range equal to the “±” portion of the confidence interval.
Practice 3.14
1. a) p-value = 0.0000002866515719; reject H0
b) p-value = 0; reject H0
c) p-value = 0.1459; accept H0
2. a) p-value = 0; reject H0
b) p-value = 1; accept H0
c) p-value = 0; reject H0
3. p-value = 0; reject H0
4. a) p-value = 0.011; reject H0
b) p-value = 0; reject H0
5. p-value = 0.46; accept H0
6. p-value = 0; reject H0
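Answers such as 1 a) and 1 c) can be checked with a few lines of code. This is a sketch under stated assumptions: it treats each question as a one-tailed z-test with known σ, and the function names normal_cdf and left_tail_p_value are illustrative rather than taken from the text.

```python
from math import erfc, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * erfc(-z / sqrt(2))

def left_tail_p_value(sample_mean, mu0, sigma, n):
    """p-value for Ha: mu < mu0, using a z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return normal_cdf(z)

# Question 1 a): n = 25, mu = 15, sigma = 4, sample mean 11
p_1a = left_tail_p_value(11, 15, 4, 25)
# Question 1 c): n = 10, mu = 4400, sigma = 1200, sample mean 4000
p_1c = left_tail_p_value(4000, 4400, 1200, 10)

print(p_1a, p_1c)
```

For a right-tailed alternative (Ha: µ > µ0), the p-value would be 1 minus the value returned by left_tail_p_value.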
Practice 3.15
These are open-ended case studies where raw data is provided. Be sure to use the methods
discussed in the text to properly analyze and interpret the particular breakdown that youʼre examining.
Z-Score Table:
Bibliography
[1] Stewart, James. Finite Mathematics. McGraw-Hill Ryerson Limited, 1988.
[2] Barbeau, Edward. Finite Mathematics. Houghton Mifflin Canada Limited, 1988.
[3] Tucker, Alan. Applied Combinatorics. John Wiley and Sons, 1995.
[4] Rossman, Allan J. Workshop Statistics. Key College Publishing, 2001.
[5] Zeitz, Paul. The Art and Craft of Problem Solving, 2nd Edition. John Wiley and Sons (Incorporated), 2007.
