
Stat 5002

An Introduction to Statistics with Applications in Computing

Lecture 2

Introduction to Probability
Experiments; Events; Sample Space
Theoretical Probability of Events; Venn Diagrams
and Compound Events; Counting;
Empirical → Theoretical Probability
Experiments and Probability

A Spinner Problem

[Figure: three spinners, each divided into 3 equal regions. Blue: 1, 5, 9. Green: 3, 4, 8. Yellow: 2, 6, 7.]

The problem
Player A: choose a spinner from the three.
Player B: choose one of the remaining two spinners.

Both players spin. Each spinner is equally likely to land on any of its 3 regions.

Winner: the player whose spinner lands on the higher number.

Would you rather be player A or B? Why?

Experiments
Experiments are operations which are carried out in order to
• find something unknown, or
• test hypotheses.

Experiments
• can be repeated under identical circumstances
• involve randomness: outcomes cannot be precisely predicted

Outcomes
The outcomes of an experiment may be
• simple and known to us
• simple but with infinitely many possibilities
• complex, but known to us
• complex and unknown

When we make a decision or choice based on an unknown outcome, we are taking a Risk or Chance!

Invariably, the laws of probability apply to making such decisions.

Examples of Outcomes

Experiment Outcome

simple and known

simple with infinite possibilities

complex, but known

complex and unknown

Definitions

Random Experiment: A process yielding a result.

Events (E): Any outcome(s) of an experiment.

Simple Events: Events that can't be decomposed; also called Elementary Events.

Sample Space (S): All possible outcomes of an experiment, listed in a mutually exclusive and exhaustive manner. The set of all simple events.

n(S): The number of simple events in the sample space.

Example
Experiment: Throw a die
Simple Events: {1}, {2}, {3}, {4}, {5}, {6}
Sample Space: S = {1, 2, 3, 4, 5, 6}, the set of simple events
Some Events: E1 = {2, 4, 6}, the set of even numbers
             E2 = {1, 3, 5}, the set of odd numbers
             E3 = {1, 2}, a number less than 3
n(S): number of simple events in the sample space
n(S) = 6
n(Ei): number of simple events in event Ei
n(E1) = n(E2) = 3
n(E3) = 2

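The die example can be reproduced in R as a quick check. This is only a sketch: the variable names (S, E1, n_S and so on) mirror the slide notation and are chosen here for illustration.

```r
# Sample space and events for one throw of a fair die
S  <- 1:6
E1 <- S[S %% 2 == 0]   # even numbers: 2, 4, 6
E2 <- S[S %% 2 == 1]   # odd numbers: 1, 3, 5
E3 <- S[S < 3]         # numbers less than 3: 1, 2

n_S  <- length(S)      # n(S)  = 6
n_E1 <- length(E1)     # n(E1) = 3
n_E2 <- length(E2)     # n(E2) = 3
n_E3 <- length(E3)     # n(E3) = 2
```
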
Elements and Subsets

∈ "is an element of"
⊆ "is a subset of"

Venn Diagrams are extremely useful to show probabilities of compound events.

Example:
S = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}
E4 = {2, 4}

[Venn diagram: E4 drawn inside E1, which is drawn inside S]

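Membership and subset relations can be tested in base R with %in%. The helper names is_element and is_subset below are made up for this sketch; they are not built-in R functions.

```r
S  <- 1:6
E1 <- c(2, 4, 6)
E4 <- c(2, 4)

is_element <- function(x, set) x %in% set     # x "is an element of" set
is_subset  <- function(a, b) all(a %in% b)    # a "is a subset of" b

is_element(1, S)      # TRUE: 1 is in S
is_subset(E4, E1)     # TRUE: every element of E4 is in E1
is_subset(S, E1)      # FALSE: S contains odd numbers that E1 lacks
```
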
Exercise 1: Elements and Subsets
∈ "is an element of"
⊆ "is a subset of"
S = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}
E4 = {2, 4}
Circle the TRUE statements:

E1 S E1  S S
E1
E4 S E4  S
1 S {2}  E4
E4
S  E1 {2}  E4
S  E1

Empty Set

The empty set ∅
• has no elements in it: n(∅) = 0
• is a subset of every set
• is denoted by ∅ or { }

{0} is not empty!

Exercise 2
List all the subsets of {r, i, s, k}.
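One way to check an answer to this exercise is to enumerate the subsets by size with combn() from base R. This is a sketch, not the lecture's own solution; the names elements, subsets and labels are illustrative.

```r
# Enumerate every subset of a small set, one size at a time
elements <- c("r", "i", "s", "k")

subsets <- list(character(0))          # start with the empty set
for (size in seq_along(elements)) {
  subsets <- c(subsets, combn(elements, size, simplify = FALSE))
}

n_subsets <- length(subsets)           # a set with 4 elements has 2^4 subsets
labels <- sapply(subsets, function(s) paste0("{", paste(s, collapse = ", "), "}"))
print(labels)
```
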
Probability
There are three types of probability:

Theoretical:
when the exact probability is known. In this case, we must know the Sample Space.

Empirical (relative frequency):
probabilities based on experimental evidence.

Subjective:
conjecture, based on guesswork or prior knowledge.

Theoretical Probability
Definition:
When all simple events are equally likely, the theoretical probability of an event Ei is

P(Ei) = n(Ei) / n(S)

that is,

P(event) = (number of outcomes in S for which the event occurs) / (number of possible outcomes in S)

Example: E3 = {1, 2} (a number less than 3 in a single throw of a die)

P(E3) = n(E3) / n(S) = 2/6 = 1/3

Two Rules of Probability

1. The probability of an event must lie between 0 and 1:

0 ≤ P(Ei) ≤ 1

2. The sum of the probabilities of all simple events in the sample space is 1:

∑ P(Ei) = 1, summing over the simple events Ei in S

Venn Diagrams and Compound Events

Venn Diagrams – Compound Events
Finding unions or intersections of several events can form Compound Events.
These can be displayed using Venn Diagrams.

Examples:
Let A and B be two events in the sample space S.

A ∩ B: "A intersection B", i.e. A and B
A ∪ B: "A union B", i.e. A or B

[Venn diagrams: A ∩ B is the overlap of A and B; A ∪ B is everything in A or in B]

Venn Diagrams – Compound Events

Ā: the complement of A, "not A" (or A^c)
Disjoint Sets: A ∩ B = ∅

[Venn diagrams: the complement of A is the part of S outside A; disjoint sets A and B do not overlap]

Intersection of events

The intersection of two events is denoted by A ∩ B.
A ∩ B is the event that both A and B occur.

P(A ∩ B) is the probability that both A and B occur.
P(A ∩ B) can be written as P(AB).

Intersection

Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Let A = {2, 4, 6}
Let B = {1, 2, 3, 4}

P(S) = n(S)/n(S) = 10/10 = 1
P(A) = n(A)/n(S) = ...
P(B) = n(B)/n(S) = ...
P(A ∩ B) = n(A ∩ B)/n(S) = ...

Union of events
The union of two events A and B is denoted by A ∪ B.
A ∪ B is the event that A occurs or B occurs or both A and B occur together.

P(A ∪ B) is the probability that A occurs or B occurs or both A and B occur together.

In general, P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Axiom

In general, P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Example - Union

Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Let A = {2, 4, 6}
Let B = {1, 2, 3, 4}

P(S) = n(S)/n(S) = 10/10 = 1
P(A) = n(A)/n(S) = 3/10
P(B) = n(B)/n(S) = 4/10

P(A ∩ B) = n(A ∩ B)/n(S) = 2/10

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
         = 3/10 + 4/10 - 2/10
         = 5/10 = 1/2

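The intersection and union examples on S = {1, ..., 10} can be checked with the base R set functions intersect() and union(). A minimal sketch, assuming equally likely outcomes; the helper prob() is defined here for illustration.

```r
S <- 1:10
A <- c(2, 4, 6)
B <- c(1, 2, 3, 4)

# Theoretical probability of an event, assuming equally likely outcomes in S
prob <- function(E) length(E) / length(S)

p_A   <- prob(A)                 # n(A)/n(S)
p_B   <- prob(B)                 # n(B)/n(S)
p_AnB <- prob(intersect(A, B))   # A and B share the elements 2 and 4
p_AuB <- prob(union(A, B))       # elements in A or B or both

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
all.equal(p_AuB, p_A + p_B - p_AnB)
```
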
P(A ∪ B ∪ C) = ???
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C)

Exercise 3
Refer to the diagram and choose the correct answers from ∅, S, or A
a) S =
b)  =
c) A =
d) S  A = A
e) S  A = S
f) A  A =
g) A  A =

Exercise 4: Spam
Spam, or junk mail, consists of email messages, usually with commercial content, sent in large quantities to an indiscriminate set of recipients.
It has been found that
• 55% of the subject lines of spam mail contain enticing words such as free, limited offer, click here, act now, satisfy, lose weight, earn money, get rich, along with overuse of exclamation marks and capitals in the text;
• 80% of spam messages are directed to more than 10 recipients, with different domain names, in the To: and/or Cc: fields;
• only 15% of spam does not contain the words listed above and is directed to fewer than 10 recipients.
What is the probability that a randomly selected spam message includes enticing words, exclamation marks, etc., and is directed to more than 10 recipients with different domain names?

Solve with the aid of a Venn Diagram.

http://www.allspammedup.com/2011/09/teaching-people-how-to-identify-spam/

Solution: Spam
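What is the probability that a randomly selected spam message includes enticing words, exclamation marks, etc., and is directed to more than 10 recipients with different domain names?

Let E = "contains enticing words" and M = "directed to more than 10 recipients".

P(E) = 0.55
P(M) = 0.80
P(E ∪ M) = 1 - 0.15 = 0.85, taking the 15% with neither property as the complement of E ∪ M
Find P(E ∩ M)

P(E ∪ M) = P(E) + P(M) - P(E ∩ M)

So
P(E ∩ M) = P(E) + P(M) - P(E ∪ M)
         = 0.55 + 0.80 - 0.85
         = 0.50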
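The spam calculation can be laid out in R, together with the four regions of the Venn diagram. This is a sketch that treats the 15% figure as the probability of "neither E nor M"; all variable names are illustrative.

```r
p_E       <- 0.55   # subject line contains enticing words
p_M       <- 0.80   # directed to more than 10 recipients
p_neither <- 0.15   # neither property (assumed to be the complement of E or M)

p_E_or_M  <- 1 - p_neither          # complement rule
p_E_and_M <- p_E + p_M - p_E_or_M   # addition rule, rearranged

# The four regions of the Venn diagram should add up to 1
regions <- c(only_E  = p_E - p_E_and_M,
             only_M  = p_M - p_E_and_M,
             both    = p_E_and_M,
             neither = p_neither)
regions
sum(regions)
```
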
Mutually exclusive events
Mutually exclusive events are disjoint events.
• They have no elements in common.
The intersection of mutually exclusive events is the empty set (∅).

For mutually exclusive events
A ∩ B = ∅
n(A ∩ B) = 0
P(A ∩ B) = 0

Also
P(A ∪ B) = P(A) + P(B)
for mutually exclusive events.

Example

Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Let A = {2, 4, 6}
Let C = {5, 7, 9}

Then
A ∩ C = ∅
• n(A ∩ C) = 0
• P(A ∩ C) = 0

Also P(A ∪ C) = P(A) + P(C)

A and C are disjoint.
A and C are mutually exclusive.

Exercise: 5
a) If P(A) = 0.5 , P(B) = 0.2 and P(A or B) = 0.7, then
are the events, A and B, mutually exclusive? Explain.

Exercise: 6
b) If P(A) = 0.5 and P(B) = 0.6, then are the events A
and B mutually exclusive? Explain.

Complementary Events
The complement of a set A is denoted by Ā or A^c.
The complement of a set A consists of all the elements that are not in A.

P(A^c) = 1 - P(A)

Example
Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Let A = {2, 4, 6}

P(A) = 3/10
P(A^c) = 7/10, or P(A^c) = 1 - P(A) = 1 - 3/10 = 7/10

Exercise 7
If P(A) = 0.5 , P(B) = 0.2 and P(A or B) = 0.7, then find
a) P(A  B)

b) P(A  B)

If P(A) = 0.5 and P(B) = 0.6 and P(A ∪ B) = 0.8, then find
c) P(A  B)

d) P(A  B)

Solution: 7a, 7b
If P(A) = 0.5 , P(B) = 0.2 and P(A or B) = 0.7, then find
a) P(A  B)


b) P(A  B)

Solution: 7c, 7d
If P(A) = 0.5 and P(B) = 0.6 and P(A ∪ B) = 0.8, then find

c) P(A  B)


d) P(A  B)
In general

P((A ∪ B)^c) = P(A^c ∩ B^c)
"Not (A or B)" => "not A and not B"

P((A ∩ B)^c) = P(A^c ∪ B^c)
"Not (A and B)" => "not A or not B"

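These two identities (De Morgan's laws) can be verified on the running example with base R set operations. A sketch only: the sets S, A and B are reused from the earlier examples and complement() is a helper written for this illustration.

```r
S <- 1:10
A <- c(2, 4, 6)
B <- c(1, 2, 3, 4)

complement <- function(E) setdiff(S, E)   # elements of S that are not in E

# not (A or B) is the same set as (not A) and (not B)
lhs1 <- complement(union(A, B))
rhs1 <- intersect(complement(A), complement(B))

# not (A and B) is the same set as (not A) or (not B)
lhs2 <- complement(intersect(A, B))
rhs2 <- union(complement(A), complement(B))

setequal(lhs1, rhs1)   # TRUE
setequal(lhs2, rhs2)   # TRUE
```
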
Exercise 8
In a sample space, events A and B are such that P(A) = P(B) = 1/2, and P(A ∩ B) = 1/3.

Find
a) P(A ∪ B)

b) P(A^c ∩ B)

c) P(A^c ∩ B^c)

Solution 8
In a sample space, events A and B are such that P(A) = P(B) = 1/2, and P(A ∩ B) = 1/3.

Find
a) P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
            = 1/2 + 1/2 - 1/3
            = 2/3

b) P(A^c ∩ B) = P(B) - P(A ∩ B)
              = 1/2 - 1/3
              = 1/6

c) P(A^c ∩ B^c) = 1 - P(A ∪ B)
                = 1 - 2/3
                = 1/3

Counting

When an experiment consists of k trials, where trial i has n_i possible outcomes, the number of possible outcomes for the experiment is:

n_1 × n_2 × n_3 × … × n_k

Example: Counting
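What is the number of ways of choosing poker hands (5 cards) from a pack of 52 cards?

Number of ways of choosing poker hands, counting the order in which the cards are drawn
= 52 * 51 * 50 * 49 * 48
= 311,875,200

Some of these hands will be the same! Each set of 5 cards is counted once for every order in which it can be drawn.
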
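The over-counting can be made explicit in R. This sketch divides the ordered count by the 5! orderings of each hand and compares the result with choose(), the base R binomial coefficient; the variable names are illustrative.

```r
ordered_hands  <- prod(52:48)        # 52 * 51 * 50 * 49 * 48 ordered draws
orders_per_set <- factorial(5)       # each set of 5 cards can be drawn in 5! orders
distinct_hands <- ordered_hands / orders_per_set

distinct_hands
all.equal(distinct_hands, choose(52, 5))   # same as "52 choose 5"
```
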
Permutations
The number of ways we can order r objects chosen from n distinct objects is

nPr = n! / (n-r)!

Example:
The number of ways we can make a two-digit number from the three digits 4, 5 and 6, using each digit at most once, is

3P2 = 3! / (3-2)! = (3*2*1) / 1 = 6

The numbers are: {45, 46, 54, 56, 64, 65}

Ordering is important!

Combinations
The number of ways we can choose r objects from n distinct objects, when the order of selection does not matter, is

nCr = n! / (r!(n-r)!)

Example:
The number of ways we can make one colour by combining two of the three colours Y (yellow), R (red) and B (blue) is

3C2 = 3! / (2!(3-2)!) = (3*2*1) / (2*1) = 6/2 = 3

The colours are: {YR, YB, RB}, i.e. orange, green and purple.

Ordering is not important!

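Both counts are easy to reproduce in base R. Note that n_perm() below is a small helper written for this sketch (base R has no built-in permutation count), while choose() and combn() are standard.

```r
# Number of ordered selections of r objects from n: n! / (n - r)!
n_perm <- function(n, r) factorial(n) / factorial(n - r)

n_perm(3, 2)      # two-digit numbers from the digits 4, 5, 6
choose(3, 2)      # unordered pairs of colours

# List the unordered pairs explicitly; each column of combn() is one pair
pairs <- combn(c("Y", "R", "B"), 2)
pair_labels <- apply(pairs, 2, paste, collapse = "")
pair_labels
```
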
Exercise 9

There are 10 socks in Jo's drawer: two black, two brown, two green, two blue and two white.

If Jo selects 2 socks at random

a) How many different pairs of socks (combinations) can Jo select?

b) What is the chance that Jo will select a matching pair of socks?

The socks!

B- black B1B2
R- brown R1R2
G- green G1G2
L- blue L1L2
W- white W1W2

A Solution

The choices!
B1B2 B2R1 R1R2 R2G1 G1G2 G2L1 L1L2
B1R1 B2R2 R1G1 R2G2 G1L1 G2L2 L1W1
B1R2 B2G1 R1G2 R2L1 G1L2 G2W1 L1W2
B1G1 B2G2 R1L1 R2L2 G1W1 G2W2
B1G2 B2L1 R1L2 R2W1 G1W2
B1L1 B2L2 R1W1 R2W2
B1L2 B2W1 R1W2 L2W1 W1W2
B1W1 B2W2 L2W2
B1W2
10C2 = 45 choices

Solution: Exercise 9
If Jo selects 2 socks at random

a) How many different pairs of socks (combinations) can Jo select?

10C2 = 45 choices

b) What is the chance that Jo will select a matching pair of socks?

P(match) = (number of matching pairs) / (number of pairs)
         = 5/45 = 1/9

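The list of choices can also be generated rather than written out by hand. The sketch below uses the sock labels from the slides (B1, B2, R1, ...) and assumes the first letter of each label encodes the colour.

```r
socks <- c("B1", "B2", "R1", "R2", "G1", "G2", "L1", "L2", "W1", "W2")

pairs    <- combn(socks, 2)                 # one column per unordered pair
n_pairs  <- ncol(pairs)                     # 10C2
colour   <- function(s) substr(s, 1, 1)     # first letter encodes the colour
matching <- colour(pairs[1, ]) == colour(pairs[2, ])

n_match <- sum(matching)                    # number of matching pairs
p_match <- n_match / n_pairs                # chance of a matching pair
```
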
Exercise 10
Two cards are drawn from a pack of 52 cards.

a) How many different pairs of cards (combinations) can be selected?

b) How many of the pairs of cards are matching pairs?

c) What is the chance of a matching pair?

Review: Theoretical Probability

It is intuitive that, for example, when we throw a die, the probability of throwing a 6 is 1/6.
This is the theoretical probability. It can be calculated from the formula

P(Ei) = n(Ei) / n(S)

since we know all the elementary events in the sample space.

Empirical Probability

We can estimate the probability of an event A in the following way.

• Perform a large number of independent experiments under conditions in which event A may take place, and then observe the proportion of cases in which A actually occurs.

This proportion is the estimated or empirical probability of A.

Empirical probability ~ Relative Frequency

We could arrive at an estimate of the probability of 'throwing a 6' by throwing a die a large number of times and counting the number of times a 6 appears.
Probabilities derived from repeating an experiment many times are called empirical probabilities. This is an example of a simulation experiment.

relative frequency of 6 = (number of 6's) / (number of throws)
                        = estimated probability of a 6
                        = empirical probability of a 6.

Empirical Probability
We could arrive at an estimate of the probability of successfully completing a game of 'patience' by playing the game a large number of times and counting the number of times we are able to complete it.
Then the

relative frequency of a win = (number of successes) / (number of games)
                            = estimated/empirical probability of a win.

As the number of times an experiment is repeated increases, the relative frequency (empirical probability) approaches the theoretical probability.

Empirical → Theoretical
As the number of times an experiment is repeated increases, the relative frequency (empirical probability) approaches the theoretical probability.

P(A) ≈ n(A) / n

where n(A) / n = (the number of times event A occurs) / (the number of observations)

lim (n → ∞) n(A) / n = P(A)

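The convergence of the relative frequency can be watched directly in a simulation. This is a minimal sketch, assuming a fair die; the seed and the number of throws are arbitrary choices made here, not values from the lecture.

```r
set.seed(5002)                       # arbitrary seed, only for reproducibility

n_throws <- 10000
throws   <- sample(1:6, n_throws, replace = TRUE)

# Relative frequency of a 6 after 1, 2, ..., n_throws throws
running_freq <- cumsum(throws == 6) / seq_len(n_throws)

running_freq[c(10, 100, 1000, 10000)]   # should drift towards 1/6

plot(running_freq, type = "l", log = "x",
     xlab = "number of throws", ylab = "relative frequency of a 6")
abline(h = 1/6, lty = 2)                # theoretical probability
```

Plotting on a log scale for the number of throws makes the large early swings and the later settling both visible.
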
Some empirical evidence
The following data arise from "A Case-Control Study of Alcoholic Beverage Consumption and Breast Cancer". *

The women were asked about their drinking habits and the following table was constructed.

                               Breast Cancer
Drink habit                    Cases    Controls
Less than 4 drinks per week    330      658
4 or more drinks per week      204      386

* American Journal of Epidemiology 131 (1990): 6-14.

Exercise 11
How many women were included in the study?

If a woman was selected at random from those in the study, estimate the probabilities that

1. She does not have breast cancer.
2. She has at least 4 drinks per week.
3. She has breast cancer and she has less than 4 drinks per week.
4. She has breast cancer and she has at least 4 drinks per week.
5. She has less than 4 drinks per week if we know she does not have breast cancer.
6. She does not have breast cancer if we know she has at least 4 drinks per week.

Solution: Exercise 11

                Breast Cancer
                Yes     No      Totals
<4 drinks       330     658     988
>=4 drinks      204     386     590
Totals          534     1044    1578

There were 1578 women included in the study.

Q   Estimate the probability …                                                          p
1   She does not have breast cancer.                                                    1044/1578
2   She has at least 4 drinks per week.                                                 590/1578
3   She has breast cancer and she has less than 4 drinks per week.                      330/1578
4   She has breast cancer and she has at least 4 drinks per week.                       204/1578
5   She has less than 4 drinks per week if we know she does not have breast cancer.     658/1044
6   She does not have breast cancer if we know she has at least 4 drinks per week.      386/590

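The same estimates can be computed from the table of counts in R. This sketch stores the four counts in a matrix; the object and dimension names are illustrative choices, not from the study.

```r
# Counts from the table: rows are drinking habit, columns are cancer status
counts <- matrix(c(330, 658,
                   204, 386),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(drinks = c("<4", ">=4"),
                                 cancer = c("Yes", "No")))

n <- sum(counts)                                    # women in the study

p_no_cancer    <- sum(counts[, "No"]) / n           # Q1
p_4plus        <- sum(counts[">=4", ]) / n          # Q2
p_cancer_lt4   <- counts["<4", "Yes"] / n           # Q3
p_cancer_4plus <- counts[">=4", "Yes"] / n          # Q4

# Q5 and Q6 are conditional: the denominator is the column or row total
p_lt4_given_no   <- counts["<4", "No"] / sum(counts[, "No"])      # Q5
p_no_given_4plus <- counts[">=4", "No"] / sum(counts[">=4", ])    # Q6

round(c(p_no_cancer, p_4plus, p_cancer_lt4, p_cancer_4plus,
        p_lt4_given_no, p_no_given_4plus), 3)
```
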
Simulation

Simulate many Experiments using R
Examples (R code, with the slide's output shown as # comments):

# 1. Toss a coin 10 times
x1 <- sample(c("H", "T"), 10, replace = TRUE)
table(x1)/10
#   H   T
# 0.4 0.6

# 2. Toss a coin 1000 times
x2 <- sample(c("H", "T"), 1000, replace = TRUE)
table(x2)/1000
#    H    T
# 0.47 0.53

# 3. Roll a die 60 times
x3 <- sample(1:6, 60, replace = TRUE)
table(x3)/60
#     1     2     3     4     5     6
# 0.217 0.067 0.100 0.267 0.150 0.200

# 4. Roll a die 6000 times
x4 <- sample(1:6, 6000, replace = TRUE)
table(x4)/6000
#     1     2     3     4     5     6
#    ……    ……    ……    ……    ……    ……

barplot(table(x3)/60)      # have a look!
barplot(table(x4)/6000)    # have a look!

Simulate Spinner Game using R

# 1. Blue spinner (spin 10000 times)
blue <- sample(c(1, 5, 9), 10000, replace = TRUE)
table(blue)/10000
# blue
#      1      5      9
# 0.3324 0.3426 0.3250

# 2. Green spinner (spin 10000 times)
green <- sample(c(3, 4, 8), 10000, replace = TRUE)
table(green)/10000

# 3. Yellow spinner (spin 10000 times)
yellow <- sample(c(2, 6, 7), 10000, replace = TRUE)
table(yellow)/10000

# 4. Combine the 10000 blue and yellow spins, one row per game.
#    c(blue, yellow) would join them into a single vector of length 20000,
#    so a data frame is used to keep each pair of spins together.
Spins_BY <- data.frame(blue, yellow)

Simulate Spinner Game using R

# 5. Find the winning number in each game.
#    pmax() takes the element-wise maximum of the two vectors; it gives the
#    same result as apply(Spins_BY, 1, max) on a two-column data frame.
winBY <- pmax(blue, yellow)
table(winBY)/10000

# 6. Assign the winning colour.
#    if () tests a single value, so the vectorised ifelse() is used instead:
#    yellow wins when the winning number is one of yellow's numbers 2, 6 or 7.
winner <- ifelse(winBY %in% c(2, 6, 7), "yellow", "blue")

# 7. Find the winning colour
table(winner)/10000

Notes
R doesn’t like being told to operate on a vector that
doesn’t yet exist. So, we set up an empty vector to add
stuff to later (this isn’t the most efficient way to do this,
but it works!).

Solve Spinner Game problem using Tree Diagrams

Yellow → Blue

Yellow   Blue   Winner
2        1      Yellow
2        5      Blue
2        9      Blue
6        1      Yellow
6        5      Yellow
6        9      Blue
7        1      Yellow
7        5      Yellow
7        9      Blue

P(Yellow wins if Y→B) = 5/9

Blue → Yellow

Blue   Yellow   Winner
1      2        Yellow
1      6        Yellow
1      7        Yellow
5      2        Blue
5      6        Yellow
5      7        Yellow
9      2        Blue
9      6        Blue
9      7        Blue

P(Yellow wins if B→Y) = 5/9

Yellow → Green

Yellow   Green   Winner
2        3       Green
2        4       Green
2        8       Green
6        3       Yellow
6        4       Yellow
6        8       Green
7        3       Yellow
7        4       Yellow
7        8       Green

P(Yellow wins if Y→G) = 4/9

Strategy

First colour   Second colour   Most likely winner   First or Second?
Yellow         Blue            P(Yellow) = 5/9      1st
Yellow         Green           P(Green) = 5/9       2nd
Blue           Yellow          P(Yellow) = 5/9      2nd
Blue           Green           P(Blue) = 5/9        1st
Green          Yellow          P(Green) = 5/9       1st
Green          Blue            P(Blue) = 5/9        2nd

So it is better to be the second player, player B.
Player B is always able to choose the option with the higher chance of winning:

A's Choice → B's Choice
Yellow → Green
Green → Blue
Blue → Yellow

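The tree diagrams can be replaced by an exact enumeration of the 9 equally likely pairs of spins. This sketch uses outer() from base R; p_beats() is a helper named here for illustration.

```r
spinners <- list(blue = c(1, 5, 9), green = c(3, 4, 8), yellow = c(2, 6, 7))

# P(first spinner shows the higher number), over the 9 equally likely pairs
p_beats <- function(first, second) mean(outer(first, second, ">"))

p_yellow_beats_blue  <- p_beats(spinners$yellow, spinners$blue)
p_green_beats_yellow <- p_beats(spinners$green,  spinners$yellow)
p_blue_beats_green   <- p_beats(spinners$blue,   spinners$green)

c(p_yellow_beats_blue, p_green_beats_yellow, p_blue_beats_green)   # each is 5/9
```

Each spinner beats one of the others and loses to the remaining one, which is why the second player can always pick the favourable spinner.
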
Exercise 12
A die is loaded so that the probability that a face turns up is proportional to the number on that face.

a. What does this mean?

b. What must k be?

c. If the die is thrown, what is the probability that an even number occurs?

d. P(2) + P(4) + P(6) =

Probability Proportional to n
P(1) = 1/k
P(2) = 2/k
P(3) = 3/k
P(4) = 4/k
P(5) = 5/k
P(6) = 6/k
Sum = 21/k

Since ∑ P(x) = 1, then k = 21

Exercise 12

A die is loaded so that the probability that a face turns up is proportional to the number on that face.

What does this mean? P(n) = n/k

What must k be? k = 21

If the die is thrown, what is the probability that an even number occurs?

P(even) = P(2) + P(4) + P(6)
        = (2 + 4 + 6) / 21
        = 12/21 = 4/7

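The loaded die can be written down in R and checked by simulation. A sketch only: the seed and the number of simulated rolls are arbitrary choices made here.

```r
faces <- 1:6
k <- sum(faces)              # probabilities must sum to 1, so k = 1 + 2 + ... + 6
p <- faces / k               # P(n) = n / k

p_even <- sum(p[faces %% 2 == 0])    # P(2) + P(4) + P(6)

# Optional empirical check: sample() accepts unequal probabilities via prob
set.seed(1)                          # arbitrary seed
rolls <- sample(faces, 100000, replace = TRUE, prob = p)
mean(rolls %% 2 == 0)                # should be close to p_even
```
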
Puzzle

Jack and his daughter Kate choose who will mow the lawn by a random process: Jack has one green and two red marbles in his pocket; two are selected at random. If the colours match, Jack mows the lawn, otherwise Kate mows the lawn.

• Is the game fair?
• If we are able to add marbles to Jack's pocket, can the game be made fair?
• Suggest a number of red and a number of green marbles so that the game is fair.

Solution: Jack and Kate
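One possible approach, sketched in R: count matching pairs with choose() and search small pockets for a match probability of exactly 1/2. It assumes two marbles are drawn without replacement and each pair is equally likely; p_match() and the 1 to 10 search range are choices made for this illustration.

```r
# Probability that two marbles drawn without replacement have the same colour
p_match <- function(red, green) {
  (choose(red, 2) + choose(green, 2)) / choose(red + green, 2)
}

p_match(2, 1)    # original pocket: 1/3, so Kate mows more often than Jack

# Search pockets with 1 to 10 marbles of each colour for fair games
grid <- expand.grid(red = 1:10, green = 1:10)
grid$p <- p_match(grid$red, grid$green)
fair <- grid[abs(grid$p - 0.5) < 1e-12, ]
fair
```

For example, adding one red marble (3 red, 1 green) gives 3 matching pairs out of 6, so the game becomes fair.
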
