
Contents

1 Probability Theory
  1.1 Introduction
  1.2 Basic Rules
  1.3 Independence of Two Events
  1.4 Counting
    1.4.1 Combinations – order does NOT matter
    1.4.2 Permutations – order matters

2 Probability Distributions
  2.1 Categorical Random Variables
  2.2 Numerical Random Variables
    2.2.1 Discrete Random Variables
    2.2.2 Binomial Probability Distribution
    2.2.3 Continuous Random Variables
    2.2.4 Normal Probability Distribution
  2.3 Descriptive Methods for Assessing Normality

3 Sampling Distributions
  3.1 Statistics and Sampling Distributions
  3.2 The Sampling Distribution of a Sample Mean
    3.2.1 Central Limit Theorem

1 Probability Theory
Probability theory is used as a tool in statistics. It helps us evaluate the reliability of our
conclusions about a population when we have information only about a sample.
Probability theory is also the main tool for developing models that describe populations
in inferential statistics. In inferential statistics the information from a random sample is used
to make statements about an entire population.

Many things in our world depend on randomness: whether you observe Heads while flipping a coin,
whether you roll a 6 with a "fair" die, whether the bus is on time, etc. Even though these events
occur randomly, there is an underlying pattern in their occurrence. This pattern is the basis of
Probability Theory.

1.1 Introduction
We now provide the vocabulary for probability theory.

Definition:
1. A phenomenon is random if individual outcomes are uncertain but there is nonetheless a
regular distribution of outcomes in a large number of repetitions.

2. The probability of any outcome (or event) of a random phenomenon is the proportion of
times the outcome would occur in a very long series of repetitions.

Probability Models
Definition:
1. The sample space of a random phenomenon is the set of all possible outcomes.

2. An event is a subset of the sample space.

Example:
Examples of random phenomena are:
• Recording a test grade

• Measuring daily snowfall

• Counting leucocytes in a blood sample

• Rolling a die

• Tossing a coin

The sample spaces of the above random phenomena are:
• {A, B, C, D, E, F}

• All numbers between 0 and 200 cm

• All counts 0, 1, 2, ... per ml

• {1, 2, 3, 4, 5, 6}

• {H, T}

Examples of events, after tossing a die and observing the number on the upper face, are:
• Observe a 1 = {1}
• Observe an odd number = {1, 3, 5}
• Observe a 1 or a 6 = {1, 6}

Example: Suppose you roll an unbiased die with 6 faces and observe the outcome.

Venn Diagrams
The outer box represents the sample space S, which contains all of the simple events. If A =
{E1, E2, E3} is a collection of simple events, the corresponding simple events are circled and
labelled with the letter A.

[Venn diagram: box S containing the simple events E1, ..., E6, with E1, E2, E3 circled and labelled A]
Basic Properties:
• The probability of an event is a number between 0 and 1, since it is a limit of a relative
frequency.
• P(E) = 0 if the event E never occurs: P(roll a 7 with a regular die) = 0.
• P(E) = 1 if the event always occurs: P(roll a number smaller than 7 with a regular die) = 1.
• The sum of the probabilities of all simple events in S equals 1.

1.2 Basic Rules


Definition:
• The probability of an event A is equal to the sum of the probabilities of the simple events
contained in A.
• The union of events A and B, denoted by A ∪ B, is the event that either A or B or both
occur.

[Venn diagram: overlapping circles A and B, with the union shaded]

• The intersection of events A and B, denoted by A ∩ B, is the event that both A and B
occur.

[Venn diagram: overlapping circles A and B, with the intersection shaded]

• The complement of an event A, denoted by Ac, is the event that A does not occur.

[Venn diagram: circle A inside S, with the region outside A shaded]

Example: Continue the die rolling example.

Let C be the event to roll an odd number, so C = {1, 3, 5}. Let D be the event to roll a number
smaller than 4, so D = {1, 2, 3}.
The probability of every simple event is 1/6.
From the definition it follows that P(C) = P({1}) + P({3}) + P({5}) = 1/6 + 1/6 + 1/6 =
3/6 = 1/2.
P(C ∪ D) = P({1, 2, 3, 5}) = 4/6 = 2/3.
P(C ∩ D) = P({1, 3}) = 2/6 = 1/3.
P(Dc) = P({4, 5, 6}) = 3/6 = 1/2.
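
Since each step above is just enumeration over equally likely simple events, it is easy to check
in code. A minimal Python sketch (the helper and variable names are our own illustration, not
part of the notes):

```python
from fractions import Fraction

# Sample space of one roll of a fair die; all simple events equally likely.
S = {1, 2, 3, 4, 5, 6}
C = {1, 3, 5}   # roll an odd number
D = {1, 2, 3}   # roll a number smaller than 4

def prob(event):
    # P(E) = (# simple events in E) / (# simple events in S)
    return Fraction(len(event), len(S))

print(prob(C))      # 1/2
print(prob(C | D))  # P(C ∪ D) = 2/3
print(prob(C & D))  # P(C ∩ D) = 1/3
print(prob(S - D))  # P(Dc)    = 1/2
```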

Remark:
Up to this point, in order to calculate the probability of an event:
1. find all the simple events that belong to the event,
2. find the probabilities of those simple events, and
3. add those probabilities to obtain the probability of the event of interest.
The following rules provide tools for determining the probabilities of events like A ∪ B
and Ac.

Addition Rule

Let A and B be two events. The probability of the union can then be calculated by:

P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Remark:
The subtraction of P(A ∩ B) is necessary because this area is counted twice by the addition of
P(A) and P(B): once in P(A) and once in P(B). Check the diagram below.

[Venn diagram: overlapping events A and B; the overlap is labelled "A and B"]

Rule for Complements

The probability of the complement of an event is 1 minus the probability of the event.
Mathematically:

P(Ac) = 1 − P(A)

Example:

Let A = {1, 2} and B = {1, 3, 5}, with P(A) = 1/3, P(B) = 1/2, and P(A ∩ B) = P({1}) = 1/6.
With the Addition Rule we get P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/3 + 1/2 − 1/6 = 2/3.

With the Rule for Complements, P(Ac) = 1 − 1/3 = 2/3.

1.3 Independence of Two Events


Two events are considered independent when the occurrence of one of the events has no impact
on the probability of the second event occurring; they have no influence on each other.

In this section we will define independence and work through examples to illustrate the concept.
To define the independence of events properly we first introduce the concept of a conditional
probability.

Definition
If A and B are events with P(B) > 0, the conditional probability of A given B is defined by

P(A|B) := P(A ∩ B) / P(B)

[Venn diagram: sample space S split into B and Bc, with event A overlapping both; the overlaps
are labelled "A and B" and "A and Bc"]

The interpretation of the conditional probability P(A|B) is as follows:

Given that you already know that event B occurred, what is the probability that A occurs?

Example:
Consider the event B to roll an even number with a fair 6-sided die.
The probability of the event A to roll a 1, given B, equals 0: P(A|B) = 0.
Reason: if you roll a die and someone peeks and tells you that the number is even, then this
number cannot be a 1. So the conditional probability of A given B must be 0.

In probability theory two events are considered independent if the knowledge that one of them
has occurred does not change the probability that the second occurs.

Definition:
Events A and B, with P (B) > 0, are independent if

P (A|B) = P (A),

(which is equivalent to P (A ∩ B) = P (A) · P (B)).

Remark:
From this definition we also learn that: If we know that two events A and B are independent
then
P (A ∩ B) = P (A)P (B)
This is only true for INDEPENDENT events.

The following example shows how to apply these two concepts to the experiment of rolling
a die.

Example:
Consider the experiment to roll a fair die.
Let A = {1, 2, 3}
B = {4}
C = {3, 4}

Then we can calculate, by finding the simple events contained in the described events and
adding their probabilities:

• P(A) = 1/2

• P(B) = 1/6

• P(C) = 1/3

• P(A ∩ B) = 0

• P(A ∩ C) = 1/6

• P(B|A) = P(B ∩ A)/P(A) = 0/(1/2) = 0 (by definition of the conditional probability)

Since P(B) ≠ P(B|A), we find that B and A are not independent (apply the definition
of independence).

• P(C|A) = P(C ∩ A)/P(A) = (1/6)/(1/2) = 1/3

Since P(C) = P(C|A), we find that C and A are independent.
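
A quick way to double-check such independence claims is to enumerate the sample space, compute
the conditional probabilities from the definition, and compare. A small Python sketch (our own
illustration, not part of the notes):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A, B, C = {1, 2, 3}, {4}, {3, 4}

def prob(E):
    return Fraction(len(E), len(S))  # equally likely outcomes

def cond(E, F):
    return prob(E & F) / prob(F)     # P(E|F) = P(E ∩ F) / P(F)

print(cond(B, A), prob(B))  # 0 vs 1/6   -> B and A are NOT independent
print(cond(C, A), prob(C))  # 1/3 vs 1/3 -> C and A are independent
```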

The following example shows the practical relevance of these concepts. In this example they are
applied to an AIDS test, but please be aware that the same arguments apply to other clinical
tests.
Example:

• ELISA is a test for detecting HIV antibodies in blood samples.

• The test is positive in 99.7% of blood samples containing antibodies.

• The test is positive in 1.5% of blood samples containing no antibodies.

Assuming that 1% of the population have HIV antibodies in their blood, what is the
probability that a person has HIV if his/her blood tests positive?

To answer this question we translate the numbers given above into probability
theory and apply the rules introduced.
First choose notation:
• Let H+ be the event of a blood sample containing antibodies.

• Then H− = H+c is the event of a blood sample not containing antibodies.

• Let T+ be the event of a positive ELISA test result.

• Then T− = T+c is the event of a negative ELISA test result.

Now “translate” the facts stated above into conditional probabilities.

We get from the introductory statements and the definition:
P(T+|H+) = 0.997, P(T+|H−) = 0.015, and P(H+) = 0.01, and we want to know P(H+|T+).

Calculate using the definition:

P(H+|T+) = P(H+ ∩ T+) / P(T+)

Now calculate the numerator and the denominator separately.

First the numerator (applying the definition of conditional probability):

P(H+ ∩ T+) = P(H+)P(T+|H+) = 0.01 · 0.997 = 0.00997

Second the denominator:

P(T+) = P((T+ ∩ H+) ∪ (T+ ∩ H−))                 cutting T+ in two pieces
      = P(T+ ∩ H+) + P(T+ ∩ H−)                  Addition Rule (the pieces are disjoint)
      = P(H+)P(T+|H+) + P(H−)P(T+|H−)            definition of conditional probability
      = 0.01 · 0.997 + (1 − 0.01) · 0.015 = 0.02482   Rule for Complements

Putting the numerator and the denominator together we get P(H+|T+) = 0.00997/0.02482 = 0.4017.
This probability is quite low, and a positive test result would have to be confirmed with
additional tests. This shows that ELISA is rather a test to exclude HIV than a test for HIV.
Try on your own to calculate P(H−|T−).

Since the probability of a positive ELISA test is different for blood samples with and without
antibodies, we conclude that the event "positive ELISA test" is NOT independent of the event
"antibodies in the blood sample".
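
This calculation is the law of total probability followed by Bayes' rule, so it is easy to
script. A hedged Python sketch (the function and parameter names are ours):

```python
def posterior(prior, sens, false_pos):
    """P(H+ | T+) via Bayes' rule.

    prior     = P(H+), prevalence of antibodies
    sens      = P(T+ | H+), test sensitivity
    false_pos = P(T+ | H-), false-positive rate
    """
    numerator = prior * sens                              # P(H+ and T+)
    denominator = prior * sens + (1 - prior) * false_pos  # P(T+), total probability
    return numerator / denominator

print(posterior(0.01, 0.997, 0.015))  # 0.4017...
```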

Example:
Consider the experiment of rolling two unbiased dice.
The simple events of this experiment can be described as pairs of numbers between 1 and 6;
(3, 4) would be interpreted as rolling a 3 with the first die and a 4 with the second die.
The sample space consists of all pairs with numbers between 1 and 6 in the first and the second
component, so the sample space consists of 6 · 6 = 36 elements.
Since both dice fall independently we can calculate the probability of rolling two 6s by

P((6, 6)) = P(6 with the first die) · P(6 with the second die) = (1/6) · (1/6) = 1/36.

Example:
A satellite has two power systems, a main system and an independent backup system.
Suppose the probability of failure in the first ten years is 0.05 for the main system and 0.08
for the backup system.
What is the probability that both systems fail in the first 10 years and the satellite is lost?
Let M be the event that the main system fails and B the event that the backup system fails.
Since the systems are independent we get:

P(M ∩ B) = P(M) · P(B) = 0.05 · 0.08 = 0.004

The event O that at least one of the systems is still operational after 10 years is the
complement of the event that both systems fail. We calculate:

P(O) = 1 − P(Oc) = 1 − 0.004 = 0.996

Example:

An experiment can result in one of five equally likely simple events E1, E2, E3, E4, E5. Events
A, B and C are defined as follows:
A := {E1, E3}
B := {E1, E2, E4, E5}
C := {E3, E4}
Find the probabilities for the following events:
a. Ac   b. A ∩ B   c. B ∩ C
d. A ∪ B   e. B|C   f. A|B   g. Are events A and B independent?

Solution (first try on your own):

a. 3/5   b. 1/5   c. 1/5
d. 1   e. 1/2   f. 1/4   g. No they are not, since P(A) ≠ P(A|B)
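
The same enumeration approach as before verifies these answers; a short Python check
(illustrative only, labelling Ei by the integer i):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5}                    # represent Ei by the label i
A, B, C = {1, 3}, {1, 2, 4, 5}, {3, 4}

def prob(E):
    return Fraction(len(E), len(S))    # equally likely simple events

def cond(E, F):
    return prob(E & F) / prob(F)       # P(E|F)

print(prob(S - A), prob(A & B), prob(B & C))  # 3/5, 1/5, 1/5
print(prob(A | B), cond(B, C), cond(A, B))    # 1, 1/2, 1/4
print(prob(A) == cond(A, B))                  # False -> not independent
```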

1.4 Counting
The assignment of probabilities to events can be quite hard. When calculating probabilities
using p = favourable/possible, it helps to know some counting rules for finding the number of
favourable and possible outcomes.
For example we will be asking

• In a football tournament between 9 schools, how many football games are played if all
schools play each other exactly once?
This is the same as asking: how many pairs of schools exist? The order in which the teams
are listed does not matter; school 1 playing school 2 and school 2 playing school 1 are
considered the same game. This is an example of Combinations.

• Suppose we have to form all numbers consisting of three different digits using only 1, 2, 3, or
4. To form the numbers the digits are arranged in different orders, and different numbers
are formed depending upon the order in which we arrange the digits. This is an
example of Permutations.

For the purpose of counting, we will have to distinguish whether the order matters or not.

1.4.1 Combinations – order does NOT matter


Example 1
In a tournament between 4 schools, how many football games are played if all schools play each
other exactly once?
Let us list them:
(1,2), (1,3), (1,4), (2,3), (2,4), (3,4); there are 6 matches.

Suppose a sample of n elements is to be drawn from a set of N elements. Then the number of
different samples possible is denoted by NCn:

NCn = N! / (n!(N − n)!)

where N! = (1)(2)(3)···(N − 2)(N − 1)N (and similarly for n!).

Applied to the example, we are asking for the number of samples of size 2 that can be drawn
from a set of 4, with the answer

4C2 = 4!/(2!·2!) = 24/((2)(2)) = 6

With combinations the order in which we list the members of the sample does not matter; school 1
playing school 2 is the same as school 2 playing school 1.

Example 2
How many samples of size 2 exist for our class of 60?
Order does not matter in a sample, so

60C2 = 60!/(2!·58!) = (59)(60)/2 = 3540/2 = 1770

Example 3
Consider a lottery 6 out of 49, with one grand prize when you guess exactly the correct numbers.
What is the probability of guessing the correct numbers?
p = favourable/possible = 1/possible
How many different sets of 6 numbers out of 49 numbers exist?
Since order does not matter we use combinations, so

49C6 = 49!/(6!·43!) = (44 · 45 · 46 · 47 · 48 · 49)/720 = 13,983,816

and p = 1/13,983,816 ≈ 7.15 × 10^−8 = 0.0000000715

I am not going to hope to win the grand prize ever.
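
Python's standard library exposes these counting functions directly, which makes such
computations easy to sanity-check; a small sketch:

```python
from math import comb, factorial

print(comb(4, 2))       # 6 matches between 4 schools
print(comb(60, 2))      # 1770 samples of size 2 from a class of 60
print(comb(49, 6))      # 13983816 possible 6-of-49 tickets
print(1 / comb(49, 6))  # ~7.15e-08, probability of the grand prize

# comb implements the closed formula N! / (n! (N - n)!):
assert comb(49, 6) == factorial(49) // (factorial(6) * factorial(43))
```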

1.4.2 Permutations – order matters


Example 4
Suppose we have to form a number consisting of two digits using the digits 1, 2, 3, 4, 5 (each
digit cannot be used more than once).
Let us list the different options:
12, 13, 14, 15, 21, 23, 24, 25, 31, 32, 34, 35, 41, 42, 43, 45, 51, 52, 53, 54; there are 20 possibilities.

Given a single set of N different elements, you wish to select n different elements from the N and
arrange them within n positions. The number of different permutations (arrangements)
is denoted by NPn:

NPn = N! / (N − n)!

Applied to the example, we are asking for the number of permutations of size 2 drawn from a
set of 5, with the answer

5P2 = 5!/3! = 120/6 = 20
Example 5
Assume for a class of 60 students we need a class representative and a class coordinator.
How many different choices for the two positions exist? Order matters: the first choice is the
representative and the second choice the coordinator.

60P2 = 60!/58! = (59)(60) = 3540
Example 6
In a certain country, the car number plate is formed by 4 digits from the digits 1, 2, 3, 4, 5, 6,
7, 8 and 9, followed by 3 letters from the alphabet. How many number plates can be formed if
neither the digits nor the letters are repeated?
There are 9P4 choices for the numbers and 26P3 choices for the letters; since every choice of
numbers can be combined with every choice of letters, 9P4 × 26P3 number plates can be formed.
What is the probability that a randomly chosen number plate does not contain a 1, 2 or 3?
favourable = 6P4 × 26P3 (we can choose only out of 6 digits)
possible = 9P4 × 26P3, so
p = (6P4 × 26P3)/(9P4 × 26P3) = 6P4/9P4 = (6!/2!)/(9!/5!) = 360/3024 = 5/42 ≈ 0.119

Example 7
Lottery
In a certain state lottery, 48 balls numbered 1 to 48 are placed in a machine and six are drawn
at random. You win if you guess the correct 6 numbers.
What is the probability of winning?
First we have to find the number of possible outcomes (order does not matter), therefore
N = 48C6 = 12,271,512. Only one combination wins, so the probability is

1/48C6 ≈ 0.0000000815

Example 8
You win a smaller prize if you get 5 numbers right. There are 6C5 ways to choose 5 of the six
winning numbers and 42 ways to choose one of the remaining 42 numbers, giving you a total of
42 · 6C5 = 42 · 6 = 252 possibilities. The probability of a second prize is then

252/12,271,512 ≈ 0.0000205
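
The same check in code, using math.comb (a sketch; the prize structure is as described above):

```python
from math import comb

possible = comb(48, 6)                 # 12271512 possible draws
print(1 / possible)                    # ~8.15e-08, grand prize

# Second prize: 5 of the 6 winning numbers and 1 of the 42 losing numbers.
favourable = comb(6, 5) * comb(42, 1)  # 6 * 42 = 252
print(favourable / possible)           # ~2.05e-05
```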

2 Probability Distributions
In the chapter about descriptive statistics, samples were discussed and tools were introduced
for describing samples with numbers as well as with graphs.
In this chapter models for populations are introduced. We will see how the properties
of a population can be described in mathematical terms. Later we will see how samples can be
used to draw conclusions about those properties. That step is called statistical inference.

Definition:
A variable X (we use capital letters for random variables) is a random variable (rv) if the value
that it assumes, corresponding to the outcome of an experiment, is a chance or random event.

Example:
• X = number of observed "Tails" while tossing a coin 10 times
• X = survival time after a specific treatment of a randomly selected patient
• X = SAT score for a randomly selected college applicant

As for variables in sample data, rvs can be categorical or quantitative, and quantitative rvs
can be either discrete or continuous:

• categorical
• quantitative
  – discrete
  – continuous

As with data description, the models for rvs depend entirely on the type of the rv. The models
for continuous rvs will be different from those for categorical or discrete rvs.

2.1 Categorical Random Variables


Categorical random variables are described by their distribution.

Definition
The probability distribution of a categorical rv is a table giving all possible categories the rv
can assume and the associated probabilities.

Example:
The population investigated consists of the students of a selected college. The random variable
of interest X is the residence status; it can be either resident or nonresident.

If a student is chosen randomly from this college, the probability of being a resident is 0.73.
If X is the random variable resident status, we write P(X = resident) = 0.73.

The probability distribution is:


resident status probability
resident 0.73
nonresident 0.27

2.2 Numerical Random Variables
2.2.1 Discrete Random Variables
Remember: A discrete rv is one whose possible values are isolated points along the number
line.

Definition:
The probability distribution for a discrete rv X is a formula or table that gives the possible
values of X, and the probability p(x) associated with each value of X.

Value of X    x1   x2   x3   ···   xn
Probability   p1   p2   p3   ···   pn

The probabilities must satisfy two requirements:

• Every probability pi is a number between 0 and 1.

• Σ pi = 1.

Example:
Toss two unbiased coins and let X equal the number of heads observed.
The simple events of this experiment are:

coin 1   coin 2   X   p(simple event)
H        H        2   1/4
H        T        1   1/4
T        H        1   1/4
T        T        0   1/4

So we get the following distribution for X = number of heads observed:

X    p(X)
0    1/4
1    1/2
2    1/4

With the help of this distribution we can calculate that P(X ≤ 1) = P(X = 0) + P(X = 1) =
1/4 + 1/2 = 3/4.

Properties of discrete probability distributions:

• 0 ≤ p(x) ≤ 1

• Σx p(x) = 1

The expected value or population mean µ (mu) of a rv X is the value that you would expect
to observe on average if the experiment is repeated over and over again. It is the center of the
distribution.

Definition:
Let X be a discrete rv with probability distribution p(x). The population mean µ or expected
value of X is given by

µ = E(X) = Σx x · p(x).

Example:
The expected value of the distribution of X = the number of heads observed tossing two coins
is calculated by

µ = 0 · (1/4) + 1 · (1/2) + 2 · (1/4) = 1

Definition:
Let X be a discrete rv with probability distribution p(x). The population variance σ² of X is

σ² = E((X − µ)²) = Σx (x − µ)² · p(x).

The population standard deviation σ (sigma) of a rv X is equal to the square root of its variance:

σ = √σ²

Example (continued):
The population variance of X = number of heads observed tossing two coins is calculated by

σ² = (0 − 1)² · (1/4) + (1 − 1)² · (1/2) + (2 − 1)² · (1/4) = 1/4 + 1/4 = 1/2

and the population standard deviation is

σ = √σ² = 1/√2.
Example 9
Consider the following discrete distribution for X = # of elections voted in:

X    P(X = x)
0    0.2
1    0.5
2    0.2
3    0.1
4    0

Then

µ = Σx x · P(X = x) = 0(0.2) + 1(0.5) + 2(0.2) + 3(0.1) + 4(0) = 1.2

σ² = Σx (x − µ)² P(X = x)
   = (0 − 1.2)²(0.2) + (1 − 1.2)²(0.5) + (2 − 1.2)²(0.2) + (3 − 1.2)²(0.1) + (4 − 1.2)²(0)
   = 0.288 + 0.02 + 0.128 + 0.324 + 0
   = 0.76

σ = √0.76 ≈ 0.87

We conclude the mean number of elections participated in is 1.2, with a standard deviation of
0.87, indicating that the measurements on average fall about 0.87 away from the mean. There
is not a lot of spread in the data.
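
These sums are mechanical, so a short helper makes them easy to reproduce; a Python sketch
(the names are ours):

```python
from math import sqrt

# X = number of elections voted in, as in Example 9
dist = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1, 4: 0.0}

mu = sum(x * p for x, p in dist.items())               # E(X)
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # E((X - mu)^2)

print(mu, var, sqrt(var))  # 1.2 0.76 0.8717...
```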

2.2.2 Binomial Probability Distribution
Examples of discrete random variables can be found in a variety of everyday situations and
across most academic disciplines. Here we will discuss one discrete probability distribution
that serves as a model in a lot of situations.

Many practical experiments result in data with only two possible outcomes

• patient survived 5 years (yes/no)

• coin toss (Head/Tail)

• poll: in favor of presidential decision (yes/no)

• person possesses a gene linked to Alzheimer (yes/no)

Each sampled person is the equivalent of a coin toss, except that the probability of the event
of interest does not have to equal 1/2.

Definition: A binomial experiment has the following four characteristics:

1. The experiment consists of n identical independent trials.

2. Each trial results in one of two possible outcomes (success S, failure F ).

3. The probability of success on a single trial is equal for all trials, p. The probability of
failure is then equal to q = 1 − p.

4. We are interested in X, the number of successes observed during n trials.

Earlier we discussed the example of tossing a fair coin twice and determined the probability
distribution of X = number of heads; this is a binomial distribution with n = 2 and p = 0.5.
n, the number of trials, and p, the probability of success, are the parameters of a binomial
distribution:

The Binomial Probability Distribution


A binomial experiment consists of n identical trials with probability of success p for each trial.
The probability of k successes in n trials is then

P(X = k) = nCk p^k (1 − p)^(n−k)

for values of k = 0, 1, 2, . . . , n, where

nCk = n! / (k!(n − k)!),  and  n! = n(n − 1)(n − 2) · · · 2 · 1

Theorem:
Suppose X is a binomially distributed rv, with n trials and probability of success p. The
population mean of X is

µ = E(X) = np

The population variance of X is

σ² = npq = np(1 − p)

The population standard deviation of X is

σ = √(npq) = √(np(1 − p))

Example: Assume a binomial experiment with n = 10 and p = 0.1.

1. The mean: µ = 10(0.1) = 1

2. The variance: σ² = 10(0.1)(1 − 0.1) = 0.9

3. The standard deviation: σ = √0.9 ≈ 0.95

4. The probability of 2 successes: P(X = 2) = 10C2 · 0.1² · 0.9⁸ = (10!/(2!·8!)) · 0.1² · 0.9⁸ =
45 · 0.01 · 0.4305 ≈ 0.1937

5. In order to give the complete probability distribution you would have to calculate P(X = k)
for k = 0, 1, 2, . . . , 10.

[Figure: probability histogram of the Binomial(10, 0.1) distribution]

Example:
It is known that a given marksman can hit a target on a single trial with probability equal to
0.8. Suppose he fires 4 shots at the target.

1. What is the probability of hitting at most two targets? Find the probability for n = 4 and
p = 0.8:

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 4C0 · 0.2⁴ + 4C1 · 0.8 · 0.2³ + 4C2 · 0.8² · 0.2² = 0.1808

2. What is the probability of hitting exactly three targets?

P(X = 3) = 4C3 · 0.8³ · 0.2 = 0.4096

3. What is the probability of hitting at least 1 target?

P(X ≥ 1) = 1 − P(X < 1) = 1 − P(X = 0) = 1 − 4C0 · 0.8⁰ · 0.2⁴ = 0.9984

4. What is the number of targets we should expect the marksman to hit in 4 trials?
µ = np = 4(0.8) = 3.2. We should expect him to hit about three targets.

5. What is the standard deviation of the number of targets hit when the marksman
does several rounds of 4 attempts?

σ = √(np(1 − p)) = √(4(0.8)(0.2)) = 0.8

The standard deviation is 0.8.
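
These binomial probabilities can be cross-checked with a one-line pmf; a minimal Python sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = nCk * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.8
print(sum(binom_pmf(k, n, p) for k in range(3)))  # P(X <= 2) = 0.1808
print(binom_pmf(3, n, p))                         # P(X = 3)  = 0.4096
print(1 - binom_pmf(0, n, p))                     # P(X >= 1) = 0.9984
```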

2.2.3 Continuous Random Variables


Continuous data variables are described by histograms. For a histogram the measurement scale is
divided into class intervals, and the area of the rectangle put above each interval is proportional
to the relative frequency of the data falling into that interval.

The relative frequency can be interpreted as an estimate of the probability of falling into the
associated interval.

With this interpretation the histogram becomes an "estimate" of the probability distribution
of the continuous random variable.

Definition:
The probability distribution of a continuous random variable X is described by a density curve.
The probability of falling within a certain interval is given by the area under the curve above
that interval.

1. The total area under a density curve is always equal to 1.

2. The area under the curve and above any particular interval equals the probability of
observing a value in the corresponding interval when an individual or object is selected
at random from the population.

[Figure: standard normal density curve with the area over the interval [−2, 0] shaded]

We can calculate that the probability of falling in the interval [−2, 0] equals 0.4772.

Example:
The density of a uniform distribution on the interval [0, 5] is constant at 0.2:

[Figure: uniform density, height 0.2 over the interval [0, 5]]

Use the density function to calculate probabilities for a random variable X with a uniform
distribution on [0, 5]:

• P(X ≤ 3) = area under the curve from −∞ to 3 = 3 · 0.2 = 0.6

• P(1 ≤ X ≤ 2) = area under the curve from 1 to 2 = 1 · 0.2 = 0.2

• P(X > 3.5) = area under the curve from 3.5 to ∞ = 1.5 · 0.2 = 0.3

Remark: Since there is zero area under the curve above a single value, the definition implies
for continuous random variables and numbers a and b:

• P (X = a) = 0

• P (X ≤ a) = P (X < a)

• P(X ≥ b) = P(X > b)

• P (a < X < b) = P (a ≤ X ≤ b)

This is generally not true for discrete random variables.
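
For a uniform density the areas above are just rectangle areas, so they can be verified directly;
a small Python sketch (our own helper, assuming the [0, 5] uniform model above):

```python
def uniform_cdf(x, a=0.0, b=5.0):
    # P(X <= x) for a uniform distribution on [a, b]
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

print(uniform_cdf(3))                   # P(X <= 3)      = 0.6
print(uniform_cdf(2) - uniform_cdf(1))  # P(1 <= X <= 2) = 0.2
print(1 - uniform_cdf(3.5))             # P(X > 3.5)     = 0.3
```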

How do we choose a model for a given variable of a sample?

The model (density function) should resemble the histogram for the given variable.
Fortunately, many continuous data variables have bell-shaped histograms. The normal
probability distribution provides a good model for this type of data.

2.2.4 Normal Probability Distribution


The density function of a normal distribution is unimodal, mound shaped, and symmetric.
There are many different normal distributions; they are distinguished from one another by their
population mean µ and their population standard deviation σ.

The density function of a normal distribution is given by

f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)),   −∞ < x < ∞

with e ≈ 2.7183 and π ≈ 3.1416. µ and σ are called the parameters of a normal distribution,
and a normal distribution with mean µ and standard deviation σ is denoted as N(µ, σ).

µ is the center of the distribution, right at the highest point of the density function.
At the values µ − σ and µ + σ the density curve has inflection points: coming from −∞, the
curve changes from concave up to concave down at µ − σ, and back to concave up at µ + σ.

If the normal distribution is used as a model for a specific situation, the mean and the standard
deviation have to be chosen for that situation, e.g. the height of students at a college might
follow a normal distribution with µ = 178 cm and σ = 10 cm.

The normal distribution is one example of a quantitative continuous distribution!

Definition:
The normal distribution with µ = 0 and σ = 1 is called the Standard Normal Distribution,
N (0, 1).

In order to work with the normal distribution, we need to be able to calculate the following:
1. We must be able to use the normal distribution to compute probabilities, which are areas
under the normal curve.

2. We must be able to describe extreme values in the distribution, such as the largest 5%,
the smallest 1%, or the most extreme 10% (which would include the largest 5% and the
smallest 5%).

We first look at how to compute these for a Standard Normal Distribution.

Since the normal distribution is a continuous distribution, the following holds for every
normally distributed random variable X:
P(X < z) = P(X ≤ z) = area under the curve from −∞ to z.

The area under the curve of a normally distributed random variable is very hard to calculate;
there is no simple closed formula for it.

Table II in the appendix (in the text book) tabulates, for many different values of z*, the area
under the curve from −∞ to z*, which is called the cumulative area, for standard normally
distributed random variables. These values make up the cumulative distribution function.

From now on we use Z to indicate a standard normally distributed random variable (µ = 0 and
σ = 1). Using the table you find that
• P(Z < 1.75) = 0.9599, and

• P(Z > 1.75) = 1 − P(Z ≤ 1.75) = 1 − 0.9599 = 0.0401.

• P(−1 < Z < 1) = P(Z < 1) − P(Z ≤ −1) = 0.8413 − 0.1587 = 0.6826 (compare with
the Empirical Rule).

[Figure: standard normal density with the area between −1 and 1 shaded; the shaded area
equals 0.6826.]

The first probability can be interpreted as meaning that, in a long sequence of observations
from a Standard Normal distribution, about 95.99% of the observed values will fall below 1.75.
Try this for different values!
Now we will look at how to identify extreme values.
Definition:
For any particular number r between 0 and 1, the rth percentile zr of a distribution is a value
such that the cumulative area for zr is r.

If X is a random variable the rth percentile zr is given by:

P (X ≤ zr ) = r

To determine the percentiles of a standard normal distribution, we can again use Table II.

• Suppose we want to describe the values that make up the smallest 2%. We are looking
for the 0.02th percentile z0.02, with

P(Z ≤ z0.02) = 0.02.

Look in the body of Table II for the cumulative area 0.0200. The closest you will
find is 0.0202, for zr = −2.05; this is the best approximation the table provides.
The result is that the smallest 2% of the values of a standard normal distribution fall into
the interval (−∞, −2.05].

• Suppose now we are interested in the largest 5%. We are looking for z*, with

P(Z > z*) = 0.05

Since Table II only gives areas to the left of a given value, the first step is to determine
the area to the left of z*:

P(Z ≤ z*) = 1 − 0.05 = 0.95

That tells us that in fact z* = z0.95, the 0.95th percentile.
Checking Table II we find cumulative areas 0.9495 and 0.9505, with 0.95 exactly in the middle,
so we take the average of the corresponding numbers and get

z0.95 = (1.64 + 1.65)/2 = 1.645

• And now suppose we are interested in the most extreme 5%, i.e. everything outside the
middle 95%. Since the normal distribution is symmetric, the most extreme 5% can be
split up into the lower 2.5% and the upper 2.5%. Symmetry about 0 implies that
−z0.025 = z0.975.

In Table II we find z0.025 = −1.96, so that z0.975 = 1.96.

We have found that the 5% most extreme values lie outside the interval [−1.96, 1.96].

So far we have covered how to find probabilities and percentiles for the standard normal
distribution. It remains to show how to find these for any normal distribution, ideally by
reusing the results for the standard normal distribution.

Lemma: If X is normally distributed with population mean µ and population standard deviation
σ, then the standardized random variable

Z = (X − µ)/σ

is normally distributed with µ = 0 and σ = 1, i.e. Z ∼ N(0, 1).

The following example illustrates how the probability and the percentiles can be calculated by
using the standardization process from the Lemma.

Example: Let X be normally distributed with µ = 100 and σ = 5, X ∼ N(100, 5).

1. Calculate the area under the curve between 98 and 107 for the distribution chosen above.

P(98 < X < 107) = P((98 − 100)/5 < (X − 100)/5 < (107 − 100)/5)
                = P(−2/5 < Z < 7/5)
                = P(−0.4 < Z < 1.4)

This can be calculated using Table II: P(−0.4 < Z < 1.4) = P(Z < 1.4) − P(Z < −0.4)
= 0.9192 − 0.3446 = 0.5746.

2. To find the 0.03th percentile of this distribution, that is x0.03, use

0.03 = P(X ≤ x0.03)
     = P((X − 100)/5 ≤ (x0.03 − 100)/5)
     = P(Z ≤ (x0.03 − 100)/5)

But then (x0.03 − 100)/5 equals the 0.03th percentile of a standard normal distribution,
which we can find in Table II (look up the cumulative area closest to 0.0300; it corresponds
to z = −1.88):

(x0.03 − 100)/5 = −1.88

This is equivalent to x0.03 = −1.88 · 5 + 100 = 100 − 9.40 = 90.6.
So the lower 3% of a normally distributed random variable with mean µ = 100 and
σ = 5 fall into the interval (−∞, 90.6].
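
Instead of Table II, the standard library's statistics.NormalDist can reproduce both lookups;
a sketch under the N(100, 5) model above:

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=5)

# Area between 98 and 107:
print(X.cdf(107) - X.cdf(98))  # ~0.5746

# 0.03th percentile (the table, rounding z to -1.88, gives 90.6):
print(X.inv_cdf(0.03))         # ~90.596
```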

Example:
Assume that the length of a human pregnancy follows a normal distribution with mean 266 days
and standard deviation 16 days.
What is the probability that a human pregnancy lasts longer than 280 days?

P(X > 280) = P((X − 266)/16 > (280 − 266)/16)   standardize
           = P(Z > 14/16)
           = 1 − P(Z ≤ 0.875)
           = 1 − 0.8078
           = 0.1922

How long do the 10% shortest pregnancies last?

(Draw a picture.)

P(X ≤ x0.1) = 0.1
P((X − 266)/16 ≤ (x0.1 − 266)/16) = 0.1   standardize
P(Z ≤ (x0.1 − 266)/16) = 0.1

So (x0.1 − 266)/16 is the 0.1 percentile of the standard normal distribution, z0.1:

(x0.1 − 266)/16 = z0.1 = −1.28   from Table II

This is equivalent to

x0.1 = 16(−1.28) + 266 = 245.5 days

The 10% shortest pregnancies last less than 245.5 days.

2.3 Descriptive Methods for Assessing Normality


Process of assessing Normality:

1. Construct a histogram or stem-and-leaf plot and note the shape of the graph. If the
graph resembles a normal (bell-shaped) curve, the data could be normally distributed.
Big deviations from a normal curve will make us decide that the data are not normal.

2. Compute the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s. If the empirical rule holds approximately,
the data might be normally distributed. If big deviations occur, the assumption might
be unreasonable.

3. Construct a normal probability plot (normal Q-Q plot) for the data. If the data are
approximately normal, the points will fall (approximately) on a straight line.

Construction of a normal probability plot:

1. Sort the data from smallest to largest and assign the ranks (smallest gets 1, second
smallest 2, etc.).
2. Find the normal scores for the relative ranks (Table III).
3. Construct a scatter plot of the measurements against the expected normal scores.

Example 10

Assume we have the data set 1, 2, 2.5, 3, 9.

This is not enough data for a histogram, so that method does not work. Let's do a normal
probability plot to investigate.

x     normal score
1     −1.18
2     −0.50
2.5   0
3     0.50
9     1.18

The first 4 points seem to fall on a line, but the fifth does not fit and seems to be an outlier,
or uncommon value. The data do not seem to come from a normal distribution. Outliers are
highly unlikely in normal distributions.
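
The normal scores are standard normal percentiles evaluated at plotting positions; a minimal
Python sketch using the common (i − 0.5)/n positions. Note this is one of several conventions,
and Table III apparently uses a slightly different one (it gives ±1.18 where this sketch gives
about ±1.28):

```python
from statistics import NormalDist

data = sorted([1, 2, 2.5, 3, 9])
n = len(data)

# Expected normal scores at the plotting positions (i - 0.5) / n.
scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for x, s in zip(data, scores):
    print(x, round(s, 2))  # plot these (normal score, x) pairs
```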

3 Sampling Distributions
In this chapter we will develop the mathematical tools that allow us to conduct
inferential statistics. We will see the effect of the distribution in a population on the values
observed in a sample. This knowledge will then allow us to draw conclusions about a
population from a sample.
For example: an investigator might be able to determine the type (shape) of distribution to
use as a model for a population, but the values of the parameters (mean and standard deviation,
or the probability of "success") that specify its exact form are unknown.

• A pollster is sure that the responses to his "agree/disagree" questions will follow a bino-
mial distribution, but p, the probability of those who "agree", is unknown.
• An agronomist believes that the yield per acre of a variety of wheat is approximately
normally distributed, but the mean µ and the standard deviation σ of the yields are
unknown.

In these cases the investigator has to rely on information from a sample in order to draw
conclusions about those parameters. If a sample is supposed to provide reliable information
about the population, it has to be selected randomly.

3.1 Statistics and Sampling Distributions


When you select a random sample, the numerical descriptive measures you calculate (like the
sample mean and sample standard deviation) are called statistics. These statistics vary or
change for each different random sample you select; that is, they are random variables.

Definition:
Any quantity computed from values in a sample is called a statistic.

The value of a statistic varies from sample to sample; this is called sampling variability. Since
the sampling is done randomly, the value of a statistic is random. In conclusion:

statistics are random variables.

Since statistics are random variables, they have a distribution that tells which values occur
with which probability.

Definition:
The distribution of a statistic is called a sampling distribution.

It provides the following information:

• which values of the statistic can occur, and

• the probability of each value occurring.

Example:
The population is this section of Stat 151. Let µ be the population mean of the height in this
population.
Select a random sample of size 5 and observe the heights.
For every random sample the sample mean x̄ is different; this is called sampling variability.
Now suppose you look at every possible random sample of 5 students from this class and the
corresponding sample means. For these numbers you can create the sampling distribution.

You will find that

1. The value of x̄ differs from one random sample to another (sampling variability).

2. Some samples produce x̄ values larger than µ, whereas others produce x̄ values smaller than µ.

3. The x̄ values can be fairly close to the population mean µ, or quite far off.

The sampling distribution of x̄ provides important information about the behavior of the statis-
tic x̄ and how it relates to the population mean µ.

Considering how many different samples of size five there are in this class (for 60 students it
is 60C5 = 5,461,512), this process is very cumbersome. Fortunately, there are mathematical
theorems that help us obtain information about sampling distributions.

3.2 The Sampling Distribution of a Sample Mean


A sample mean x̄ based on a large sample tends to be closer to µ than one based on a small n.
This can be explained by the following theoretical results.

Lemma: Suppose X1, . . . , Xn are random variables with the same distribution with mean µ
and population standard deviation σ.
Now look at the random variable X̄.

1. The population mean of X̄ (the mean of all sample means), denoted µX̄, is equal to µ.

2. The population standard deviation of X̄ (the standard deviation of all sample means),
denoted σX̄, is

σX̄ = σ/√n

This means that the sampling distribution of X̄ is always centered at µ, and the second statement
gives the rate at which the spread of the sampling distribution (the sampling variability)
decreases as n increases.

Definition:
The standard deviation of a statistic is called the standard error of the statistic (abbreviated
SE).

The standard error gives the precision of a statistic for estimating a population parameter:
the smaller the standard error, the higher the precision.

The standard error of the mean X̄ is SE(X̄) = σ/√n.

Now that we have learned about the mean and the standard deviation of the sampling
distribution of a mean, we may ask whether there is anything we can say about the shape of
the density curve of this distribution.

3.2.1 Central Limit Theorem


This section explains why the normal distribution is so important in statistics.
The result is surprising. The Central Limit Theorem states that, under rather general condi-
tions, means of random samples drawn from one population tend to be approximately normally
distributed.
It does not matter which kind of distribution we find in the population; it can even
be discrete or extremely skewed. If n is large enough, the distribution of the sample mean is
approximately normal.
That is, among all possible population distributions we find one family of distributions that
approximately describes the distribution of a sample mean, provided only that n is large enough.

Central Limit Theorem


If random samples of n observations are drawn from any population with finite mean µ and
standard deviation σ, then, when n is large, the sampling distribution of the mean X̄ is
approximately normal, with mean µ and standard deviation σ/√n.

Remarks:

• If the population itself is normal, X̄ is normally distributed for all n, so n does not have to
be large.

• When the sampled population has a symmetric distribution, the sampling distribution of
X̄ becomes normal quickly. Compare the example below for n = 3.

• If the distribution is skewed, the sampling distribution is usually already close to a normal
distribution for n = 30.

Example:
Consider tossing n unbiased dice and recording the average number on the upper faces.

[Figure: sampling distributions of X̄ for n = 1, 2, 3, 4 dice]

Looking at only n = 4 dice already leads to a distribution that is very close to a normal
distribution.
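
A quick simulation makes the theorem visible; a sketch (the sample sizes and the replication
count are arbitrary choices of ours):

```python
import random
from statistics import mean, stdev

random.seed(1)

def sample_means(n, reps=20_000):
    # Average of n fair dice, repeated reps times.
    return [mean(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]

# Theory: the mean stays 3.5 and the SE is 1.7078 / sqrt(n) for every n.
for n in (1, 2, 4, 30):
    m = sample_means(n)
    print(n, round(mean(m), 3), round(stdev(m), 3))
```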

Summary: Assume that the measurements in a population all follow the same distribution
with finite mean µ and standard deviation σ. Then:

• The mean of the sampling distribution of the mean X̄ of n observations satisfies

µX̄ = µ

• The standard deviation of the sampling distribution of the mean X̄ of n observations
satisfies

σX̄ = σ/√n

• The sampling distribution of the mean X̄ of n observations is approximately normal
if n is large enough.

Example:
The duration of Alzheimer's disease from the onset of symptoms until death ranges from 3 to
20 years. The mean is 8 years and the standard deviation is 4 years.

1. What is the probability that a randomly chosen Alzheimer's patient survives less than 7
years? (Hint: this is about a single patient; standardize using the parameters describing
the population.)
Let X be the survival time of a randomly chosen patient. Then

P(X < 7) = P((X − µ)/σ < (7 − µ)/σ)   standardize
         = P(Z < (7 − 8)/4)
         = P(Z < −0.25)
         = 0.4013                      Normal Table

The probability that a randomly chosen patient will live less than 7 years is about 40%.
This result need not be accurate, because we used the normal distribution to calculate
the probability, but survival times do not tend to be normally distributed.
2. Now look at the average survival time of 30 randomly selected Alzheimer's patients:
what is the probability that the average survival time of 30 patients is less than 7 years?

P(X̄ < 7) = P((X̄ − µX̄)/σX̄ < (7 − µX̄)/σX̄)   standardize
         = P(Z < (7 − 8)/(4/√30))
         = P(Z < −1.37)
         = 0.0853                             Normal Table

This result is believable, since the sample mean can be assumed to be normally distributed
because of the large sample size (≥ 30).
If the sample size is large, then regardless of the distribution in the population, sample means
tend to be normally distributed.
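
Both probabilities use the same standardization and differ only in the standard error; a closing
Python sketch using statistics.NormalDist:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 4, 30
Z = NormalDist()  # standard normal

print(Z.cdf((7 - mu) / sigma))              # single patient, ~0.4013
print(Z.cdf((7 - mu) / (sigma / sqrt(n))))  # mean of 30 patients, ~0.0854
```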

