
Faculty of Applied Sciences
Department of Mathematics and Physics

Statistics 1B Lecture Notes

Author: T. Farrar
2019

Name:

Contents

1 Introduction
2 Sampling Distributions
3 Introduction to Estimation and Confidence Intervals
4 Introduction to Hypothesis Testing
5 Hypothesis Tests about One Population
6 Hypothesis Tests for Comparing Two Populations
7 Pearson Chi-Squared Tests for Categorical Data
8 Introduction to Nonparametric Methods
9 Single-Factor Analysis of Variance

1 Introduction
Textbooks

• The following books were used in preparing this module: (Wackerly et al.,
2002), (Navidi, 2006), (Wonnacott and Wonnacott, 1990), (Moore, 2000),
(Keller, 2012)

What is Statistics 1B all about?

• In Statistics 1A you learned about graphical methods, descriptive statistics,
probability principles, probability distributions, and lastly sampling distribu-
tions

• In this module, the main focus is on confidence intervals and hypothesis testing,
although we won’t get into these topics right away
• Behind these methods is the basic need to make decisions based on data
• Suppose we have matric learners who are doing two different after-school maths
study programmes. The first group scores an average mark of 53% on their
exam and the second group scores an average of 56%. Do we have enough
evidence to conclude that the second after-school maths study programme is
more effective than the first? Or is this just random variation, and if we were to
repeat the programmes next year, maybe the first programme would produce
better results than the second? The descriptive statistics on their own are not
enough to answer this question. We need hypothesis testing.
• When doing research in almost any discipline, from medicine to ecology to
marketing to finance to chemistry to behavioural science, hypothesis testing is
the standard means of demonstrating a claim to be true. Hence, a person who
understands hypothesis testing understands how to produce knowledge in the
modern world
What is a statistician?
• There is a difference between ‘doing statistics’ and ‘being a statistician’
• Doing statistics is challenging on its own; you must learn many different meth-
ods and formulas
• Being a statistician means you must be able to see the big picture and answer
questions like:
◦ How can a real-world problem be expressed as a quantitative research
question?
◦ Where and how can we obtain data that will enable us to answer this
research question?
◦ What statistical method(s) would be most appropriate for answering this
research question with this data?
◦ What are the assumptions that must hold in order for this statistical
method to be valid? Do these assumptions hold?
◦ How should the statistical results be expressed scientifically? (It is im-
portant not to state results as facts but always recognize the uncertainties
that exist.)
◦ How should the statistical findings be communicated to an audience that
is not familiar with statistical methods?
• In addition to these, the statistician must think about issues such as ethics.
Statistics can be used to produce knowledge and demonstrate truth, at least
to a high degree of probability. But statistics can also be used to deceive and
mislead. A good statistician uses statistics only for honest purposes and has a
responsibility to point out when statistics are being used dishonestly by others.

• In short, a statistician is a problem-solver. She or he certainly doesn’t know
everything, but knows how to acquire knowledge using data

2 Sampling Distributions
The first part of this section is a revision of material already covered in Statistics
1A.

2.1 Background to Sampling Distributions


Statistics as Random Variables
• In Statistics 1A, you learned about descriptive statistics and you also learned
about random variables and probability distributions
• Let us remind ourselves of a couple of definitions:
• A statistic is a quantity calculated from a sample in order to estimate an
unknown parameter from a population
◦ The population could be finite or infinite
• A random variable is a rule which assigns a value to each outcome of an
experiment. It has error or uncertainty; its value cannot be known for certain
until the experiment takes place
• A random variable is usually described in terms of its probability distribu-
tion, which tells us how likely its various outcomes are
• In frequentist statistics (the branch of statistics we are doing), it is assumed
that population parameters are fixed, not random, although they are unknown
• (In another branch of statistics called Bayesian statistics, this assumption is
not made, but we will not be doing Bayesian statistics in this module!)
• Consider this simple example: we are going to flip a coin n times, because we
want to know whether the coin is fair, i.e. whether the probability of ‘Heads’
is equal to the probability of ‘Tails’ (both 0.5)
◦ In this case, there is (in theory) an infinite population of coin flips in the
universe, from which we are going to take a sample of size n (you can see
that a ‘population’ is not always clearly defined)
◦ The parameter is p, the probability of getting ‘Heads’
◦ The random variable is Y , the number of times the coin comes up
‘Heads’ in our sample of n flips
◦ The statistic is p̂ = Y /n, the proportion of ‘Heads’ in our sample of n
flips. This will be our estimator of p (when we put a ˆ on a parameter it
denotes a statistic that is an estimator of that parameter).

◦ The probability distribution that defines Y is the binomial distribu-
tion, because what we have described is a binomial experiment
• Now here is the big new insight: a statistic is also a random variable!
• This sounds strange at first: after all, once we have flipped the coin n times
we know the value of the statistic p̂, so how is it random? But remember,
that is true of any random variable: we know the actual outcome after the
experiment has been done. But not before! Before we flip any coins, we don’t
know the value of p̂ and various outcomes are possible; hence it is random
• In fact, it is easy to see mathematically that p̂ is a random variable, because
its formula, Y /n, shows that it is a function of a random variable, Y . And
any function of a random variable must also be a random variable.
• Hence, a statistic is a random variable. Indeed, another way to define a statistic
is this:
◦ A statistic is a function of the observable random variables in a sample
and known constants.
◦ For example, p̂ is a function of a random variable Y and a known constant
n
Sampling Distributions
• Every random variable behaves according to a probability distribution
• Hence, because a statistic is a random variable, a statistic has its own proba-
bility distribution
• The probability distribution of a statistic drawn from a sample is called a
sampling distribution
• The focus of this chapter is on describing the sampling distribution of two
commonly used statistics: the sample mean and the sample proportion

2.2 Sampling Distribution for a Sample Mean or Sample Proportion
The Sampling Distribution of the Sample Mean of Normally Distributed
Random Variables
• The probability distribution of the mean of a sample is defined in the following
theorem:
Theorem 1. Let Y1 , Y2 , . . . , Yn be a sample of independent random variables
from a normal distribution with mean µ and variance σ². Then Ȳ = (1/n)(Y1 +
Y2 + · · · + Yn ), the sample mean statistic, is normally distributed with mean
µȲ = µ and variance σ²Ȳ = σ²/n.

• If you are interested in seeing a proof of this theorem, see (Wackerly et al.,
2002), pages 331-332.
• It also follows from this theorem that Z = (Ȳ − µȲ )/σȲ = (Ȳ − µ)/(σ/√n)
has a standard normal distribution.

An Illustration of the Sampling Distribution of a Sample Mean

• Let us revisit the human pregnancy example. Let us assume that the length
of a human pregnancy is normally distributed with a mean of 266 days and a
standard deviation of 16 days. But suppose we don’t know this mean, and we
want to estimate it by collecting data from a random sample of mothers

• Let us consider four possible sample sizes: n = 1, n = 5, n = 10, and n = 50.



• According to sampling distribution theory, in each case E(Ȳ) = µ = 266;
that is, the mean of the sampling distribution of Ȳ equals the mean of the
probability distribution of Yi

• However, because Var(Ȳ) = σ²/n, the variance of the sampling distribution of
Ȳ decreases as the sample size increases

• This makes sense: if we collect more data, we would expect to have a more
precise estimate of the average length of a pregnancy

• The effect of increasing sample size on the sampling distribution is shown in


the graph:

• What we can see is that if we were to take a sample of just one mother, and
another researcher were to do the same, and a third researcher were to do the
same, and so forth, then when we all compared our results they would be very
spread out: they would have a large variance. One researcher might estimate
the average pregnancy length to be 240 days, and another, 290 days

• However, if we were to take a sample of 50 mothers, and another researcher


were to do the same, and a third researcher, and so forth, then when we all
compared our results they would be much closer together: they would have
a small variance. In fact, almost certainly all the researchers would have
obtained a sample mean somewhere between 256 and 276
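• A minimal simulation sketch in Python (assuming NumPy is available; the
parameter values are the ones from the pregnancy example and the seed is
arbitrary) illustrating how the spread of Ȳ shrinks as n grows:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma = 266, 16  # pregnancy length: mean and sd in days

    for n in [1, 5, 10, 50]:
        # draw 10 000 samples of size n and compute each sample's mean
        means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
        # the sd of the simulated means should be close to sigma/sqrt(n)
        print(n, round(means.std(), 2), round(sigma / np.sqrt(n), 2))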

Sampling Distribution of a Sample Mean: Example Problem 1

• The amount of time university lecturers devote to their jobs per week is nor-
mally distributed with a mean of 52 hours and a standard deviation of 6 hours.
It is assumed that all lecturers behave independently.

1. What is the probability that a lecturer works for more than 60 hours per
week?
Let Y1 be the number of hours worked per week by this lecturer. (Note
that we could equivalently define Ȳ as the sample mean of this sample
of n = 1 observation; in this case we could use the sampling distribution
approach and get the same answer.)
 
Pr (Y1 > 60) = Pr ((Y1 − µ)/σ > (60 − µ)/σ)
             = Pr (Z > (60 − 52)/6)
             = Pr (Z > 1.33)
             = 1 − Pr (Z < 1.33) = 1 − 0.9082 = 0.0918

2. What is the probability that the mean amount of work per week for four
randomly selected lecturers is more than 60 hours?

Let Y1 , Y2 , Y3 , Y4 be the number of hours worked per week by these four
respective lecturers. Then, according to the sampling distribution theo-
rem, Ȳ is a normally distributed random variable with a mean of µ = 52
rem, Ȳ is a normally distributed random variable with a mean of µ = 52
and a standard deviation of σ/√n = 6/√4 = 3.

Pr (Ȳ > 60) = Pr ((Ȳ − µ)/(σ/√n) > (60 − µ)/(σ/√n))
            = Pr (Z > (60 − 52)/(6/√4))
            = Pr (Z > 2.67)
            = 1 − Pr (Z < 2.67) = 1 − 0.9962 = 0.0038

We can see that the probability is much smaller in this case. Does this
agree with the graph above in terms of the effect of increasing sample
size on the spread of the sampling distribution?
3. What is the probability that if four lecturers are randomly selected, all
four work for more than 60 hours?
Because we have assumed that all lecturers are independent, we can use
the multiplication rule for independent events, which says that Pr (A ∩ B) =
Pr (A) Pr (B) if events A and B are independent. In this case we have
four events: Y1 > 60, Y2 > 60, Y3 > 60 and Y4 > 60. Of course, the mul-
tiplication rule for independent events can be extended to any number of
independent events. Thus:
Pr (Y1 > 60 ∩ Y2 > 60 ∩ Y3 > 60 ∩ Y4 > 60)
= Pr (Y1 > 60) Pr (Y2 > 60) Pr (Y3 > 60) Pr (Y4 > 60)
= [Pr (Y1 > 60)]⁴ (since the four random variables are identically distributed)
= 0.0918⁴ ≈ 0.000071
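• The three answers above can be checked numerically; here is a sketch using
scipy.stats (the small differences from the table-based answers are rounding):

    from scipy.stats import norm

    mu, sigma = 52, 6
    # 1. one lecturer works more than 60 hours
    p1 = norm.sf(60, loc=mu, scale=sigma)            # ~0.0912 (tables: 0.0918)
    # 2. mean of four lecturers exceeds 60 hours
    p2 = norm.sf(60, loc=mu, scale=sigma / 4**0.5)   # ~0.0038
    # 3. all four lecturers exceed 60 hours (independence)
    p3 = p1**4                                       # ~0.00007
    print(p1, p2, p3)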

Sampling Distribution of a Sample Mean: Example Problem 2


• The manufacturer of cans of tuna that are supposed to have a net weight of
200 grams tells you that the net weight is actually a normal random variable
with a mean of 201.9 grams and a standard deviation of 5.8 grams. Suppose
you draw a random sample of 32 cans.
1. Find the probability that the mean weight of the sample is less than 199
grams.

 
Pr (Ȳ < 199) = Pr ((Ȳ − µ)/(σ/√n) < (199 − µ)/(σ/√n))
             = Pr (Z < (199 − 201.9)/(5.8/√32))
             = Pr (Z < −2.83)
             = 1 − Pr (Z < 2.83) = 1 − 0.9977 = 0.0023

2. Suppose your random sample of 32 cans of tuna produced a mean weight
that is less than 199 grams. Comment on the statement made by the
manufacturer.
If the distribution of the net weight stated by the manufacturer is true,
then we had only a 0.0023 probability of achieving a mean weight of less
than 199 grams in our sample. Since we did achieve such a weight, either
something extremely improbable has happened, or the manufacturer has
given us an incorrect probability distribution. The latter is more likely:
probably either the mean weight is below 201.9 grams or the standard
deviation is more than 5.8 grams.

Sampling Distribution of a Sample Mean: Example Problem 3 (Challenging)

• A teacher is taking some of her learners to an annual mathematics competition,


where competitors get a score from 0 to 100. The teacher knows from past
experience that her learners’ scores are normally distributed with a mean of
58 and a standard deviation of 13. The teacher wants to be 90% sure that
the mean score achieved by her learners this year is at least 50. What is the
minimum number of learners she should take to the competition?
To answer this question we must recognize that we will still be using the
sampling distribution of the mean; what has changed is that the unknown is
no longer the probability but the sample size, n.

 
Pr (Ȳ > 50) = Pr ((Ȳ − µ)/(σ/√n) > (50 − µ)/(σ/√n)) = 0.9
Pr (Z > (50 − 58)/(13/√n)) = 0.9
Let z = (50 − 58)/(13/√n)
Pr (Z > z) = 0.9
Pr (Z < −z) = 0.9
−z ≈ 1.28
z ≈ −1.28
(50 − 58)/(13/√n) ≈ −1.28
√n ≈ (−1.28)(13)/(−8)
√n ≈ 2.08
n ≈ 4.33

Since n ≈ 4.33 and she can only take an integer number of learners, we must
round up. The teacher should take at least 5 learners to the competition.
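• The same sample-size calculation can be done numerically; a sketch assuming
scipy is available (norm.ppf gives the quantile that the z table provides):

    import math
    from scipy.stats import norm

    mu, sigma, target = 58, 13, 50
    z = norm.ppf(0.10)                   # ~ -1.28, since Pr(Z > z) = 0.9
    n = (z * sigma / (target - mu))**2   # invert (50 - 58)/(13/sqrt(n)) = z
    print(n, math.ceil(n))               # ~4.33, so take 5 learners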

Sampling Distribution of a Sample Mean: Exercises

• An automatic machine in a manufacturing process is operating properly if


lengths of an important subcomponent are normally distributed with mean
117 cm and standard deviation 5.2 cm.

1. Find the probability that one selected subcomponent is shorter than 114
cm
2. Find the probability that if five subcomponents are randomly selected,
their mean length is less than 114 cm
3. Find the probability that if five subcomponents are randomly selected,
all five have a mean length of less than 114 cm

• (Challenging) The time it takes for a statistics lecturer to mark a test is nor-
mally distributed with a mean of 4.8 minutes and a standard deviation of 1.3
minutes. There are 60 students in the lecturer’s class. What is the probability
that he needs more than 5 hours to mark all the tests? (The 60 tests in this
year’s class can be considered a random sample of the many thousands of tests
the lecturer has marked and will mark.)

2.3 Central Limit Theorem


The Central Limit Theorem

• The Central Limit Theorem is a very important part of statistics theory. It is


stated as follows:

Theorem 2. Let Y1 , Y2 , . . . , Yn be independent and identically distributed (i.i.d.)
random variables with E (Yi ) = µ (a mean of µ) and Var (Yi ) = σ² < ∞ (a
finite variance of σ²). Let Un = (Ȳ − µ)/(σ/√n), where Ȳ = (1/n)(Y1 + Y2 +
· · · + Yn ). Then the distribution function of Un converges to the standard
normal distribution function as n → ∞. Stated mathematically,
lim(n→∞) Pr (a ≤ Un ≤ b) = ∫ₐᵇ (1/√(2π)) e^(−u²/2) du

• What does this theory mean in practice and why is it important?

• It means that for a sample of ‘i.i.d.’ random variables from any distribution,
not only the normal distribution, one can perform a simple transformation
on the sample mean to get an approximately standard normally distributed
random variable

An Illustration of the Central Limit Theorem

• Let us return to the idea of the normal approximation to the binomial distri-
bution in order to illustrate the central limit theorem in practice

• Suppose that Y1 , Y2 , . . . , Ym are independent binomially distributed random
variables each with p = 0.1 and n = 10.

• The following graphs show the simulated probability distribution of Um (as


defined in the Central Limit Theorem) for m = 1, 5, 10, 50, 100, 1000. It is
clear that the distribution is approaching that of a normal distribution as m
becomes large

• A rule of thumb is that the Central Limit Theorem may be used safely as long
as n > 30
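• A quick simulation sketch of this illustration (assuming NumPy and SciPy):
the skewness of Um shrinks towards 0, the skewness of a normal distribution,
as m grows:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(2)
    n_binom, p = 10, 0.1                      # parameters of each Y_i
    mu = n_binom * p
    sigma = np.sqrt(n_binom * p * (1 - p))

    for m in [1, 5, 30, 1000]:
        y = rng.binomial(n_binom, p, size=(100_000, m))
        u = (y.mean(axis=1) - mu) / (sigma / np.sqrt(m))
        print(m, round(skew(u), 3))           # approaches 0 as m grows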

Central Limit Theorem Example

• Example questions involving the CLT are similar to those from the sampling
distribution of the mean of a normally distributed sample. The main difference
is that the CLT gives only approximate probabilities whereas the sampling
distribution of the mean of a normally distributed sample is exact. So when
using the CLT we must always put ≈ instead of =.
• Example: The fracture strength of tempered glass averages 14 (measured in
thousands of pounds per square inch) and has standard deviation 2.
1. What is the probability that the average fracture strength of 100 ran-
domly selected pieces of this glass exceeds 14.5?
We do not know in this case that the fracture strength is normally dis-
tributed. Hence we need to use the Central Limit Theorem to say that
U100 = (Ȳ − µ)/(σ/√n) approximately follows a standard normal distribution.

Pr (Ȳ > 14.5) = Pr ((Ȳ − µ)/(σ/√n) > (14.5 − µ)/(σ/√n))
              ≈ Pr (Z > (14.5 − 14)/(2/√100))
              = Pr (Z > 2.5)
              = 1 − Pr (Z < 2.5) = 1 − 0.9938 = 0.0062
2. Find an approximate interval that includes, with probability 0.95, the
average fracture strength of 100 randomly selected pieces of this glass.
Let Z = (Ȳ − µ)/(σ/√n). Then:

Pr (−z < Z < z) = 0.95
Pr (Z < z) − Pr (Z < −z) = 0.95
Pr (Z < z) − (1 − Pr (Z < z)) = 0.95
2 Pr (Z < z) = 1.95
Pr (Z < z) = 0.975
z ≈ 1.96
Pr (−1.96 < Z < 1.96) ≈ 0.95
Pr (−1.96 < (Ȳ − µ)/(σ/√n) < 1.96) ≈ 0.95
Pr (−1.96 σ/√n < Ȳ − µ < 1.96 σ/√n) ≈ 0.95
Pr (µ − 1.96 σ/√n < Ȳ < µ + 1.96 σ/√n) ≈ 0.95
Pr (14 − 1.96 (2/√100) < Ȳ < 14 + 1.96 (2/√100)) ≈ 0.95
Pr (13.608 < Ȳ < 14.392) ≈ 0.95

Hence, an approximate interval that includes, with probability 0.95, the
average fracture strength of 100 randomly selected pieces of glass is
(13.608, 14.392).
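• Both parts can be checked with scipy.stats; a sketch (norm.interval returns
the central interval with the requested probability):

    import numpy as np
    from scipy.stats import norm

    mu, sigma, n = 14, 2, 100
    se = sigma / np.sqrt(n)
    # 1. Pr(Ybar > 14.5) under the CLT normal approximation
    print(norm.sf(14.5, loc=mu, scale=se))        # ~0.0062
    # 2. interval containing Ybar with probability 0.95
    print(norm.interval(0.95, loc=mu, scale=se))  # ~(13.608, 14.392)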

• Example: An anthropologist wishes to estimate the average height of men for


a certain ethnic group. If the population standard deviation is assumed to
be 5 cm and if she randomly samples 100 men, find the probability that the
difference between the sample mean and the true population mean will not
exceed 1 cm.
We notice that in this case we are not given Ȳ or µ but we are given the
difference between them, which is all we need
 
Pr (|Ȳ − µ| < 1) = Pr (−1 < Ȳ − µ < 1)
                 = Pr (−1/(σ/√n) < (Ȳ − µ)/(σ/√n) < 1/(σ/√n))
                 ≈ Pr (−1/(5/√100) < Z < 1/(5/√100))
                 = Pr (−2 < Z < 2)
                 = Pr (Z < 2) − Pr (Z < −2)
                 = Pr (Z < 2) − [1 − Pr (Z < 2)]
                 = 2 Pr (Z < 2) − 1
                 = 2(0.9772) − 1 = 0.9544

Central Limit Theorem Exercises

• Workers employed in a large service industry have an average wage of R70


per hour with a standard deviation of R5. The industry has 64 workers of a
certain ethnic group. These workers have an average wage of R69 per hour. Is
it reasonable to assume that the wage rate of this ethnic group is equivalent
to that of a random sample of workers from those employed in the service
industry? (Hint: Calculate the probability of obtaining a sample mean less
than or equal to R69 per hour).

• (Challenging) Suppose the anthropologist in the example above wants the


difference between the sample mean and the population mean to be less than
0.8 cm with probability 0.95. How many men should she sample to achieve
this objective?

Sampling Distribution of a Sample Proportion

• The theory behind the sampling distribution of a sample proportion is very
close to that of the normal approximation to the binomial distribution, which
we have already covered

• A proportion is like a percent but it is expressed as a decimal between 0 and
1 rather than on a scale from 0 to 100

• Suppose we have a population with an infinite number of objects, all of which


are either ‘successes’ or ‘failures’. Let p be the proportion of successes in the
population.

• If we take a random sample of n independent objects from this population, this


is nothing other than a binomial experiment, so the number of successes Y in
the sample will be a binomially distributed random variable with probability
of success p in each trial and number of trials n

• Now we are interested in the sampling distribution of the sample proportion
p̂ = Y /n
• We know that µ = E (Y ) = np and σ² = Var (Y ) = np(1 − p); how can we use
this information to get the sampling distribution of p̂?

Two Important Rules of Expectation and Variance

• When dealing with the expected value and variance of a random variable, the
following rules apply (which we will not take the trouble to prove, although it
is not too difficult)

• If Y is a random variable and a and b are constants, then:

◦ E (aY + b) = aE (Y ) + b
◦ Var (aY + b) = a² Var (Y )
• These rules help us in the case of p̂ because p̂ = Y /n is Y multiplied by a
constant, and we know the expected value and variance of Y from the binomial
distribution

• Hence, using the two rules above:


 
µp̂ = E (p̂) = E (Y /n) = (1/n) E (Y ) = (1/n)(np) = p
σ²p̂ = Var (p̂) = Var (Y /n) = (1/n²) Var (Y ) = (1/n²)(np(1 − p)) = p(1 − p)/n

• Now that we know the expected value and variance of p̂, we can use the Central
Limit Theorem to derive the approximate sampling distribution of p̂
• By the CLT, we have that (p̂ − µp̂ )/σp̂ = (p̂ − p)/√(p(1 − p)/n) approaches a
standard normal distribution as n becomes large

• As with the normal approximation to the binomial distribution, this approxi-
mation is satisfactory as long as np ≥ 5 and n(1 − p) ≥ 5

• Note that we do not apply a continuity correction here: we are using the
normal curve to approximate the distribution of p̂ itself rather than to compute
binomial probabilities for the discrete count Y
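• A short simulation sketch (assuming NumPy; the values n = 400 and p = 0.3
are arbitrary illustrative choices) checking these moment formulas:

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 400, 0.3
    phat = rng.binomial(n, p, size=100_000) / n   # simulated proportions
    print(phat.mean(), p)                         # E(phat) = p
    print(phat.var(), p * (1 - p) / n)            # Var(phat) = p(1-p)/n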

Sampling Distribution of a Sample Proportion: Example 1

• An upcoming election features only two candidates, A and B. Assume we know


that candidate A will win with 52% of the votes. What is the probability that
in a poll of a random sample of 500 voters taken two days before the vote, less
than 50% say they will vote for candidate A (meaning the poll gives a false
prediction of the outcome)? You may assume that those sampled reveal their
true intentions in the poll and do not change their mind between the poll and
the election.
Pr (p̂ < 0.5) = Pr ((p̂ − p)/√(p(1 − p)/n) < (0.5 − p)/√(p(1 − p)/n))
             ≈ Pr (Z < (0.5 − 0.52)/√(0.52(1 − 0.52)/500))
             = Pr (Z < −0.90)
             = 1 − Pr (Z < 0.90)
             = 1 − 0.8160 = 0.1840

Sampling Distribution of a Sample Proportion: Example 2

• A university bookstore claims that 50% of its customers are satisfied with the
service and prices.

1. Given that this claim is true, what is the probability that in a random
sample of 600 customers less than 45% are satisfied?
2. Suppose that a random sample of 600 customers is actually taken, and
270 customers say they are satisfied. What does this tell you about the
bookstore’s claim? Explain.

1. We use the sampling distribution of p̂ as follows:


Pr (p̂ < 0.45) = Pr ((p̂ − p)/√(p(1 − p)/n) < (0.45 − p)/√(p(1 − p)/n))
              ≈ Pr (Z < (0.45 − 0.5)/√(0.5(1 − 0.5)/600))
              = Pr (Z < −2.45)
              = 1 − Pr (Z < 2.45)
              = 1 − 0.9929 = 0.0071

2. p̂ = 270/600 = 0.45. The percent of satisfied customers in the sample is
45%. We have shown that, if the true percent of satisfied customers in the
population were 50%, there is only a 0.71% chance of getting such a low
rate of satisfied customers in the sample. Because this is so unlikely to
have occurred, it seems better for us to assume that the bookstore’s claim
is false. We conclude that the percent of customers who are satisfied is
not 50%, but something lower.

Sampling Distribution of a Sample Proportion: Exercises

• A commercial for a manufacturer of household appliances claims that 3% of


all its products require a service call in the first year. A consumer protection
association wants to check the claim by surveying 400 households that recently
purchased one of the company’s appliances. They find that 5% of the 400
households report having required a service call in the first year. What is
the probability that more than 5% of households in the sample would have
required a service call in the first year if the commercial’s claim were true?
What do you conclude about the commercial’s claim?

• An accounting lecturer claims that no more than one-quarter of undergraduate


business students will major in accounting. Given that this claim is true, what
is the probability that in a random sample of 1200 undergraduate business
students, 336 or more will major in accounting?

2.4 Sampling Distribution for Difference in Sample Means


Sampling Distribution for the Difference between two Sample Means

• Suppose we have two normally distributed populations. Population 1 has a
mean of µ1 and a variance of σ1² and Population 2 has a mean of µ2 and a
variance of σ2², and we draw an independent random sample from each. Let
X̄ be the mean of the sample from population 1 and let Ȳ be the mean of
the sample from population 2. What then is the sampling distribution of the
difference between the sample means, X̄ − Ȳ ?

• Using laws of expected value and variance it is possible to derive the expected
value and variance of X̄ − Ȳ :

µX̄−Ȳ = µ1 − µ2
σ²X̄−Ȳ = σ1²/n1 + σ2²/n2

• It can also be proven mathematically that, under the conditions described


above, X̄ − Ȳ is a normally distributed random variable. Thus in this case we
do not need the Central Limit Theorem approximation; the sampling distri-
bution is exact

• That is, Z = [(X̄ − Ȳ ) − (µ1 − µ2 )] / √(σ1²/n1 + σ2²/n2 ) is a standard normal
random variable

• If Populations 1 and 2 are not normally distributed, then this quantity will
still be approximately normally distributed for large sample sizes (n1 ≥ 30 and
n2 ≥ 30).

Sampling Distribution for the Difference between two Sample Means: Example

• Suppose that starting salaries of teachers in Gauteng are normally distributed


with a mean of R12 500 per month and a standard deviation of R2 000, while
starting salaries of teachers in Western Cape are normally distributed with a
mean of R12 000 and a standard deviation of R1 500. (Note: these values
are made up!) If random samples of 50 new teachers from Gauteng and 60
new teachers from Western Cape are selected, what is the probability that
the sample mean starting salary of Western Cape teachers will exceed that of
Gauteng teachers?
We will consider Population 1 to be the starting salaries of Gauteng teachers
and Population 2 to be the starting salaries of Western Cape teachers.
 
Pr (X̄ < Ȳ ) = Pr (X̄ − Ȳ < 0)
= Pr ( [(X̄ − Ȳ ) − (µ1 − µ2 )] / √(σ1²/n1 + σ2²/n2 ) < [0 − (µ1 − µ2 )] / √(σ1²/n1 + σ2²/n2 ) )
= Pr (Z < [0 − (12500 − 12000)] / √(2000²/50 + 1500²/60))
= Pr (Z < −500/√117500)
= Pr (Z < −1.46)
= 1 − Pr (Z < 1.46)
= 1 − 0.9279 = 0.0721

There is a 7.2% probability that the mean of teachers’ starting salaries in


the Western Cape sample will be higher than the mean of teachers’ starting
salaries in the Gauteng sample.
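• A sketch checking this computation with scipy.stats (X̄ − Ȳ is normal with
mean µ1 − µ2 and the standard deviation derived above):

    import numpy as np
    from scipy.stats import norm

    mu1, s1, n1 = 12_500, 2_000, 50   # Gauteng
    mu2, s2, n2 = 12_000, 1_500, 60   # Western Cape
    se = np.sqrt(s1**2 / n1 + s2**2 / n2)         # sqrt(117500) ~ 342.8
    print(norm.cdf(0, loc=mu1 - mu2, scale=se))   # Pr(Xbar - Ybar < 0) ~ 0.072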

Sampling Distribution for the Difference between two Sample Means: Exercise

• A factory worker’s productivity is normally distributed. One worker produces
an average of 75 units per day with a standard deviation of 20. Another worker
produces at an average rate of 65 units per day with a standard deviation of
21. What is the probability that during one week (5 working days), worker 1
will outproduce worker 2? (Hint: think of the five days as a random sample
from the population of all the days that the worker has worked and will work
at the factory.)

3 Introduction to Estimation and Confidence Intervals
3.1 Point and Interval Estimators
What is estimation?

• Estimation is a concept you have probably heard about since primary school

• In mathematics, estimation means something like ‘a rough calculation of the


value, number or quantity of something’

• However, in statistics, estimation takes on a more precise meaning:

• Estimation is an attempt to determine the approximate value of a


population parameter on the basis of a sample statistic

• The need for estimation arises precisely because the value of a population
parameter is almost always unknown

• For instance, consider the following quantities:

◦ The mean income of South Africans aged 15 to 65


◦ The proportion of South Africans above age 12 who smoke cigarettes

• It seems impossible to know any of these quantities exactly: how could we


possibly get the necessary information from all the millions of people in the
population?

• Hence, the best we can do is to estimate these population quantities using


statistics calculated on a sample drawn from the population

The two kinds of estimates

• An estimate is an actual estimation proposal

• For instance, based on research we might propose that the proportion of South
Africans above age 12 who smoke cigarettes is 0.15. In this case, ‘0.15’ is the
estimate. Or, we might propose based on our sample data that the mean
income of South Africans aged 15 to 65 is R1800 per month. In this case,
‘R1800 per month’ is the estimate.

• We can distinguish between two kinds of estimates: a point estimate and an
interval estimate

• A point estimate approximates an unknown population parameter using a


single value that is believed to be close to the parameter value

• An interval estimate approximates an unknown parameter using a range of


values within which the parameter value is believed to be

• Hence, if we are trying to estimate the proportion of South Africans above


age 12 who smoke cigarettes, one might say, ‘Based on my research I believe
that 15.2% of South Africans above age 12 smoke cigarettes’. This would be
a point estimate. Or, one might say, ‘Based on my research I believe that
between 10% and 20% of South Africans above age 12 smoke cigarettes’. This
would be an interval estimate.

• You are probably much more familiar with using point estimators. However,
there are two major advantages to using interval estimators:

1. It is virtually certain that a point estimator is wrong, whereas an interval


estimator might be right. That is, it is extremely unlikely that the percent
of South Africans above age 12 who smoke cigarettes is exactly 15.2% .
However, it is quite possible that the percent of South Africans above age
12 who smoke cigarettes is between 10% and 20%.
2. We often need an idea of how close the estimate is to the parameter. Only
an interval estimate provides such information. For instance, suppose
there are two researchers. One researcher samples 10 South Africans to
estimate the mean income of the population. The other researcher sam-
ples 10000 South Africans to estimate the mean income of the population.
Obviously the second sample should give us a much more reliable esti-
mate because the sample size is bigger; but it is only an interval estimate
that can tell us this (based on the width of the interval). If we only use
point estimates, we will just have a single value to work with from each
researcher, with no way of quantifying which one is more trustworthy.

• We can distinguish between the terms ‘estimator’ and ‘estimate’

• An estimator is a rule that tells how to calculate the value of an estimate


based on the data contained in a sample. An estimator is usually expressed
as a formula

• By contrast, an estimate is the actual numerical result once we have ‘plugged’


the numbers from the data into the formula and calculated the result

Examples of estimators

• If we have random variables Y1 , Y2 , . . . , Yn with unknown mean µ then
Ȳ = (1/n)(Y1 + Y2 + · · · + Yn ) is an example of an estimator of µ

• A sample proportion p̂ = Y /n is an estimator of the population proportion p

• A sample variance S² = (1/(n − 1)) Σ (Yi − Ȳ )² is an estimator of the
population variance σ²

Criteria for assessing estimators

• How do we know whether an estimator is good or bad? What criteria can we


use to assess the quality of an estimator?

• If we have random variables Y1 , Y2 , . . . , Yn with unknown mean µ, we could


use Ȳ as an estimator of µ but we could also use µ̂ = 2Y1 − 3Y2 + sin(Y4 − Y3 )
as an estimator of µ. We have no obvious reason to think that this µ̂ would
be a good estimator of µ, whereas intuition tells us that Ȳ is a good estimator
of µ. But how would we show that one estimator is bad and another is good?

• The criteria that we can use are related to the expected value and variance
of an estimator

• Remember that an estimator is a statistic, and a statistic is a random variable;


it therefore has an expected value and variance of its own

• Consider an unknown parameter θ and a point estimator θ̂ (putting a ˆ on a


parameter is a common way of denoting a point estimator of that parameter)
 
• The expected value of θ̂ would be written E(θ̂) and the variance of θ̂ would
be written Var(θ̂)

• We can use these characteristics of the estimator θ̂ to describe how good it is

The Bias of an Estimator


 
• The bias of a point estimator θ̂ is given by B(θ̂) = E(θ̂) − θ; it is the difference
between the expected value of the estimator and the actual parameter value

• If the expected value of the estimator equals the actual parameter value, i.e.
if E(θ̂) = θ, then θ̂ is said to be an unbiased estimator of θ. Otherwise, θ̂
is said to be a biased estimator of θ.

• If we use an unbiased estimator many times, the results would be expected


to average out near the true parameter value. In other words, when using an
unbiased estimator, ‘on average, the sample statistic is equal to the parameter’.
Of course, in practice, we usually only use an estimator once for a particular
parameter, so this is no guarantee that our actual estimate is close to the
parameter. Nevertheless, unbiasedness is a property that a good estimator
should have.

• In practice, it can be proven mathematically that, with Ȳ and p̂ as defined
above, E(Ȳ) = µ, E(p̂) = p and E(S²) = σ², meaning that Ȳ is an unbiased
estimator of µ, p̂ is an unbiased estimator of p and S² is an unbiased estimator
of σ²

• Unbiasedness is the reason why, in the usual S 2 formula, we divide by n − 1


rather than n

• Interestingly, S (the sample standard deviation) is not an unbiased estimator


of σ. But we will not go into this issue here.
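• A simulation sketch (assuming NumPy; n = 5 and standard normal data are
arbitrary choices) showing that S² is unbiased for σ² while S underestimates σ:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 5
    samples = rng.normal(0, 1, size=(200_000, n))  # true sigma = 1
    s2 = samples.var(axis=1, ddof=1)               # sample variance, n-1 divisor
    print(s2.mean())                               # ~1.00: S^2 unbiased
    print(np.sqrt(s2).mean())                      # ~0.94: S biased downwards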

The Variance and Consistency of an Estimator

• It is also clear that a good estimator will have a small variance


 
• If Var(θ̂) is small, then if we use this estimator repeatedly, our results will
all be close together

• Of course, if an estimator had a variance of 0, it would be a constant; it would


give us the same result every time

• Provided that θ̂ is also unbiased, this means we will get a result close to the
population parameter almost every time

• If we are comparing two unbiased estimators θ̂1 and θ̂2 , the estimator with a
smaller variance is said to be more efficient

• The variance of an estimator always depends on the sample size n, and should
decrease as n increases

• One desirable property of an estimator is consistency. An estimator θ̂ is said
to be a consistent estimator of θ if lim(n→∞) Var(θ̂) = 0, that is, if the variance
of the estimator approaches 0 as the sample size approaches infinity

• Based on these and other properties, it can be proven mathematically that:

◦ Ȳ is the best possible point estimator of µ


◦ p̂ is the best possible point estimator of p
◦ S 2 is the best possible point estimator of σ 2

Interval Estimators and Confidence Intervals

• An interval estimator is a rule (usually expressed as a formula) specifying


the method for using the sample measurements to calculate two numbers that
form endpoints of an interval

• The resulting interval should ideally have two properties:

1. It should contain the parameter θ we are estimating, i.e. the parameter


should lie between the two endpoints

2. It should be relatively narrow (if the endpoints are extremely far apart, the
interval will not be very useful)

• Since the interval estimator depends on random variables, the endpoints are
random and we cannot guarantee that the parameter θ will lie between them.
What we can do is generate an interval that has a certain probability of con-
taining the parameter θ

• Such an interval estimator is commonly referred to as a confidence interval


because we have a certain level of confidence that it contains the parameter

Deriving Confidence Intervals

• We have actually already derived something similar to a confidence interval


without even knowing it. In Section 2.3, the Central Limit Theorem Example,
part 2, asked us to ‘Find an approximate interval, that includes, with prob-
ability 0.95, the average fracture strength of 100 randomly selected pieces of
this glass’. If you review that example, you will have an idea of the logic that
is used to construct a confidence interval

• The general formula for a (1 − α)100% confidence interval for an unknown
parameter θ is as follows:

Point estimator ± Quantile × Standard error

θ̂ ± qα/2 √(Var(θ̂))

• The quantile and standard error are taken from the sampling distribution of
the point estimator

• The product of the quantile and the standard error is known as the margin
of error (sometimes written e)

• Important Note: ‘Standard error’ here covers only random sampling error,
i.e. the difference between the sample statistic and population parameter that
occurs due to random sampling. There are other kinds of statistical error
which, if present, could make our estimation procedures invalid. These kinds
of errors will be discussed in detail in Statistics 2A

• Alternatively the central limit theorem can be used, allowing us to use standard
normal quantiles even when the point estimator is not normally distributed.
In such cases the confidence interval is only approximate

• In the statement ‘(1 − α)100% confidence interval’, α is called the type I error
rate. We will discuss type I error in more detail later in the course, but for now
you can think of α simply as the probability that the confidence interval does
not contain the parameter

• Hence, if we want a 95% confidence interval, 95% = 0.95(100%) = (1 −
0.05)(100%) so α = 0.05. Similarly, if we want a 99% confidence interval
then α = 0.01.
• The reason why the quantile is written as qα/2 in the formula above is that
we usually divide the error equally at both ends of the interval, as seen in the
diagram below
• The area (probability) to the left of the lower endpoint is α/2 and the area
(probability) to the right of the upper endpoint is α/2, so the overall area
(probability) outside the interval is α, which means the area (probability)
inside the interval is 1 − α

3.2 Confidence Intervals for a Single Parameter


Deriving the Confidence Interval for the Mean when Standard Deviation
is Known
• Consider a population of independent, normally distributed random variables
with unknown mean µ and known standard deviation σ. A random sample of
size n is drawn from the population, i.e. Y1 , Y2 , . . . , Yn , in order to estimate µ.
• We know that Ȳ is the best possible point estimator for µ, so in this case µ is
our θ and Ȳ is our θ̂

• But what interval estimator should we use?

• Let L be the lower endpoint of our confidence interval and let U be the upper
endpoint.

• Our task is to find a formula for L and U using the sampling distribution of
Ȳ such that the probability that µ lies between L and U equals 1 − α, i.e.
Pr (L < µ < U ) = 1 − α

• Remember the sampling distribution of Ȳ : based on the assumptions above,
Ȳ is normally distributed with a mean of µ and a variance of σ²/n

• Let us derive the confidence interval beginning with a simple normal proba-
bility expression
• Let us derive the confidence interval beginning with a simple normal proba-
bility expression

Pr (−z < Z < z) = 1 − α
Let zα/2 be the value of z that satisfies this equation
Clearly this is valid since the normal distribution is symmetrical
Pr (−zα/2 < Z < zα/2 ) = 1 − α
Pr (−zα/2 < (Ȳ − µ)/(σ/√n) < zα/2 ) = 1 − α
This follows from the sampling distribution
Pr (−zα/2 σ/√n < Ȳ − µ < zα/2 σ/√n) = 1 − α
Pr (−zα/2 σ/√n − Ȳ < −µ < zα/2 σ/√n − Ȳ ) = 1 − α
Pr (zα/2 σ/√n + Ȳ > µ > −zα/2 σ/√n + Ȳ ) = 1 − α
Here we have multiplied the inequality by −1, so we must change < to >
Pr (Ȳ − zα/2 σ/√n < µ < Ȳ + zα/2 σ/√n) = 1 − α
Here we just rearranged the inequality to express it again in < terms
Hence L = Ȳ − zα/2 σ/√n and U = Ȳ + zα/2 σ/√n

• The formula for an exact (1 − α)100% confidence interval for µ when σ is
known can thus be expressed as Ȳ ± zα/2 σ/√n
• Note that if the variables Y1 , Y2 , . . . , Yn are not normally distributed, we can
still use this formula provided n is fairly large, but it is now an approximate
confidence interval and not an exact one, because we are then using the Central
Limit Theorem rather than the exact sampling distribution of Ȳ
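• A sketch of this interval estimator as a small Python function (z_interval is
a hypothetical helper name; norm.ppf supplies zα/2 ):

    import numpy as np
    from scipy.stats import norm

    def z_interval(ybar, sigma, n, conf=0.95):
        """(1 - alpha)100% CI for the mean when sigma is known."""
        z = norm.ppf(1 - (1 - conf) / 2)   # z_{alpha/2}
        e = z * sigma / np.sqrt(n)         # margin of error
        return ybar - e, ybar + e

    print(z_interval(150, 25, 100))        # battery example below: ~(145.1, 154.9)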

Confidence Interval for the Mean when Standard Deviation is Known: Example 1

• In a random sample of 100 batteries produced by a certain method, the average


lifetime was 150 hours. It is assumed that the lifespan of a battery is normally
distributed with a standard deviation of 25 hours.

1. Find a 95% confidence interval for the mean lifetime of batteries produced

by this method

α = 1 − 0.95 = 0.05, Ȳ = 150, σ = 25, n = 100

Ȳ ± zα/2 σ/√n
150 ± z0.025 (25/√100)
z0.025 = 1.96 (see Central Limit Theorem Example)
150 ± (1.96)(25/10)
L = 145.1, U = 154.9

We can be 95% confident that the mean lifetime of batteries produced by
this method is between 145.1 hours and 154.9 hours
2. Find a 99% confidence interval for the mean lifetime of batteries produced
by this method

α = 1 − 0.99 = 0.01, Ȳ = 150, σ = 25, n = 100

Ȳ ± zα/2 σ/√n
150 ± z0.005 (25/√100)
To find z0.005 :
Pr (−z0.005 < Z < z0.005 ) = 0.99
Pr (Z < z0.005 ) − Pr (Z < −z0.005 ) = 0.99
Pr (Z < z0.005 ) − [1 − Pr (Z < z0.005 )] = 0.99
Pr (Z < z0.005 ) − 1 + Pr (Z < z0.005 ) = 0.99
2 Pr (Z < z0.005 ) = 1.99
Pr (Z < z0.005 ) = 0.995
z0.005 ≈ 2.575 (about halfway between 2.57 and 2.58)
(We could also use linear interpolation)
150 ± (2.575)(25/10)
L = 143.5625, U = 156.4375

We can be 99% confident that the mean lifetime of batteries produced


by this method is between 143.5625 hours and 156.4375 hours. Notice
that as the confidence level increases, the interval widens.
3. An engineer claims that the mean lifetime is between 147 and 153 hours.
With what level of confidence can this statement be made?

U = 153
Ȳ + zα/2 σ/√n = 153
150 + zα/2 (25/√100) = 153
2.5 zα/2 = 3
zα/2 = 1.2
Pr (Z > 1.2) = 1 − Pr (Z < 1.2) = 1 − 0.8849 = 0.1151
Thus α/2 = 0.1151
α = 0.2302
1 − α = 1 − 0.2302 = 0.7698

We can state with about 77% confidence that the mean lifetime is between
147 and 153 hours.
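• A sketch checking this confidence level numerically with scipy.stats:

    from scipy.stats import norm

    z = (153 - 150) / (25 / 100**0.5)   # quantile implied by the endpoint
    print(2 * norm.cdf(z) - 1)          # ~0.770, i.e. about 77% confidence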

Confidence Interval for the Mean when Standard Deviation is Known: Example 2

• Computers in some vehicles calculate various quantities related to perfor-
mance. One of these is the fuel efficiency, measured in km per L. An experi-
ment with a 2006 Toyota Highlander Hybrid was conducted which randomly
recorded fuel efficiency readings shown on the vehicle computer while the car
was set to 100 km/h by cruise control. Here are the fuel efficiency values
obtained in km/L:

15.8 8.9 7.4 10.6 11.5 15.7 16.5 15.0 13.8 10.2
19.0 26.1 25.8 41.4 34.4 32.5 25.3 26.5 28.2 22.1

Suppose that the standard deviation of the population of fuel efficiency
readings for this vehicle is known to be σ = 2.8 km/L.

1. What is σȲ , the standard deviation of Ȳ ?

σȲ = σ/√n = 2.8/√20 = 0.626

2. Determine a 95% confidence interval for µ, the mean fuel efficiency for
this vehicle when travelling 100 km/h

α = 1 − 0.95 = 0.05, σ = 2.8, n = 20
Ȳ = (1/n) Σ Yi = (1/20)(15.8 + 8.9 + 7.4 + · · · + 22.1) = 12.255
Ȳ ± zα/2 σ/√n
12.255 ± z0.025 (2.8/√20)
z0.025 ≈ 1.96
12.255 ± (1.96)(2.8/√20)
L = 11.028, U = 13.482

We can say with 95% confidence that the mean fuel efficiency for this
vehicle when travelling 100 km/h is between 11.028 km/L and 13.482
km/L
3. Is your confidence interval exact or approximate? Why?
The confidence interval is approximate, because we were not given that
the fuel efficiency readings are normally distributed; thus the confidence
interval is based on the Central Limit Theorem

Confidence Interval for the Mean when Standard Deviation is Known: Exercises

• A supplier sells synthetic fibers to a manufacturing company. A simple random


sample of 81 fibers is selected from a shipment. The average breaking strength
of these is 29 kg. It is known that the standard deviation of breaking strength
in all the fibers in the shipment is 9 kg. The breaking strength of the fibers is
assumed to be normally distributed.

1. Find a 90% confidence interval for the mean breaking strength of all the
fibers in the shipment
2. Find a 98% confidence interval for the mean breaking strength of all the
fibers in the shipment
3. Are these confidence intervals exact or approximate? Explain.
4. What is the confidence level of the interval (27.5, 30.5)?

• (Challenging) Suppose you were told that the 90% confidence interval for the
mean µ based on some known σ is (329.87, 356.46). However, you want a 95%
confidence interval. With only the information provided here determine the
95% confidence interval.

Confidence Interval for the Mean when Standard Deviation is Unknown

• The interval estimator we have used so far is applicable as long as the popu-
lation standard deviation σ is known. Both the exact sampling distribution of
Ȳ and the Central Limit Theorem rest on this assumption

• However, in practice, the population standard deviation is usually unknown

• Thus we need an interval estimator for the mean µ of a population whose


standard deviation is unknown
• We can no longer know the exact value of Var(Ȳ) = σ²/n because this depends
on σ² which we do not know

• We have to replace σ² with its sample estimator S² = (1/(n − 1))(Σ Yi² − nȲ²)

• While (Ȳ − µ)/(σ/√n) follows a standard normal distribution under the usual
assumptions (see Sampling Distributions), (Ȳ − µ)/(S/√n) does not follow a
standard normal distribution

• Provided the sample size n is very large, we can still rely on the same interval
estimator formula, with σ replaced with S, and it will give us a reasonably
good approximation

• However, while some textbooks recommend this approximation, I do not: even
when n = 500, the 97.5% quantile for the t distribution (discussed below) is
1.9647 compared to 1.9600 for the standard normal distribution; this is still a
non-trivial source of approximation error

• Instead, when the population standard deviation is unknown we should use


the t distribution (discussed below)

Confidence Interval for the Mean when Standard Deviation is Unknown:
the t Distribution
• The exact distribution of (Ȳ − µ)/(S/√n) when the population is normally
distributed is known as Student’s t distribution (or just the t distribution)

Theorem 3. Let Y1 , Y2 , . . . , Yn be a random sample of size n from a normally
distributed population with mean µ and variance σ². Let Ȳ be the sample
mean and S² be the sample variance. Then (Ȳ − µ)/(S/√n) follows a t
distribution with ν = n − 1 degrees of freedom.

• What is a t distribution and what are ‘degrees of freedom’ ?

• A t distribution is a continuous probability distribution whose shape resembles
that of the normal distribution:

• The probability density function of the t distribution is as follows:

f (y) = [Γ((ν + 1)/2) / (√(πν) Γ(ν/2))] (1 + y²/ν)^(−(ν+1)/2) for −∞ < y < ∞,
where Γ(k) = ∫₀^∞ x^(k−1) e^(−x) dx is the Gamma function

• As with the normal distribution, determining probabilities by integrating this


probability density function is impossible analytically; we must use numerical
methods

• Standard statistical software packages can do these calculations numerically

• When working by hand, one uses a t distribution table to find the necessary
quantiles

• The t distribution has one parameter, the Greek letter ν, which is referred to
as the ‘degrees of freedom’

• The following graph compares the standard normal distribution’s probability


density function to the t distribution’s probability density function for different
ν (degrees of freedom) values

• It shows that as ν becomes large, the t distribution approaches the standard


normal distribution

• The interval estimator for the mean of a normally distributed population with
unknown variance is as follows:
Ȳ ± tα/2,n−1 S/√n
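• A sketch of this estimator as a small Python function (t_interval is a
hypothetical helper name; t.ppf supplies tα/2,n−1 ):

    import numpy as np
    from scipy.stats import t

    def t_interval(ybar, s, n, conf=0.95):
        """(1 - alpha)100% CI for the mean when sigma is unknown."""
        q = t.ppf(1 - (1 - conf) / 2, df=n - 1)   # t_{alpha/2, n-1}
        e = q * s / np.sqrt(n)
        return ybar - e, ybar + e

    print(t_interval(4.2, 1.5, 61))   # biomass example below: ~(3.816, 4.584)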
Confidence Interval for the Mean when Standard Deviation is Unknown:
Example

• Estimates of the earth’s biomass (the total amount of vegetation held by the
earth’s forests) are important in determining the amount of unabsorbed carbon
dioxide that we can expect to remain in the earth’s atmosphere. Suppose that
a sample of 61 one-square-metre plots randomly chosen in North America’s
northern forests produced a mean biomass of 4.2 kg per m2 and a standard
deviation of 1.5 kg per m2 . Give a 95% confidence interval for the average
biomass for North America’s northern forests.

Ȳ ± tα/2,n−1 S/√n
4.2 ± t0.025,61−1 (1.5/√61)
t0.025,60 = 2.000 (from table)
4.2 ± 2(1.5/√61)
L = 3.816, U = 4.584

We can say with 95% confidence that the average biomass for North America’s
northern forests is between 3.816 kg per m2 and 4.584 kg per m2 .

Confidence Interval for the Mean when Standard Deviation is Unknown: Exercise

• A courier service wants to estimate the average delivery time for its local
deliveries. A random sample of times (in minutes) for 12 deliveries to an
address across town was recorded. These data are shown below. Assuming
the data are normally distributed, give a 98% confidence interval for the mean
delivery time.

3.03 6.33 6.50 5.22 3.56 6.76
7.98 4.82 7.96 4.54 5.09 6.46

Confidence Interval for the Mean: Summary

• There is an important caveat (warning): the t distribution relies strongly
on the assumption that the population is normally distributed. If the
population is not normally distributed, we should be very cautious about using
an interval estimator based on the t distribution, and we should definitely not
use such an estimator if the sample size is small

• There are other methods that can be used in such cases, such as bootstrapping
or nonparametric methods. We will not discuss these other methods in this
module

                              Population Standard       Population Standard
                              Deviation Known           Deviation Unknown
Population Normally           Use z quantiles;          Use t quantiles;
Distributed, Large            interval is exact         interval is exact
Sample Size
Population Normally           Use z quantiles;          Use t quantiles;
Distributed, Small            interval is exact         interval is exact
Sample Size
Population Not Normally       Use z quantiles;          Use z or t quantiles WITH
Distributed, Large            interval is approximate   CAUTION; interval is
Sample Size                                             approximate and possibly
                                                        unreliable
Population Not Normally       Use bootstrapping or      Use bootstrapping or
Distributed, Small            nonparametric methods     nonparametric methods
Sample Size

• Here are some simulation results that illustrate the validity of methods in dif-
ferent situations (the uniform and exponential distributions are two examples
of non-normal distributions). The desired confidence level in each case is 95%

Simulated Confidence Level for µ under different conditions,
based on 1 million simulations
n Standard Deviation Distribution Simulated Confidence Quantile Used
5 known normal 0.95 z
5 unknown normal 0.9502 t
30 known normal 0.9499 z
30 unknown normal 0.9501 t
100 known normal 0.9499 z
100 unknown normal 0.9498 t
5 known uniform 0.9524 z
5 unknown uniform 0.9345 t
30 known uniform 0.9504 z
30 unknown uniform 0.9493 t
100 known uniform 0.9503 z
100 unknown uniform 0.9498 t
5 known exponential 0.9565 z
5 unknown exponential 0.8829 t
30 known exponential 0.9521 z
30 unknown exponential 0.9272 t
100 known exponential 0.9508 z
100 unknown exponential 0.9419 t

• These results demonstrate that the true confidence level for these methods is
exactly or close to 95% in most cases

• When the population is non-normal (especially exponential), the confidence


level drops slightly below 95%, especially for a small sample size

Confidence Interval for a Proportion: t Distribution Approach

• We have learned previously that by the CLT, if we have a random variable Y
from a binomial distribution with parameters n and p, and p̂ = Y /n is our point
estimator of p, then the following variable approximately follows a standard
normal distribution:

Z = (p̂ − p)/√(p(1 − p)/n)

• Recall that this approximation is satisfactory as long as np ≥ 5 and n(1−p) ≥ 5

• There is a problem, however: p will always be unknown (otherwise why would


we be trying to estimate it?)

• This means we cannot calculate the variance of p̂, which is p(1 − p)/n

• Instead, we must estimate the standard deviation of p̂: if Var (p̂) = p(1 − p)/n,
then our estimate is Vâr (p̂) = p̂(1 − p̂)/n
• We have replaced p in the variance formula with its point estimator p̂

• This is similar to replacing σ with S in the interval estimator for the mean,
and it thus requires us to use the t distribution instead of the Z distribution:
T = (p̂ − p)/√(p̂(1 − p̂)/n)

• T approximately follows a t distribution with n − 1 degrees of freedom

• We thus derive a confidence interval for a proportion as follows:

Pr (−t < T < t) = 1 − α
Let tα/2,n−1 be the value of t that satisfies this equation
Clearly this is valid since the t distribution is symmetrical
Pr (−tα/2,n−1 < T < tα/2,n−1 ) = 1 − α
Pr (−tα/2,n−1 < (p̂ − p)/√(p̂(1 − p̂)/n) < tα/2,n−1 ) ≈ 1 − α
This follows from the sampling distribution
Pr (−tα/2,n−1 √(p̂(1 − p̂)/n) < p̂ − p < tα/2,n−1 √(p̂(1 − p̂)/n)) ≈ 1 − α
Pr (−tα/2,n−1 √(p̂(1 − p̂)/n) − p̂ < −p < tα/2,n−1 √(p̂(1 − p̂)/n) − p̂) ≈ 1 − α
Pr (tα/2,n−1 √(p̂(1 − p̂)/n) + p̂ > p > −tα/2,n−1 √(p̂(1 − p̂)/n) + p̂) ≈ 1 − α
Here we have multiplied the inequality by −1, so we must change < to >
Pr (p̂ − tα/2,n−1 √(p̂(1 − p̂)/n) < p < p̂ + tα/2,n−1 √(p̂(1 − p̂)/n)) ≈ 1 − α
Here we just rearranged the inequality to express it again in < terms
Hence L = p̂ − tα/2,n−1 √(p̂(1 − p̂)/n) and U = p̂ + tα/2,n−1 √(p̂(1 − p̂)/n)

• The formula for an approximate (1 − α)100% confidence interval for p can thus
be expressed as p̂ ± tα/2,n−1 √(p̂(1 − p̂)/n)

• Note: some textbooks will advise using this approximation with a normal z
quantile instead of the t quantile; this is not very accurate however!

• Even the t approximation will only be effective for relatively large n. We can
use the same rule of thumb as for the normal approximation: the t confidence
interval can be used if np ≥ 5 and n(1 − p) ≥ 5
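• A sketch of the t method as a small Python function (prop_t_interval is a
hypothetical helper name):

    import numpy as np
    from scipy.stats import t

    def prop_t_interval(y, n, conf=0.95):
        """Approximate CI for p; use only when np >= 5 and n(1-p) >= 5."""
        phat = y / n
        q = t.ppf(1 - (1 - conf) / 2, df=n - 1)
        e = q * np.sqrt(phat * (1 - phat) / n)
        return phat - e, phat + e

    print(prop_t_interval(65, 81, conf=0.98))   # CPUT example below: ~(0.6974, 0.9075)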

Confidence Interval for a Proportion: Wilson Score Interval

• Here is another formula for an approximate (1 − α)100% confidence interval
for p, which is more complicated but a better approximation:

[1/(1 + z²α/2 /n)] [ p̂ + z²α/2 /(2n) ± zα/2 √( p̂(1 − p̂)/n + z²α/2 /(4n²) ) ]

• We will not cover the derivation here.
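• A sketch of the Wilson formula as a small Python function (wilson_interval
is a hypothetical helper name):

    import numpy as np
    from scipy.stats import norm

    def wilson_interval(y, n, conf=0.95):
        """Wilson score interval for a proportion."""
        phat = y / n
        z = norm.ppf(1 - (1 - conf) / 2)
        centre = phat + z**2 / (2 * n)
        half = z * np.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
        scale = 1 / (1 + z**2 / n)
        return scale * (centre - half), scale * (centre + half)

    print(wilson_interval(42, 150))   # rat example below: ~(0.2143, 0.3567)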

• The following table gives the simulated probability that p falls inside a 95%
confidence interval calculated using the t distribution method and the Wilson
method for different values of p and n:

• We can see that, in general, the Wilson method is better. And notice how bad
the t method is when p and n are both small! For instance, if p = 0.01 and
n = 50, our so-called ‘95% confidence interval’ actually contains p with only
39% probability, not 95%!

Confidence Interval for a Proportion: Example 1

• CPUT is looking at introducing a new transport service for students and is


interested in estimating the proportion of students who support this initiative.
A random sample of 81 students is asked their opinion of the initiative and
65 of these students indicate that they support it. Obtain a 98% confidence
interval for the proportion of all CPUT students who support the initiative.
(You may assume the population size is infinite.)

Simulated Confidence Level for p using Wilson and t methods
based on 10 million simulations
p n Wilson method t method
0.01 10 0.9044 0.0956
0.01 20 0.9832 0.1821
0.01 30 0.9639 0.2603
0.01 40 0.9392 0.3309
0.01 50 0.9107 0.3949
0.1 10 0.9297 0.6497
0.1 20 0.9568 0.8763
0.1 30 0.9742 0.9498
0.1 40 0.9433 0.9145
0.1 50 0.9703 0.879
0.3 10 0.9244 0.9611
0.3 20 0.9753 0.9475
0.3 30 0.9299 0.953
0.3 40 0.9443 0.9301
0.3 50 0.9567 0.9476
0.5 10 0.9785 0.8908
0.5 20 0.9586 0.9586
0.5 30 0.9572 0.9572
0.5 40 0.9615 0.9615
0.5 50 0.9351 0.9351

• Using the basic t distribution method:


p̂ = 65/81
p̂ ± tα/2,n−1 √(p̂(1 − p̂)/n)
65/81 ± t0.01,80 √((65/81)(1 − 65/81)/81)
65/81 ± 2.374 √((65/81)(1 − 65/81)/81)
= (0.6974, 0.9075)

• Hence we conclude with 98% confidence that p is between 0.6974 and 0.9075,
i.e. between 70% and 91% of CPUT students support the new transport
service.

• Using the Wilson score interval approach:

  $\dfrac{1}{1 + z_{\alpha/2}^2/n}\left(\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}\right)$

  $z_{0.01} = 2.33$ from Z table; value to 3 decimal places is 2.326

  $\dfrac{1}{1 + 2.33^2/81}\left(\dfrac{65}{81} + \dfrac{2.33^2}{2(81)} \pm 2.33\sqrt{\dfrac{\frac{65}{81}\left(1-\frac{65}{81}\right)}{81} + \dfrac{2.33^2}{4(81)^2}}\right)$

  $= (0.6819,\ 0.8850)$

• Hence we conclude with 98% confidence that p is between 0.6819 and 0.8850,
i.e. between 68% and 89% of CPUT students support the new transport
service.
Confidence Interval for a Proportion: Example 2
• A toxicologist wants to estimate the proportion of rats that develop a certain
disease in a laboratory after exposure to a certain drug. A random sample
of 150 rats is exposed to the drug, and 42 of these later test positive for the
disease. Using the Wilson score interval method, give a 95% confidence interval
for the proportion of rats exposed to the drug that develop the disease.
  $\dfrac{1}{1 + z_{\alpha/2}^2/n}\left(\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}\right)$

  $z_{0.025} = 1.96$ from Z table

  $\hat{p} = \dfrac{42}{150}$

  $\dfrac{1}{1 + 1.96^2/150}\left(\dfrac{42}{150} + \dfrac{1.96^2}{2(150)} \pm 1.96\sqrt{\dfrac{\frac{42}{150}\left(1-\frac{42}{150}\right)}{150} + \dfrac{1.96^2}{4(150)^2}}\right)$

  $= (0.2143,\ 0.3567)$

• Hence we conclude with 95% confidence that p is between 0.2143 and 0.3567,
  i.e. between 21% and 36% of rats exposed to the drug develop the disease.
Confidence Interval for a Proportion: Exercises
1. A pizza chain tests out a new marketing strategy by sending out a promotional
offer by SMS to a random sample of 25 past customers in their database. Six
customers take advantage of the promotional offer. Give a 90% confidence
interval for the proportion of all customers in the database who would take
advantage of this promotional offer. Use the t distribution method.
2. An ecologist estimates by studying a random sample of 15 ecosystems in the
Western Cape that 60% of ecosystems are under environmental threat. Us-
ing the Wilson score interval method, give a 95% confidence interval for the
proportion of ecosystems in the Western Cape that are under environmental
threat.

3.3 Confidence Intervals for Comparing Two Parameters
Confidence Interval for Difference in Means of Two Normal Populations
with Known Variances

• Suppose we have random samples from two normally distributed populations:

◦ X1 , X2 , . . . , Xn1 is a random sample from a population with unknown


mean µ1 and known variance σ12
◦ Y1 , Y2 , . . . , Yn2 is a random sample from a population with unknown mean
µ2 and known variance σ22

• We may be interested in estimating the difference between the two population


means, µ1 − µ2

• A point estimator for the difference would be simply X̄ − Ȳ , but what about
an interval estimator?

• We already derived the sampling distribution for X̄ − Ȳ , so we can use this to


derive our confidence interval

• Recall:

  $\mu_{\bar{X}-\bar{Y}} = E\left(\bar{X} - \bar{Y}\right) = \mu_1 - \mu_2$

  $\sigma^2_{\bar{X}-\bar{Y}} = \operatorname{Var}\left(\bar{X} - \bar{Y}\right) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

• And $\bar{X} - \bar{Y}$ is a normally distributed random variable, which means that we can perform a transformation to get a standard normal random variable:

  $Z = \dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

• The derivation of the confidence interval is almost the same as for a single mean:

  $\Pr(-z < Z < z) = 1 - \alpha$

  Let $z_{\alpha/2}$ be the value of $z$ that satisfies this equation.
  Clearly this is valid since the normal distribution is symmetrical.

  $\Pr\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = 1 - \alpha$

  $\Pr\left(-z_{\alpha/2} < \dfrac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} < z_{\alpha/2}\right) = 1 - \alpha$

  This follows from the sampling distribution.

  $\Pr\left(-z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} < \bar{X} - \bar{Y} - (\mu_1 - \mu_2) < z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}\right) = 1 - \alpha$

  $\Pr\left(-z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} - \left(\bar{X} - \bar{Y}\right) < -(\mu_1 - \mu_2) < z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} - \left(\bar{X} - \bar{Y}\right)\right) = 1 - \alpha$

  $\Pr\left(z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} + \bar{X} - \bar{Y} > \mu_1 - \mu_2 > -z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} + \bar{X} - \bar{Y}\right) = 1 - \alpha$

  Here we have multiplied the inequality by -1, so we must change < to >.

  $\Pr\left(\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}\right) = 1 - \alpha$

  Here we have just rearranged the inequality to express it again in < terms.

  Hence $L = \bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ and $U = \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

• The formula for an exact $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ when $\sigma_1$ and $\sigma_2$ are known can thus be expressed as $\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
• Note: if both populations have the same standard deviation σ, we will of
course just replace σ1 and σ2 in the formula with σ
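• A minimal Python sketch of this known-variance interval (the name diff_means_ci_z is our own; scipy assumed):

    from math import sqrt
    from scipy.stats import norm

    def diff_means_ci_z(xbar, ybar, sigma1, sigma2, n1, n2, conf=0.95):
        """Exact CI for mu1 - mu2 when both population variances are known."""
        z = norm.ppf(1 - (1 - conf) / 2)
        half = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
        return xbar - ybar - half, xbar - ybar + half

    # Theft-loss example later in this section: sigma = 800 for both, n1 = n2 = 6
    print(diff_means_ci_z(3615, 3818.3333, 800, 800, 6, 6))  # about (-1108.6, 702.0)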
Confidence Interval for Difference in Means of Two Normal Populations
with Unknown but Equal Variances
• What if the variances σ12 and σ22 are unknown?
• As with the confidence interval for one mean, we will modify our confidence
interval to include an estimator for the variance, and this will require us to
use the t distribution rather than the z distribution
• The procedure for estimating the variances is much simpler if we can assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

• In that case, our point estimator for $\sigma^2$ is the ‘pooled variance’ $S_p^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$; this can be recognized as a weighted average of $S_1^2$ and $S_2^2$, the sample variances of the samples from the two populations

• It can be proven that $T = \dfrac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom

• Thus our exact $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is as follows (the full derivation is not shown but the method is the same as in other cases):

  $\bar{X} - \bar{Y} \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

Confidence Interval for Difference in Means of Two Normal Populations


with Unknown, Unequal Variances (Optional)

• If $\sigma_1^2 \neq \sigma_2^2$ then we simply use $S_1^2$ to estimate $\sigma_1^2$ and $S_2^2$ to estimate $\sigma_2^2$

• However, determining the degrees of freedom for our t distribution in this case
is much more complicated and we can only approximate it

• We end up with the following approximate $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$:

  $\bar{X} - \bar{Y} \pm t_{\alpha/2,\,\nu}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$  where  $\nu \approx \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{S_1^4}{n_1^2(n_1-1)} + \dfrac{S_2^4}{n_2^2(n_2-1)}}$

• Note: it is usually necessary to round ν to the nearest integer
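• A minimal Python sketch of this interval from summary statistics (the name welch_ci is our own; scipy assumed):

    from math import sqrt
    from scipy.stats import t

    def welch_ci(xbar, ybar, s1sq, s2sq, n1, n2, conf=0.95):
        """Approximate CI for mu1 - mu2 with unknown, unequal variances."""
        se_sq = s1sq / n1 + s2sq / n2
        # Welch-Satterthwaite degrees of freedom, rounded to an integer
        nu = se_sq**2 / (s1sq**2 / (n1**2 * (n1 - 1)) + s2sq**2 / (n2**2 * (n2 - 1)))
        half = t.ppf(1 - (1 - conf) / 2, round(nu)) * sqrt(se_sq)
        return xbar - ybar - half, xbar - ybar + half

    # Crocodile data from Example 1 below, had we NOT assumed equal variances
    print(welch_ci(15.62, 72.275, 98.057, 582.2558, 5, 4))  # nu rounds to 4 here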

Confidence Interval for Difference in Means: Example 1

• Seasonal ranges (in hectares) for crocodiles were monitored on a lake in Malawi
by biologists. Five crocodiles monitored in the spring showed ranges of 8.0,
12.1, 8.1, 18.2, and 31.7. Four different crocodiles monitored in the summer
showed ranges of 102.0, 81.7, 54.7 and 50.7. Give a 95% confidence interval
for the difference between mean spring and summer ranges. You may assume
that crocodile range is normally distributed and that the variances of crocodile
range in spring and summer are equal.
Let $X_1, X_2, X_3, X_4, X_5$ be the spring crocodile ranges and let $Y_1, Y_2, Y_3, Y_4$ be the summer crocodile ranges.

  $\bar{X} - \bar{Y} \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

  $\bar{X} = \dfrac{1}{5}(8.0 + 12.1 + 8.1 + 18.2 + 31.7) = 15.62$

  $\bar{Y} = \dfrac{1}{4}(102.0 + 81.7 + 54.7 + 50.7) = 72.275$

  $S_p^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$

  $S_1^2 = \dfrac{1}{n_1-1}\sum_{i=1}^{n_1}\left(X_i - \bar{X}\right)^2 = 98.057$

  $S_2^2 = \dfrac{1}{n_2-1}\sum_{i=1}^{n_2}\left(Y_i - \bar{Y}\right)^2 = 582.2558$

  $S_p^2 = \dfrac{(5-1)(98.057) + (4-1)(582.2558)}{5+4-2} = 305.5708$

  $15.62 - 72.275 \pm t_{0.025,\,7}\sqrt{305.5708\left(\dfrac{1}{5} + \dfrac{1}{4}\right)}$

  $t_{0.025,\,7} = 2.365$

  $L = -56.655 - 2.365(11.72633) = -84.388$
  $U = -56.655 + 2.365(11.72633) = -28.922$

Thus we can conclude with 95% confidence that the difference in mean spring
and summer ranges is between -84.388 hectares and -28.922 hectares (or, between
28.922 hectares and 84.388 hectares, if we just want to express the answer in
absolute terms)

Confidence Interval for Difference in Means: Example 2

• Every month a clothing store conducts an inventory and calculates losses from
theft. The store would like to reduce these losses and is considering two
methods. The first is to hire a security guard, and the second is to install
cameras. To help decide which method to choose, the manager hired a security
guard for six months. During the next six-month period, the store installed
cameras but had no security guard. The monthly losses were recorded and
are listed below. Provide a 95% confidence interval for the difference in mean monthly
theft losses under the two methods, and use it to infer whether one method is
better than the other. You may assume that the standard deviation of monthly
theft losses is R800 regardless of which theft prevention method is used. Let
$X_1, X_2, \ldots, X_6$ be the theft losses in the months when a security guard was on
duty, and let $Y_1, Y_2, \ldots, Y_6$ be the theft losses in the months when there was a
camera.

  Monthly losses due to theft
  Security guard   R3550   R2840   R4010   R3980   R4770   R2540
  Cameras          R4860   R3030   R2700   R3860   R4110   R4350
  $\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

  $\bar{X} = \dfrac{1}{6}(3550 + 2840 + \cdots + 2540) = 3615$

  $\bar{Y} = \dfrac{1}{6}(4860 + 3030 + \cdots + 4350) = 3818.3333$

  $z_{0.025} = 1.96$

  $3615 - 3818.3333 \pm 1.96\sqrt{\dfrac{800^2}{6} + \dfrac{800^2}{6}}$

  $= (-1108.619,\ 701.9519)$

We are 95% confident that the difference in mean monthly theft losses under
the security guard method as compared to the camera method is between
-R1109 and R702. Because this interval includes 0, we cannot be 95% confident
that one method is better than the other: there may be no mean difference.

Confidence Interval for Difference in Proportions

• In the previous chapter we learned about the sampling distribution of a dif-


ference between two sample means, and we learned about the sampling dis-
tribution of a sample proportion, but we never learned about the sampling
distribution of a difference between two proportions

• We can introduce this result now. Suppose we have two populations each
having an infinite number of objects which can be classified as either ‘successes’
or ‘failures’. Let p1 be the proportion of successes in the first population and
let p2 be the proportion of successes in the second population. If we take
a random sample of n1 independent objects from the first population and a
random sample of n2 independent objects from the second population, and
we call the number of successes in the first sample X and the number of
successes in the second sample Y, then our point estimators for $p_1$ and $p_2$ will
respectively be $\hat{p}_1 = \dfrac{X}{n_1}$ and $\hat{p}_2 = \dfrac{Y}{n_2}$.
• Our point estimator for the difference p1 − p2 will then be p̂1 − p̂2

• The sampling distribution for $\hat{p}_1 - \hat{p}_2$ is as follows:

  ◦ $\mu_{\hat{p}_1-\hat{p}_2} = E\left(\hat{p}_1 - \hat{p}_2\right) = p_1 - p_2$

  ◦ $\sigma^2_{\hat{p}_1-\hat{p}_2} = \operatorname{Var}\left(\hat{p}_1 - \hat{p}_2\right) = \dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}$

  ◦ We could use the central limit theorem to obtain an approximately standard normal statistic; however, we face the same problem we had in the confidence interval for a single proportion p: we don’t know the values of $p_1$ and $p_2$ and thus can’t calculate the variance of the estimator. Thus, we replace $p_1$ and $p_2$ in the formula with their estimators:

  ◦ $\hat{\sigma}^2_{\hat{p}_1-\hat{p}_2} = \dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}$

  ◦ This results in a t distribution statistic instead of a z distribution statistic:
    $T = \dfrac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$ has approximately a t distribution with $n_1 + n_2 - 2$ degrees of freedom, provided $n_1$ and $n_2$ are fairly large

  ◦ From this we can derive the following $(1-\alpha)100\%$ confidence interval for $p_1 - p_2$:
    $\hat{p}_1 - \hat{p}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Confidence Interval for Difference in Proportions: Example

• A firm has classified its customers in two ways: (1) according to whether the
account is overdue and (2) whether the account is new (less than 12 months)
or old. To acquire information about which customers are paying on time and
which are overdue, a random sample of 292 customer accounts was drawn.
Each was categorized as either a new account or an old one, and whether the
customer has paid or is overdue. The results are summarized below. Let p1 be
the proportion of new accounts that are overdue and let p2 be the proportion
of old accounts that are overdue. Provide a 90% confidence interval for p1 −p2 .

                      New Account   Old Account
  Sample size              83           209
  Overdue accounts         12            49

  $\hat{p}_1 - \hat{p}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

  $\hat{p}_1 = 12/83$
  $\hat{p}_2 = 49/209$

  $t_{0.05,\,83+209-2} = t_{0.05,\,290} \approx t_{0.05,\,100} = 1.660$

  $12/83 - 49/209 \pm 1.660\sqrt{\dfrac{\frac{12}{83}\left(1-\frac{12}{83}\right)}{83} + \dfrac{\frac{49}{209}\left(1-\frac{49}{209}\right)}{209}}$

  $= (-0.1703,\ -0.0094)$

Thus we can be 90% confident that the proportion of new accounts that are
overdue is less than the proportion of old accounts that are overdue by between
0.0094 and 0.1703.

Confidence Intervals for Comparing Two Parameters: Exercises

1. A sample of 8 room air conditioners of a certain model had a mean sound


pressure of 52 decibels (dB), and a sample of 12 air conditioners of a different
model had a mean sound pressure of 46 dB. Suppose it is known that the
standard deviation of sound pressure across all air conditioners of model 1 is
5, while the standard deviation of sound pressure across all air conditioners of
model 2 is 2. Find a 96% confidence interval for the difference in mean sound
pressure between the two models.

2. Several specimens of coal were sampled from each of two mines, and the heat
capacity (in kilocalories per kg) was measured for each specimen. The results
are below. Obtain a 99% confidence interval for the difference in mean heat
capacity between coal from mine 1 and coal from mine 2. It is assumed that
the variance of heat capacity of coal is the same in the two mines.

Heat Capacity of Coal Specimens in kCal per kg


Mine 1 4167 4268 4159 4285 4229 4386 4103
Mine 2 3924 3988 4096 4026 4235 4178

3. Six months ago, a survey was undertaken to determine the degree of support
for a national political leader. Of a sample of 1100 people, 56% indicated
that they support this politician. This month, another survey of 800 people
estimated that 46% now support the leader. Estimate with 95% confidence
the decrease in percentage support for this politician over the past six months.

Interval Estimators for Population Variance

• In this module, we have not discussed the sampling distribution of the sample
variance S 2 , nor have we discussed confidence intervals for the population
variance σ 2

• It is, however, possible to do both; you may wish to see a Mathematical


Statistics textbook for the details

• We will touch on this topic briefly when we do hypothesis testing involving


variances in the next chapter

4 Introduction to Hypothesis Testing


4.1 Hypothesis Testing: Basic Concepts and Definitions
Statistical Inference

• Two major tasks in statistics are estimation and inference. In the previous
chapter we focused on estimation, in which we were making quantitative state-
ments about parameters (either using a point estimate or an interval estimate)

• Now we will turn our attention to inference. An inference is a conclusion


reached on the basis of evidence.

• In the context of statistics, an inference is a ‘Yes or No’, ‘True or False’ decision


concerning a claim made about the population.

Hypothesis Testing

• The method we will be using for statistical inference is called hypothesis testing

• In statistics, an hypothesis is a claim that is made about a population.

• Usually it is a claim about the value of a population parameter, e.g. µ = 5 or


p < 0.5

• There are also nonparametric hypotheses, but we will see these mostly in
second year

• An hypothesis test is a formal procedure for testing a hypothesis against the


observed data.

• To better understand how a hypothesis test works, consider the analogy of a


criminal trial

◦ The objective of a criminal trial is to reach a correct conclusion concerning


the innocence or guilt of the accused (defendant), on the basis of evidence

◦ There are two claims (or hypotheses) being made.
◦ The prosecution claims that the defendant is guilty.
◦ The defense claims that the defendant is not guilty.
◦ Both sides present and discuss evidence that they believe supports their
claim.
◦ The judge or jury then weighs the evidence and makes a decision.
◦ In most modern justice systems, there is a presumption of ‘Innocent until
proven guilty’. This means it is presumed that the defendant is innocent
unless the prosecution provides sufficient evidence to prove otherwise.

• Before we can understand how hypothesis testing works there is some termi-
nology we need to know, which we can relate back to the example of a criminal
trial.

The Elements of a Hypothesis Test

• A hypothesis test contains the following elements:

1. Null Hypothesis, H0 . This is the default claim that we will presume to be true
unless the evidence (data) proves otherwise (so ‘Innocent’ is the null hypothesis
in a criminal trial)

2. Alternative Hypothesis, HA (sometimes written H1 ). This is the opposite claim


to the null hypothesis, which we will adopt as our conclusion only if there is
enough evidence to disprove the null hypothesis (so ‘Guilty’ is the alternative
hypothesis in a criminal trial)

3. Test Statistic. Based on our data we will need to come up with a test statistic
which gives us a means of testing whether the data is consistent with the null
hypothesis being true.

4. Rejection Rule (sometimes called Rejection Region). The rejection rule tells
us the set of values of the test statistic which would cause us to reject the null
hypothesis.

5. Decision. By calculating the observed value of the test statistic and applying
the rejection rule we will reach a decision. The decision will either be ‘Reject
the null hypothesis’ or ‘Fail to reject the null hypothesis’. If we reject the null
hypothesis, we will conclude that the alternative hypothesis is true. If we fail
to reject the null hypothesis, this does NOT mean we have proven the null
hypothesis is true. Rather, it means there is not enough evidence to disprove
it. Suppose in a criminal trial that there is not enough evidence to prove the
defendant’s guilt. This does not actually prove that the defendant is innocent;
it only means the defendant cannot be found guilty based on the evidence
presented. For instance, if enough evidence was presented to show that there
is a 55% chance that the defendant is guilty, he or she would be found ‘Not
Guilty’.

Type I and Type II Error in a Hypothesis Test

• There are two types of error that can occur in a hypothesis test, as shown in
the following table

                              Decision of Test
                       Reject H0         Fail to reject H0
  Reality   H0 True    Type I Error      Correct
            H0 False   Correct           Type II Error

• A Type I Error occurs when we reject a null hypothesis that is really true

• The probability of committing a Type I Error can be expressed as a conditional


probability:
α = Pr (Reject H0 |H0 is True)

◦ In our criminal trial analogy, this would be equivalent to finding a person


guilty who is actually innocent

• A Type II Error occurs when we fail to reject a null hypothesis that is really
false

• The probability of committing a Type II Error can be expressed as a condi-


tional probability:

β = Pr (Fail to Reject H0 |H0 is False)

◦ In our criminal trial analogy, this would be equivalent to finding a person


innocent who is actually guilty

• These two error probabilities α and β are inversely related: if we reduce one,
the other increases

• Consider what would happen if a judge simply found every defendant innocent
regardless of the evidence

◦ This judge would never commit a Type I Error: α = 0


◦ But this judge would commit many Type II Errors: all guilty defendants
would go free

• Alternatively, consider what would happen if a judge simply found every de-
fendant guilty regardless of the evidence

◦ This judge would never commit a Type II Error: β = 0


◦ But this judge would commit many Type I Errors: all innocent defen-
dants would be convicted

• Obviously both of these situations are unacceptable. So we must find a balance


between α and β

• Which of these two errors is more serious?

• It depends on the context, but often a Type I Error is considered more serious.
  Consider the criminal trial, for example. Is it a more serious mistake to let a guilty
  person go free, or to send an innocent person to prison? The justice system in
  most countries today is built on the principle that it is a more serious mistake
  to send an innocent person to prison (Type I Error)

• Hence, a commonly used approach in hypothesis testing is to fix α and then


use calculus to minimize β subject to this value of α

• To fix α means we choose the largest value of α that we are willing to tolerate

• This value of α that we choose is called the significance level of the test

• (Technically, the significance level α is the complement of the confidence level 1 − α used for a confidence interval)

• We always mention the significance level when reporting a conclusion from a


hypothesis test

Hypothesis Testing Procedure

• Here is a simple example of a hypothesis testing situation: we want to know


whether a coin is fair

• If a coin is fair, then the probability of heads and the probability of tails will
be equal: p = 0.5

• If the coin is not fair, then the probability of heads and the probability of tails
will not be equal: p ≠ 0.5

• Hence, if we want to potentially prove that a coin is unfair, we can test the
null hypothesis p = 0.5 against the alternative hypothesis p ≠ 0.5

• The hypothesis testing procedure runs like this

1. State the null and alternative hypotheses. This is best done using math-
ematical notation rather than just words. Hence, instead of saying ‘The
coin is fair’ as our null hypothesis, we say p = 0.5
2. State the significance level, α. The most commonly used values of α are
0.05 and 0.01.
3. State the test statistic that will be used in the hypothesis test. A test
statistic must have the following characteristics:
◦ It must be a statistic we can calculate from the data available
◦ We must know the null distribution of the test statistic. The null
distribution is the sampling distribution of the test statistic under
the null hypothesis, i.e. assuming that the null hypothesis is true

4. Use the null distribution to determine the rejection rule. The rejection
rule is determined using the null distribution together with α. There are
two possible approaches to determining the rejection rule: the critical
value approach and the p-value approach. They will both result in
the same answer. This will be explained further below.

4.2 The Logic of a Hypothesis Test


Illustrating the Need for a Statistical Hypothesis Test

• Suppose we have a normally distributed population with unknown mean µ and


known standard deviation σ. We will draw an independent random sample
of size n from this population, Y1 , Y2 , . . . , Yn and use this data to make an
inference about µ

• The null hypothesis will always be H0 : µ = µ0 where µ0 is the null value,


i.e. the claimed value of µ that we are interested in testing (and potentially
disproving)

• The alternative hypothesis could be one of the following:

◦ HA : µ < µ0 (this is called a lower-tailed test)


◦ HA : µ > µ0 (this is called an upper-tailed test)
◦ HA : µ ≠ µ0 (this is called a two-tailed test)

• The rejection rule will change depending which alternative hypothesis is used.
For now let us use the alternative hypothesis HA : µ 6= µ0 ; we will come back
to the other cases

• We know that our best point estimator for µ is Ȳ (it is unbiased and consistent)

• Thus it makes sense to reject H0 if Ȳ is much greater than µ0 or much less


than µ0 ; if Ȳ is close to µ0 then H0 will seem more reasonable and we should
not reject it

• The all-important question is, how much less or how much greater than µ0
should Ȳ be before we reject the claim that µ = µ0 ?

• To answer this question we need the sampling distribution of the estimator!

• To see this, consider the following diagram and ask yourself, ‘In which situation
would I reject the null hypothesis H0 : µ = µ0 ?’

• It seems obvious that we should reject the null hypothesis in the situation at
the top, because the estimate Ȳ is very far from the null value µ0 ; by contrast,
we should not reject the null hypothesis in the situation at the bottom, because
the estimate Ȳ is close to the null value µ0

• The two situations in between are perhaps more ambiguous; how does one
decide how far away from µ0 the estimate Ȳ must be before it is far enough to
reject H0 ? This already provides us with a reason for a statistical hypothesis
test

• But there is another problem: we have not taken into account the sampling
distribution of Ȳ

• Suppose that for these four situations the sampling distributions of Ȳ are as
illustrated below. How would this affect our view of H0 ?

• It is now clear that in the bottom picture we should reject H0 . Why?



◦ If µ = µ0 (null hypothesis) then $E(\bar{Y}) = \mu = \mu_0$. Thus we plot the sampling distribution of $\bar{Y}$ on the graph centered at $\mu_0$, normally distributed with standard deviation $\sigma/\sqrt{n}$
◦ In the bottom graph we can see that the standard deviation is very small,
so the distribution is narrow. It is extremely unlikely that we would have
gotten a sample mean value as large as what we did get, if this is really
the sampling distribution of Ȳ
◦ Rather than conclude that something extremely unlikely happened, it
seems the sampling distribution of Ȳ is incorrect

◦ But since we have assumed the population is normally distributed with
  standard deviation σ, the only way the sampling distribution could be
  incorrect is if $E(\bar{Y}) \neq \mu_0$, which means $\mu \neq \mu_0$. The null hypothesis
  must be rejected!
• By similar reasoning, we cannot reject H0 in the case of the top graph, even
though Ȳ is much further from µ0 . Why not?
◦ The standard deviation of the sampling distribution is very large in this
case, so the distribution is wide. It is quite possible that we could have
gotten a sample mean value as large as what we did get, if this is really
the sampling distribution of Ȳ
◦ Thus it is reasonable to think µ0 could actually be the mean of the sam-
pling distribution
◦ Hence we cannot reject H0 ; it may be true
• What conclusions would you draw regarding the two graphs in the middle?
• We should now be able to understand why a common form of the test statistic is

  $\dfrac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}}$
• Here, θ̂ is the point estimator, θ0 is the null value of the parameter (the value
if the null hypothesis is true), and σθ̂ is the standard deviation of the point
estimator
• The numerator of this fraction tells us how far apart the estimator and the
null value are in absolute terms (as in the first picture above)
• The denominator of this fraction expresses this distance in units of standard
deviations of the estimator, giving us an idea of how far apart the estimator
and the null value are in probability terms (as in the second picture above with
the distribution curves drawn)
Hypothesis Test for the Population Mean when Standard Deviation is
Known
• Let us use this logic to construct a hypothesis test in the situation we have
just described
• Our assumptions are that the population is normally distributed, Y1 , Y2 , . . . , Yn
is a sample of i.i.d. random variables from the population, and the population
standard deviation σ is known
• Our hypotheses are as follows:
  $H_0: \mu = \mu_0$
  $H_A: \mu \neq \mu_0$

• Let α be our significance level (the probability of a type I error that we are
willing to allow)
• Consider the test statistic $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$
• What can we say about this test statistic?
• Under H0 (that is, if H0 is true), $\bar{Y}$ follows a normal distribution with mean $\mu_0$ and standard deviation $\sigma/\sqrt{n}$; thus, under H0, Z follows a standard normal distribution.
• Suppose we calculate Ȳ and then Z from our data and we get Z = 3
• We can work out the probability of getting such an extreme Z value if H0 were
true:
Pr (Z > 3) + Pr (Z < −3) = [1 − Pr (Z < 3)] + [1 − Pr (Z < 3)]
= 2 [1 − Pr (Z < 3)]
= 2 [1 − 0.9987] = 0.0026

• We can either conclude that H0 is true, and something extremely improbable


has happened, or we can conclude that H0 is false
• We use our significance level to decide which of these two conclusions to draw.
How do we do this?

Making a Decision about a Null Hypothesis using the Significance Level

• As mentioned previously, there are two approaches to making a decision about


the null hypothesis using the significance level

• These are called the critical value approach and the p-value approach

• They give the same answer, but it is important to learn both

• The reason is that the critical value approach can usually be done by hand (using
a statistical table), whereas the p-value approach requires a computer in most
cases

• However, most statistical software packages (e.g. SAS) use only the p-value
approach, so if you want to understand the output of these software packages
you must understand this approach

The Critical Value Approach
• The critical value is defined as the value of the test statistic for which the
probability of getting a more extreme value would be α
• In the case of our two-tailed test of H0 : µ = µ0 vs. HA : µ ≠ µ0 , we are asking
  for what value of z the following expression is true:

  $\Pr(|Z| > z) = \alpha$

  In other words: $\Pr(Z < -z) + \Pr(Z > z) = \alpha$

  $[1 - \Pr(Z < z)] + [1 - \Pr(Z < z)] = \alpha$

  $2[1 - \Pr(Z < z)] = \alpha$

  $1 - \Pr(Z < z) = \dfrac{\alpha}{2}$

• The reason why we work with |Z| rather than Z alone is that we would want to reject
H0 if Z is large and positive (Ȳ is much greater than µ0 ) or if Z is large and
negative (Ȳ is much less than µ0 ).
• Because the normal distribution is symmetrical, we divide α into two equal
pieces of α/2 (as in the graph above), just like we did with confidence intervals
• We call the z value that satisfies this probability statement zα/2 which is the
critical value for this hypothesis test
• If |Z| > zα/2 then we will reject H0 , otherwise we will fail to reject H0
• That is to say, we use the critical value to draw a rejection region of size
α, so that any Z statistic value more extreme than the critical value zα/2 will
lead us to reject the null hypothesis
The p-Value Approach
• The p-value approach describes the rejection rule in terms of probabilities
rather than in terms of values of the test statistic (e.g. z values)
• The p-value of a hypothesis test can be defined as ‘The probability of observing
a value of the test statistic as extreme or more extreme than what was actually
observed, given that the null hypothesis is true’
• In the case of the test for the mean that we are working with, the p-value
would be expressed as follows:
$\Pr\left(|Z| > |Z_{\text{observed}}| \mid H_0 \text{ is true}\right)$

• The rejection rule under the p-value approach is simply this:


• If the p-value < α we reject H0 . If the p-value > α we fail to reject
H0 .
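• As a quick illustration (not part of the original notes), both rejection rules can be checked with scipy for the Z = 3 example above:

    from scipy.stats import norm

    alpha = 0.05
    z_observed = 3.0                               # the value computed above

    # Critical value approach: reject H0 if |Z_observed| > z_{alpha/2}
    z_crit = norm.ppf(1 - alpha / 2)               # 1.96

    # p-value approach: reject H0 if the p-value is below alpha
    p_value = 2 * (1 - norm.cdf(abs(z_observed)))  # about 0.0027

    print(abs(z_observed) > z_crit, p_value < alpha)  # True True: both rules reject H0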

The Seven Steps of a Hypothesis Test

• The following seven-step procedure can be used to conduct a hypothesis test

1. State the null and alternative hypotheses in equation form

2. State the significance level α

3. State the test statistic and its null distribution. For example, ‘Under H0,
   $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$ has a standard normal distribution’
4. State the rejection rule. For example, ‘Reject H0 if...’

5. Calculate the observed value of the test statistic

6. Apply the rejection rule to make a decision either to reject H0 or to fail


to reject H0 . When you state your decision, always include the significance
level, e.g. ‘We fail to reject H0 at the 0.05 significance level.’

7. State your conclusion in practical terms using words, not mathematical nota-
tion or jargon.

• Try to memorize the following letters to remember the seven steps:


HSTRCDC (Hypotheses, Significance Level, Test Statistic, Rejection Rule,
Calculation, Decision, Conclusion)

4.3 Implementing Hypothesis Tests


Hypothesis Test for the Population Mean when Standard Deviation is
Known: Example 1

• We are finally ready for an example!

• A random sample of 12 second-year students enrolled in a mathematics course


was drawn. At the completion of the course, each student was asked how
many hours he or she spent doing homework in mathematics. The data are
listed below. It is known that the population standard deviation is σ = 8.
The lecturer had recommended that students devote three hours per week for
the duration of the 12-week semester, for a total of 36 hours.

31 40 26 30 36 38
29 40 38 30 35 38

1. Conduct a hypothesis test at the 5% significance level to determine whether


there is evidence that the average time spent on homework by students
doing this course is different than the recommended 36 hours. Use the
critical value approach.

2. Repeat steps 4, 5 and 6 of the hypothesis test procedure, this time using
the p-value approach.

1. Hypotheses:
   $H_0: \mu = 36$
   $H_A: \mu \neq 36$

2. Significance Level: α = 0.05

3. Test Statistic: Under H0, $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$ has a standard normal distribution

4. Rejection Rule: We will reject H0 if $|Z_{\text{observed}}| > z_{\alpha/2} = z_{0.025} = 1.96$ (this
   value we have calculated before in Chapter 3)

5. Calculation:
   $\bar{Y} = \dfrac{1}{12}(31 + 40 + 26 + \cdots + 38) = 34.25$

   $Z_{\text{observed}} = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{34.25 - 36}{8/\sqrt{12}} = -0.758$
6. Decision: $|Z_{\text{observed}}| = 0.758 < 1.96$ therefore we fail to reject H0 at the
   0.05 significance level
7. We conclude there is not enough evidence that the average amount of time
   spent doing homework by students in this course is different from 36 hours.
   (In other words, it is reasonable to assume that the average amount of time
   spent doing homework by students in this course is 36 hours)

• Now repeating steps 4, 5 and 6 using the p-value approach:


4. Rejection Rule: We will reject H0 if p-value < 0.05
5. Calculation: We have already calculated that Zobserved = −0.758; now we
need to determine the p-value
   p-value $= \Pr\left(|Z| > |Z_{\text{observed}}| \mid H_0 \text{ is true}\right)$
   $= \Pr(Z > 0.758) + \Pr(Z < -0.758)$  (recall: if H0 is true, Z has a standard normal distribution)
   $= \Pr(Z > 0.76) + \Pr(Z < -0.76)$  (to be calculated from Z table)
   $= 2[1 - \Pr(Z < 0.76)]$
   $= 2(1 - 0.7764)$
   $= 0.4472$
6. Decision: p-value= 0.4472 > 0.05 thus we fail to reject H0 at the 0.05
significance level
7. Conclusion: same as before
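• The whole calculation for this example can be reproduced with a short Python sketch (not part of the original notes; scipy assumed). scipy's normal CDF is slightly more precise than the Z table, so the p-value comes out near 0.449 rather than 0.4472:

    from math import sqrt
    from scipy.stats import norm

    data = [31, 40, 26, 30, 36, 38, 29, 40, 38, 30, 35, 38]
    mu0, sigma, alpha = 36, 8, 0.05

    ybar = sum(data) / len(data)                      # 34.25
    z_obs = (ybar - mu0) / (sigma / sqrt(len(data)))  # about -0.758

    z_crit = norm.ppf(1 - alpha / 2)                  # 1.96
    p_value = 2 * (1 - norm.cdf(abs(z_obs)))          # about 0.449

    print(abs(z_obs) > z_crit, p_value < alpha)       # False False: fail to reject H0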

Hypothesis Test for the Population Mean when Standard Deviation is
Known: Example 2

• An inspector at a food packaging facility measured the net weight of a simple


random sample of 100 packets of peanuts that were labeled as containing a net
weight of 12 g each. The sample had mean net weight of 11.95 g. It is known
that the standard deviation of net weight of such peanuts packets is 0.19 g.
Test the null hypothesis that the mean weight of packets of peanuts produced
by this facility equals 12 g against the alternative hypothesis that it does not
equal 12 g. Use 0.02 level of significance. Again, repeat steps 4, 5 and 6 using
the p-value approach.

1. Hypotheses:
   $H_0: \mu = 12$
   $H_A: \mu \neq 12$

2. Significance Level: α = 0.02

3. Test Statistic: Under H0, $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$ has a standard normal distribution
4. Rejection Rule: We will reject H0 if |Zobserved | > zα/2 = z0.01 = 2.326

   $\Pr(Z > z) = 0.01$
   $1 - \Pr(Z < z) = 0.01$
   $\Pr(Z < z) = 0.99$
   $z = 2.325$ (between 2.32 and 2.33)
   Using a computer we can be more exact: 2.326

5. Calculation:
   $Z_{\text{observed}} = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{11.95 - 12}{0.19/\sqrt{100}} = -2.632$

6. Decision: |Zobserved | = 2.632 > 2.326 therefore we reject H0 at the 0.02


significance level
7. We conclude there is evidence that the mean net weight of packets of
peanuts produced by this facility is different from 12 g.

4. Rejection Rule: We will reject H0 if p-value < 0.02

5. Calculation: We have already calculated that $Z_{\text{observed}} = -2.632$; now we
   need to determine the p-value

   p-value $= \Pr\left(|Z| > |Z_{\text{observed}}| \mid H_0 \text{ is true}\right)$
   $= \Pr(Z > 2.632) + \Pr(Z < -2.632)$  (recall: if H0 is true, Z has a standard normal distribution)
   $= \Pr(Z > 2.63) + \Pr(Z < -2.63)$  (to be calculated from Z table)
   $= 2[1 - \Pr(Z < 2.63)]$
   $= 2(1 - 0.9957)$
   $= 0.0085$

6. Decision: p-value= 0.0085 < 0.02 thus we reject H0 at the 0.02 signifi-
cance level
7. Conclusion: same as before

One-Tailed Hypothesis Tests


• Until now we have only been dealing with the two-tailed hypothesis test in
which the alternative hypothesis is µ ≠ µ0
• However, often we may only be interested in whether the parameter differs
from the null value on one side
• For instance, in the peanuts example we just did, we may be interested in
whether the mean net weight equals 12 g or is less than 12 g (in which
case the company could be accused of false advertising). There is not much
interest in whether the weight would be greater than 12 g because in that case
no inspectors or customers are going to complain about the product
• Hence, one can also conduct a one-tailed test which only looks at one side of
the distribution
• In this case we do not divide α by 2 when determining the critical value; nor
do we use absolute value in our p-value calculations
• Let us consider a couple of examples to illustrate
One-Tailed Hypothesis Test: Example 1
• The medical director of a large company is concerned about the effects of work
stress on the company’s younger executives. According to national estimates,
the mean systolic blood pressure for males aged 35 to 44 years is 128 and the
standard deviation of this population is 15. The medical director examines
the records of 72 male executives in this age group and finds that their mean
systolic blood pressure is 129.93. Is this evidence (at the 0.05 significance level)
that the mean blood pressure for all the company’s young male executives is
higher than the national average? (We will assume that the standard deviation
of blood pressure of all the company’s young executives equals the national
value of 15).

1. Hypotheses:

H0 : µ = 128
HA : µ > 128

Note how the hypotheses have changed: we are only interested in whether
we can prove that the average blood pressure of this company’s young
executives exceeds the national average. Hence we ignore the possibility
that the average blood pressure is less than the national average. Accord-
ingly, our alternative hypothesis uses a > sign rather than a 6= sign. This
is called a ‘right-tailed test’. The following graph shows the distribution
of the test statistic in this case, along with the rejection region:

2. Significance Level: α = 0.05


3. Test Statistic: Under H0, $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$ has a standard normal distribution

4. Rejection Rule: We will reject H0 if Zobserved > zα = z0.05 = 1.645 (note
we no longer have absolute value since we are only interested in positive
values of Z)

   $\Pr(Z > z) = 0.05$
   $1 - \Pr(Z < z) = 0.05$
   $\Pr(Z < z) = 0.95$
   $z = 1.645$ (between 1.64 and 1.65)

5. Calculation:
   $Z_{\text{observed}} = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{129.93 - 128}{15/\sqrt{72}} = 1.09$

6. Decision: Zobserved = 1.09 < 1.645 therefore we fail to reject H0 at the


0.05 significance level
7. We conclude there is no evidence that the mean blood pressure of young
male executives at this company exceeds the national average.

• Note: if using the p-value approach, we would find Pr (Z > Zobserved ) = Pr (Z > 1.09)
which is 0.138. Because the p-value > α we fail to reject H0 .

One-Tailed Hypothesis Test: Example 2

• A salesperson at a call centre who is selling insurance policies must sell at least
60 policies per year in order to be profitable to the company. Tony has been
working at the company for ten years and the number of policies he has sold
each year are given below. If we treat his past ten years as a random sample
of all the years he might do this job, do we have evidence at the 0.01 level that
Tony is an underperforming salesperson (meaning that on average he fails to
sell 60 policies per year)? Assume it is known that the standard deviation of
the number of policies sold annually by any salesperson is 8.

   54  67  47  43  61
   50  55  41  56  64

1. Hypotheses:
   $H_0: \mu = 60$
   $H_A: \mu < 60$

This time, because we are specifically interested in whether Tony is un-
derperforming, we have a ‘left-tailed test’. Our alternative hypothesis
uses a < sign. The following graph shows the distribution of the test
statistic in this case, along with the rejection region:
2. Significance Level: α = 0.01
3. Test Statistic: Under H0, $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$ has a standard normal distribution
4. Rejection Rule: We will reject H0 if $Z_{\text{observed}} < -z_\alpha = -z_{0.01} = -2.326$
   (note we no longer have absolute value since we are only interested in
   negative values of Z)

   $\Pr(Z > z) = 0.01$
   $1 - \Pr(Z < z) = 0.01$
   $\Pr(Z < z) = 0.99$
   $z = 2.325$ (between 2.32 and 2.33)
   Using a computer we can be more exact: 2.326

5. Calculation:
   $\bar{Y} = \dfrac{1}{10}(54 + 67 + 47 + 43 + 61 + 50 + 55 + 41 + 56 + 64) = 53.8$

   $Z_{\text{observed}} = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{53.8 - 60}{8/\sqrt{10}} = -2.450$

6. Decision: Zobserved = −2.450 < −2.326 therefore we reject H0 at the 0.01
significance level
7. We conclude there is evidence that Tony is underperforming since his
annual average number of policies sold is less than 60.

• Note: if using the p-value approach, we would find Pr (Z < Zobserved ) = Pr (Z < −2.45)
which is 0.007. Because the p-value < α we reject H0 .

4.4 Power of a Hypothesis Test (Optional)


Definition of Power

• The power of a statistical hypothesis test is a measure of its ability to provide


evidence for a claim

• Power is defined as Pr (Reject H0 | H0 is false)

• Since the Type II Error β = Pr (Fail to Reject H0 |H0 is False), power= 1 − β

• In order to calculate power for a given hypothesis testing problem, we must


specify an alternative value of the parameter being tested

• In other words, it is not enough to say HA : µ > µ0 (for example, in a one-


sample test for the mean); we must specify a value µA that the population
mean could be in an alternative scenario

• It is useful to create a power function for a hypothesis test, which expresses


the power as a function of the parameter (e.g. µA ), the sample size n, and the
significance level α

• The power function is useful for deciding the sample size, i.e. how much data
we need, before we collect our data

Power of a Hypothesis Test: Example

• Suppose we have a normally distributed population with known standard de-


viation σ = 1 and we want to test the null hypothesis µ = 4 against the
alternative µ > 4 at 0.05 significance level

• Determine the power of this hypothesis test if the sample size is n = 10 and the actual population mean is µ = 5

  $1 - \beta = \Pr(\text{Reject } H_0 \mid \mu = 5)$
  $= \Pr(Z > z_\alpha \mid \mu = 5)$
  $= \Pr\left(\dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha \,\middle|\, \mu = 5\right)$

  Under H0, this statistic has a standard normal distribution.
  But under HA, it does not!
  Under HA, $\bar{Y}$ has a normal distribution with mean $\mu = 5$ and standard deviation $\sigma/\sqrt{n}$.

  $= \Pr\left(\dfrac{\bar{Y} - \mu_A - (\mu_0 - \mu_A)}{\sigma/\sqrt{n}} > z_\alpha \,\middle|\, \mu = 5\right)$

  $= \Pr\left(\dfrac{\bar{Y} - \mu_A}{\sigma/\sqrt{n}} - \dfrac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} > z_\alpha \,\middle|\, \mu = 5\right)$

  $= \Pr\left(\dfrac{\bar{Y} - \mu_A}{\sigma/\sqrt{n}} > z_\alpha + \dfrac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} \,\middle|\, \mu = 5\right)$

  $= \Pr\left(Z > z_\alpha + \dfrac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}\right)$
  (since we now have a correct sampling distribution on the left)

  $= \Pr\left(Z > z_{0.05} + \dfrac{4 - 5}{1/\sqrt{10}}\right)$

  $= \Pr\left(Z > 1.645 + \dfrac{4 - 5}{1/\sqrt{10}}\right)$

  $= \Pr(Z > 1.645 - 3.162) = \Pr(Z > -1.517)$
  $= 0.935$

• This means that with a sample size of 10 we have a 93.5% chance of rejecting
H0 and demonstrating that the mean (which is actually 5) is greater than 4,
at 0.05 significance level
• The graph below shows the power function of this test with α = 0.05 for
different values of µA and n
• From the graph it is clear that as sample size increases, power increases
• It is also clear that as the true population mean µA moves further away from
the null hypothesis value µ0 (which is 4 in this case), power increases
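• A minimal Python sketch of this power calculation (the name power_upper_tailed is our own; scipy assumed); looping it over different values of µA and n would trace out the power function described above:

    from math import sqrt
    from scipy.stats import norm

    def power_upper_tailed(mu0, mu_a, sigma, n, alpha=0.05):
        """Power of the upper-tailed Z test of mu = mu0 vs mu > mu0
        when the true mean is mu_a."""
        z_alpha = norm.ppf(1 - alpha)
        return 1 - norm.cdf(z_alpha + (mu0 - mu_a) / (sigma / sqrt(n)))

    print(power_upper_tailed(4, 5, 1, 10))  # about 0.935, as in the example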

Power of a Hypothesis Test: Exercises

• Suppose we are testing the null hypothesis µ = 10 against the alternative


µ < 10 for a normally distributed population with known standard deviation
σ = 3 using a sample of size n = 6. The test is conducted at α = 0.01
significance level. Determine the power of the hypothesis test if the mean of
this population is actually 8.

5 Hypothesis Tests about One Population
Review

• In the previous chapter we covered basic theory of hypothesis tests as well as


the specific case of a hypothesis test for the population mean when standard
deviation is known

• We now look at two other kinds of hypothesis test concerning a single popu-
lation: a hypothesis test for the population mean when standard deviation is
unknown, and a hypothesis test for the population proportion

5.1 Hypothesis Test for the Population Mean when Standard Deviation is Unknown

• It is often unrealistic to assume the value of σ, the population standard devi-
ation, is known.

• So what do we do if our data comes from a normally distributed population


with unknown standard deviation?

• Just as we did in the previous chapter when producing a confidence interval for
the mean in such a case, we estimate the standard deviation using the sample.

• But this means our test statistic will no longer be normally distributed but t
distributed
• Our test statistic changes from $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$ to $t = \dfrac{\bar{Y} - \mu_0}{S/\sqrt{n}}$, which has a t distribution with n − 1 degrees of freedom

• We follow the same seven-step procedure to conduct a hypothesis test. The


difference is that when determining the rejection rule (using a critical value or
a p-value) we will use the t distribution rather than the Z distribution

• We can get t distribution critical values from our t distribution table but we
cannot get t distribution p-values from the table. These cannot be done by
hand; we need a computer to get them

Hypothesis Test for the Population Mean when Standard Deviation is


Unknown: Example 1

• The following are the monthly rand amounts for landline telephone service
for a random sample of eight households in your community: 550, 510, 470,
620, 480, 560, 600, 570. The telephone company claims in an advertisement
that customers in this community are paying an average of R490 per month for
their landline telephone service. Conduct a hypothesis test at 0.05 significance
level to determine whether this claim is accurate.

1. Hypotheses:
   $H_0: \mu = 490$
   $H_A: \mu \neq 490$

   Because we did not specify whether we want to show the mean monthly
   cost is less than or greater than R490, we use a two-tailed test.

2. Significance Level: α = 0.05

3. Test Statistic: Under H0, $t = \dfrac{\bar{Y} - \mu_0}{S/\sqrt{n}}$ has a t distribution with $n - 1 = 7$
   degrees of freedom

4. Rejection Rule: We will reject H0 if $|t_{\text{observed}}| > t_{\alpha/2,\,n-1} = t_{0.025,\,7} = 2.365$

5. Calculation:
   $\bar{Y} = \dfrac{1}{8}(550 + 510 + 470 + 620 + 480 + 560 + 600 + 570) = 545$

   $S = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{8}\left(Y_i - \bar{Y}\right)^2} = \sqrt{\dfrac{1}{7}(20600)} = 54.2481$

   $t_{\text{observed}} = \dfrac{\bar{Y} - \mu_0}{S/\sqrt{n}} = \dfrac{545 - 490}{54.2481/\sqrt{8}} = 2.868$

6. Decision: |tobserved | = 2.868 > 2.365 therefore we reject H0 at the 0.05


significance level
7. We conclude there is evidence that the mean monthly cost of landline
telephone services in this community is different from what the company
claims it is.
• Note: if using the p-value approach, we would find Pr (t > tobserved )+Pr (t < −tobserved ) =
2 Pr (t > tobserved ) = 2 Pr (t > 2.868). It is not possible to find this value by
hand or from the t distribution table. But using a computer we can determine
it to be 0.024. Since it is less than 0.05 this confirms that we were correct to
reject the null hypothesis.
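• As a check (not part of the original notes), scipy's built-in one-sample t test reproduces both the test statistic and the two-tailed p-value quoted above:

    from scipy.stats import ttest_1samp

    bills = [550, 510, 470, 620, 480, 560, 600, 570]
    result = ttest_1samp(bills, popmean=490)  # two-tailed by default
    print(result.statistic, result.pvalue)    # about 2.868 and 0.024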

Hypothesis Test for the Population Mean when Standard Deviation is


Unknown: Example 2

• With reference to the previous example, suppose that we were interested specif-
ically in whether the average cost for landline telephone service in this com-
munity is greater than the R490 that the company claims it is. How would
the hypothesis test change in this case?
1. Hypotheses:

H0 : µ = 490
HA : µ > 490

Because we specified we want to show the mean monthly cost is greater


than R490, we use a right-tailed test.
2. Significance Level: α = 0.05
3. Test Statistic: Under H0, $t = \dfrac{\bar{Y} - \mu_0}{S/\sqrt{n}}$ has a t distribution with $n - 1 = 7$
degrees of freedom

4. Rejection Rule: We will reject H0 if tobserved > tα,n−1 = t0.05,7 = 1.895
5. Calculation: tobserved = 2.868 (As before)
6. Decision: tobserved = 2.868 > 1.895 therefore we reject H0 at the 0.05
significance level
7. We conclude there is evidence that the mean monthly cost of landline
telephone services in this community is greater than what the company
claims it is.

• Note: if using the p-value approach, we would find Pr (t > tobserved ) = Pr (t > 2.868).
It is not possible to find this value by hand or from the t distribution table.
But using a computer we can determine it to be 0.012. Since it is less than
0.05 this confirms that we were correct to reject the null hypothesis.

Hypothesis Test for the Population Mean: Exercises

• Recently many companies have been experimenting with ‘telecommuting’, al-


lowing employees to work at home on their computers. Among other things,
telecommuting is supposed to reduce the number of sick days taken. Suppose
that at one firm, it is known that over the past few years employees have taken
a mean of 5.4 sick days. This year, the firm introduces telecommuting. Man-
agement chooses a simple random sample of 80 employees to follow in detail,
and, at the end of the year, these employees average 4.5 sick days with a stan-
dard deviation of 2.7 days. Conduct a hypothesis test at the 5% significance
level to determine whether the mean number of sick days taken by employees
has decreased.

• In a process that manufactures tungsten-coated silicon wafers, the target re-


sistance for a wafer is 85 mΩ, and it is known that the standard deviation is
0.5 mΩ. In a simple random sample of 50 wafers, the sample mean resistance
was 84.8 mΩ. Test the null hypothesis that the mean resistance of this process
is equal to the target value. Use the critical value approach and the p-value
approach.

5.2 Hypothesis Test for the Population Proportion

• Often we may have a hypothetical value in mind for a population proportion


that we want to test empirically (empirically means ‘using data or evidence’).
How do we do this?

• Now our hypotheses are about p rather than µ since the unknown parameter
about which we are making inferences is the proportion, not the mean

• So our null hypothesis will in general take the form p = p0 , while the alternative
hypothesis will be p 6= p0 , p < p0 or p > p0 (depending whether it is a two-
tailed, left-tailed or right-tailed test)

• We use our usual seven-step hypothesis testing procedure, but the test statistic will change to $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$, which under the null hypothesis has a standard normal distribution (approximately) if n is sufficiently large

• Notice that unlike confidence intervals for the proportion, we do not need to introduce the t distribution, because our formula has the exact standard deviation of $\hat{p}$ rather than an estimator: $\sqrt{\dfrac{p_0(1-p_0)}{n}}$ is the exact standard deviation of $\hat{p}$ if the null hypothesis is true (which means the population parameter p has a value of $p_0$).
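• A minimal Python sketch of this test (the name prop_z_test is our own; scipy assumed):

    from math import sqrt
    from scipy.stats import norm

    def prop_z_test(x, n, p0):
        """Two-tailed Z test of H0: p = p0; returns (Z_observed, p-value)."""
        p_hat = x / n
        z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
        return z, 2 * (1 - norm.cdf(abs(z)))

    # Coin example below: 37 heads in 60 flips
    print(prop_z_test(37, 60, 0.5))  # about (1.81, 0.071)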

Hypothesis Test for the Population Proportion: Example

• A coin is to be used to determine which team will defend which goal in the
first half of a soccer game. However, the captain of one team claims that the
coin is not fair (i.e. the probability of heads is not 50%). The referee does an
experiment by flipping the coin 60 times, and gets heads 37 times and tails
23 times. Is the captain justified in claiming that the coin is unfair? Use a
hypothesis test at the 5% significance level to answer this question.

1. Hypotheses:
   $H_0: p = 0.5$
   $H_A: p \neq 0.5$

2. Significance Level: α = 0.05

3. Test Statistic: Under H0, $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$ has a standard normal distribution
4. Rejection Rule: We will reject H0 if |Zobserved | > zα/2 = z0.025 = 1.96
5. Calculation:
   $\hat{p} = 37/60 = 0.6167$

   $Z_{\text{observed}} = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \dfrac{0.6167 - 0.5}{\sqrt{\dfrac{0.5(1-0.5)}{60}}} = 1.808$

6. Decision: |Zobserved | = 1.808 < 1.96 therefore we fail to reject H0 at the


0.05 significance level

7. We conclude there is insufficient evidence to claim that the coin is unfair.
• Using the p-value approach:
4. Rejection Rule: We will reject H0 if p-value < 0.05
5. Calculation: We have already calculated that Zobserved = 1.808; now we
need to determine the p-value
   p-value $= \Pr\left(|Z| > |Z_{\text{observed}}| \mid H_0 \text{ is true}\right)$
   $= \Pr(Z > 1.808) + \Pr(Z < -1.808)$  (recall: if H0 is true, Z has a standard normal distribution)
   $= \Pr(Z > 1.81) + \Pr(Z < -1.81)$  (to be calculated from Z table)
   $= 2[1 - \Pr(Z < 1.81)]$
   $= 2(1 - 0.9649)$
   $= 0.0702$

6. Decision: p-value= 0.0702 > 0.05 thus we fail to reject H0 at the 0.05
significance level
7. Conclusion: same as before

Hypothesis Test for the Population Proportion: Exercises

• Dogs are big and expensive. Rats are small and cheap. Can rats be trained to
replace dogs in sniffing out illegal drugs? One measure of the performance of
a drug-sniffing animal is the number of times in 80 trials that it can correctly
distinguish a cup with cocaine residue in it from within a group of other cups.
Suppose it is known that dogs are in general successful 98% of the time in
this test. A rat undergoes the 80 trials and successfully sniffs out the cup
with cocaine residue 73 times. The scientist conducting the experiment claims
that the rat is as good as a drug sniffing dog (i.e. it has the same probability
of success). Can you prove that the rat is less effective than dogs? Use a
hypothesis test with 0.02 significance level.
• A pastor claims that his sermons are less than 30 minutes long three-quarters
of the time. Unknown to him, a member of the church congregation times
his sermons at a random sample of 40 Sunday services. Of these 40 sermons,
26 are less than 30 minutes long. Is the pastor’s claim regarding his sermon
length reasonable? Use a hypothesis test with 0.05 significance level.

6 Hypothesis Tests for Comparing Two Populations

6.1 Hypothesis Test for Comparing Means of Two Populations using Unpaired Samples

Comparing Means of Two Populations using Unpaired Samples

• Consider the two normally distributed populations described in the table below

  Population   Population Mean   Population St. Dev.   Sample                    Sample Mean   Sample St. Dev.
  1            µ1                σ1                    X1 , X2 , . . . , Xn1     X̄             S1
  2            µ2                σ2                    Y1 , Y2 , . . . , Yn2     Ȳ             S2

• We can draw independent random samples from the two populations and use
this data to test hypotheses about the difference between µ1 and µ2
• Most commonly we are interested in testing whether µ1 − µ2 = 0, i.e. µ1 = µ2,
against some alternative hypothesis (either two-tailed, µ1 ≠ µ2, or one-tailed,
µ1 > µ2 or µ1 < µ2)
• As with confidence intervals, the exact procedure will depend on whether the
population standard deviations are known or unknown and equal or unequal

Hypothesis Test for Comparing Means of Two Normally Distributed Pop-


ulations with Known Variances

• If we know σ1 and σ2, we can test the null hypothesis $H_0: \mu_1 - \mu_2 = \Delta_0$ against a one- or two-tailed alternative using the following test statistic:

  $Z = \dfrac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

• This Z follows a standard normal distribution under H0


• In most cases, we are testing the null hypothesis µ1 = µ2 so ∆0 = 0 and can
be dropped from the formula
• Otherwise the procedure is exactly as in the one population case

Hypothesis Test for Comparing Means of Two Normally Distributed Pop-


ulations with Known Variances: Example

• To determine the effect of fuel grade on fuel efficiency, 80 new cars of the same
make, with identical engines, were each driven for 1000 km. Forty of the cars
ran on regular fuel and the other 40 received premium grade fuel. The cars
with the regular fuel averaged 11.6 km/L and the cars with the premium fuel
averaged 11.9 km/L. It is known that the population standard deviation of
fuel efficiency is 0.5 for cars running on regular fuel and 0.9 for cars running
on premium fuel. Is there a difference in fuel efficiency between the two grades
of fuel? Use 1% significance level.
1. Hypotheses:
   $H_0: \mu_1 = \mu_2$
   $H_A: \mu_1 \neq \mu_2$

2. Significance Level: α = 0.01
3. Test Statistic: Under H0, $Z = \dfrac{\bar{X} - \bar{Y} - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ has a standard normal distribution

4. Rejection Rule: We will reject H0 if $|Z_{\text{observed}}| > z_{\alpha/2} = z_{0.005} = 2.575$

5. Calculation:
   $Z_{\text{observed}} = \dfrac{11.6 - 11.9 - 0}{\sqrt{\dfrac{0.5^2}{40} + \dfrac{0.9^2}{40}}} = -1.843$

6. Decision: |Zobserved | = 1.843 < 2.575 therefore we fail to reject H0 at the


0.01 significance level
7. We conclude there is no statistically significant difference in fuel efficiency
between the two grades of fuel.

• Using the p-value approach:

4. Rejection Rule: We will reject H0 if p-value < 0.01


5. Calculation: We have already calculated that Zobserved = −1.84; now we
need to determine the p-value

   p-value $= \Pr\left(|Z| > |Z_{\text{observed}}| \mid H_0 \text{ is true}\right)$
   $= \Pr(Z > 1.84) + \Pr(Z < -1.84)$  (recall: if H0 is true, Z has a standard normal distribution; values from Z table)
   $= 2[1 - \Pr(Z < 1.84)]$
   $= 2(1 - 0.9671)$
   $= 0.0658$

6. Decision: p-value= 0.0658 > 0.01 thus we fail to reject H0 at the 0.01
significance level
7. Conclusion: same as before

Hypothesis Test for Comparing Means of Two Normally Distributed Pop-


ulations with Unknown but Equal Variances

• In Chapter 3 we constructed a confidence interval estimator for the difference in means between two populations with unknown but equal variances

• Hence we assume that σ12 = σ22 = σ 2 although we don’t know the value of σ 2
(n1 − 1)S12 + (n2 − 1)S22
• We introduced a pooled estimator for the population variance: Sp2 =
n1 + n2 − 2

• We can use the same estimator to derive a test statistic for testing the null hypothesis µ1 − µ2 = ∆0 against some alternative:
$$t = \frac{\bar{X} - \bar{Y} - \Delta_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

• Under H0 , this test statistic follows a t distribution with n1 + n2 − 2 degrees


of freedom

Hypothesis Test for Comparing Means of Two Normally Distributed Populations with Unknown but Equal Variances: Example

• One indicator of the financial health of a company is the ratio of current assets to current liabilities. Roughly speaking this means the amount the company is worth divided by the amount the company owes. The following table shows this ratio for a random sample of 68 healthy companies and 33 failed companies. Test the null hypothesis that the asset to liability ratio in healthy companies is the same as in failed companies against the alternative hypothesis that healthy companies have a higher asset to liability ratio than failed companies. Use the 0.05 significance level.

Ratio of current assets to current liabilities
Healthy Companies                    Failed Companies
1.50 0.10 1.76 1.14 1.84 2.21        0.82 0.89 1.31
2.08 1.43 0.68 3.15 1.24 2.03        0.05 0.83 0.90
2.23 2.50 2.02 1.44 1.39 1.64        1.68 0.99 0.62
0.89 0.23 1.20 2.16 1.80 1.87        0.91 0.52 1.45
1.91 1.67 1.87 1.21 2.05 1.06        1.16 1.32 1.17
0.93 2.17 2.61 3.05 1.52 1.93        0.42 0.48 0.93
1.95 2.61 1.11 0.95 0.96 2.25        0.88 1.10 0.23
2.73 1.56 2.73 0.90 2.12 1.42        1.11 0.19 0.13
1.62 1.76 2.22 2.80 1.85 0.96        2.03 0.51 1.12
1.71 1.02 2.50 1.55 1.69 1.64        0.92 0.26 1.15
1.03 1.80 0.67 2.44 2.30 2.21        0.13 0.88 0.09
1.96 1.81

1. Hypotheses:

H0 : µ1 = µ2
HA : µ1 > µ2

2. Significance Level: α = 0.05

3. Test Statistic: Under H0, $t = \dfrac{\bar{X} - \bar{Y} - 0}{S_p\sqrt{1/n_1 + 1/n_2}}$ has a t distribution with n1 + n2 − 2 degrees of freedom
4. Rejection Rule: We will reject H0 if tobserved > tα,n1 +n2 −2 = t0.05,68+33−2 =
t0.05,99 ≈ 1.660
5. Calculation:
$$\bar{X} = \frac{1}{68}(1.50 + 0.10 + 1.76 + \cdots + 1.96 + 1.81) = 1.7256$$
$$\bar{Y} = \frac{1}{33}(0.82 + 0.89 + 1.31 + \cdots + 0.88 + 0.09) = 0.8236$$
$$S_1^2 = \frac{1}{68-1}\sum_{i=1}^{68}\left(X_i - \bar{X}\right)^2 = 0.4087$$
$$S_2^2 = \frac{1}{33-1}\sum_{i=1}^{33}\left(Y_i - \bar{Y}\right)^2 = 0.2314$$
$$S_p = \sqrt{\frac{(68-1)(0.4087) + (33-1)(0.2314)}{68+33-2}} = \sqrt{0.3514} = 0.5928$$
$$t_{observed} = \frac{1.7256 - 0.8236 - 0}{0.5928\sqrt{\dfrac{1}{68} + \dfrac{1}{33}}} = 7.172$$

6. Decision: tobserved = 7.172 > 1.660 therefore we reject H0 at the 0.05


significance level
7. We conclude that the mean asset to liability ratio of healthy companies
is greater than that of failed companies.

• Note that we cannot calculate an exact p-value by hand in this case because the t table provides only a few selected critical values for each degrees-of-freedom value, not full tail probabilities
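• Statistical software, however, computes t-distribution p-values directly. The following is a minimal sketch in Python using scipy.stats.ttest_ind_from_stats on the summary statistics computed in the example above (the 'alternative' argument requires a recent version of SciPy):

```python
# Pooled-variance two-sample t test from summary statistics,
# reproducing the healthy vs. failed companies example above.
# Note: ttest_ind_from_stats expects standard deviations, not variances.
import numpy as np
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=1.7256, std1=np.sqrt(0.4087), nobs1=68,
    mean2=0.8236, std2=np.sqrt(0.2314), nobs2=33,
    equal_var=True,           # pooled-variance test (sigma1^2 = sigma2^2)
    alternative='greater')    # HA: mu1 > mu2

print(f"t = {t_stat:.3f}, one-sided p-value = {p_value:.2e}")
# t = 7.172; the p-value is far below 0.05, so H0 is rejected
```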

Hypothesis Test for Comparing Means of Two Normally Distributed Populations with Unknown, Unequal Variances

• If σ1² and σ2² are unknown and we cannot assume them to be equal, our test statistic is
$$t = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
• Under H0, this test statistic approximately follows a t distribution with ν degrees of freedom, where
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{S_1^4}{n_1^2(n_1-1)} + \dfrac{S_2^4}{n_2^2(n_2-1)}}$$

• This is known as the Welch-Satterthwaite Method and is typically used by
statistical software when conducting a hypothesis test comparing means of
two normally distributed populations with unknown, unequal variances

• Since degrees of freedom for a t distribution must be an integer, we round ν


to the nearest integer

Hypothesis Test for Comparing Means of Two Normally Distributed Populations with Unknown, Unequal Variances: Example

• A horticulturalist is interested in whether the mean height differs between two varieties of bean plants known as Maris and Stella respectively
• She measures the height (in cm) of a random sample of 28 Maris plants and a random sample of 24 Stella plants; the data is given in the table below. Conduct a hypothesis test at 1% significance level to determine whether there is a difference in mean height between the two plant varieties.

Height (cm) of Bean Plants by Variety
Maris Variety    Stella Variety
92  88           98  95
90  91           97  85
90  89           102 106
86  86           103 104
110 90           113 95
101 97           99  97
91  91           101 97
104 91           108 98
98  92           112 102
86  94           113 101
98  88           109
92  97           101
99  96           98
105 87           103

1. Hypotheses:

H0 : µ1 = µ2
HA : µ1 ≠ µ2

2. Significance Level: α = 0.01

3. Test Statistic: Under H0, $t = \dfrac{\bar{X} - \bar{Y} - 0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ has a t distribution with ν degrees of freedom, where
$$\bar{X} = \frac{1}{28}(92 + 90 + 90 + \cdots + 96 + 87) = 93.53571$$
$$\bar{Y} = \frac{1}{24}(98 + 97 + 102 + \cdots + 102 + 101) = 101.5417$$
$$S_1^2 = \frac{1}{28-1}\sum_{i=1}^{28}\left(X_i - \bar{X}\right)^2 = 38.25794$$
$$S_2^2 = \frac{1}{24-1}\sum_{i=1}^{24}\left(Y_i - \bar{Y}\right)^2 = 41.99819$$
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{S_1^4}{n_1^2(n_1-1)} + \dfrac{S_2^4}{n_2^2(n_2-1)}} = \frac{\left(\dfrac{38.25794}{28} + \dfrac{41.99819}{24}\right)^2}{\dfrac{38.25794^2}{28^2(28-1)} + \dfrac{41.99819^2}{24^2(24-1)}} = 48.007 \Rightarrow 48$$

4. Rejection Rule: We will reject H0 if |tobserved| > tα/2,ν = t0.005,48 ≈ 2.678 (closest degrees of freedom in table is 50; exact value for 48 degrees of freedom is 2.682)
5. Calculation:
$$t_{observed} = \frac{93.53571 - 101.5417 - 0}{\sqrt{\dfrac{38.25794}{28} + \dfrac{41.99819}{24}}} = -4.535$$

6. Decision: |tobserved | = 4.535 > 2.678 therefore we reject H0 at the 0.01


significance level
7. We conclude that there is a difference in mean height between the two
bean plant varieties
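• As a check, here is a minimal sketch in Python of the Welch test for the bean-plant example, again from the summary statistics computed above; setting equal_var=False tells SciPy to use the Welch-Satterthwaite method:

```python
# Welch (unequal-variance) two-sample t test from summary statistics,
# reproducing the Maris vs. Stella bean-plant example above.
import numpy as np
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=93.53571, std1=np.sqrt(38.25794), nobs1=28,
    mean2=101.5417, std2=np.sqrt(41.99819), nobs2=24,
    equal_var=False)          # do not pool: Welch-Satterthwaite method

print(f"t = {t_stat:.3f}, two-tailed p-value = {p_value:.2e}")
# t = -4.535; the p-value is well below 0.01, so H0 is rejected
```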

6.2 Hypothesis Test for Comparing Means of Two Populations using Paired Samples

Hypothesis Test for Comparing Means of Two Normally Distributed Populations: Paired Samples

• A paired t-test is a special case where we are comparing the means of two
normally distributed populations using related samples

• This means that the two samples have the same sample size and also that each
observation from the first sample has a ‘partner’ observation in the second
sample with which it can be directly compared

• A classic application of this kind of test would be a before/after comparison


on the same sample of individuals

• For example, suppose we take a random sample of 10 matric maths learners.


We give them a maths test and record their scores. Then they participate in a
special online tutoring programme and afterward they write another test and
we again record their scores. We would like to know if the online tutoring
programme was effective, i.e. if there was an improvement in their scores. In
this case we can directly compare each student’s ‘before’ test score to the same
student’s ‘after’ test score, so we have a paired sample

• The procedure in this case is to subtract each Yi value from the corresponding Xi value to get the difference di = Xi − Yi

• Then we conduct a one-sample t test (i.e. a test concerning the mean of a


single population) on these differences

• Note that the paired t test rests on the assumption that the differences di are normally distributed

Hypothesis Test for Comparing Means of Two Normally Distributed Populations: Paired Samples - Example

• A non-profit organisation provides computers and WiFi to remote rural schools


and conducts an online tutoring programme to matric maths learners. They
are interested in studying the effectiveness of this programme. Hence they give
a maths test to a random sample of 10 learners before the programme starts,
and then give a maths test of equal difficulty to the same 10 learners after the
programme finishes. The results of the tests are displayed in the table below.
Test whether mean test performance among matric maths learners is better
after the programme than before, at 1% significance level.

1. Hypotheses:

H0 : µ1 = µ2
HA : µ1 < µ2

Xi Yi di
52 66 -14
47 52 -5
71 68 3
65 86 -21
55 58 -3
62 70 -8
39 51 -12
44 49 -5
74 82 -8
59 82 -23

2. Significance Level: α = 0.01


3. Test Statistic: Under H0, $t = \dfrac{\bar{d} - 0}{S_d/\sqrt{n}}$ has a t distribution with n − 1 = 9 degrees of freedom
4. Rejection Rule: We will reject H0 if tobserved < −tα,n−1 = −t0.01,9 = −2.821
5. Calculation:
$$\bar{d} = \frac{1}{10}\left((-14) + (-5) + 3 + \cdots + (-23)\right) = -9.6$$
$$S_d = \sqrt{\frac{1}{10-1}\sum_{i=1}^{10}\left(d_i - \bar{d}\right)^2} = \sqrt{\frac{1}{9}(584.4)} = 8.0581$$
$$t_{observed} = \frac{\bar{d} - 0}{S_d/\sqrt{n}} = \frac{-9.6 - 0}{8.0581/\sqrt{10}} = -3.767$$

6. Decision: tobserved = −3.767 < −2.821 therefore we reject H0 at the 0.01


significance level
7. We conclude there is evidence that mean matric maths test scores are
higher after the tutoring programme than before.
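• For reference, here is a minimal sketch in Python of the same paired test using scipy.stats.ttest_rel on the raw before/after scores (the 'alternative' argument requires a recent version of SciPy):

```python
# Paired t test for the online tutoring example above.
from scipy.stats import ttest_rel

before = [52, 47, 71, 65, 55, 62, 39, 44, 74, 59]
after  = [66, 52, 68, 86, 58, 70, 51, 49, 82, 82]

# ttest_rel works on the differences before[i] - after[i];
# HA: mu1 < mu2, i.e. the mean difference is negative
t_stat, p_value = ttest_rel(before, after, alternative='less')

print(f"t = {t_stat:.3f}, one-sided p-value = {p_value:.4f}")
# t = -3.767, p-value ~ 0.0022 < 0.01, so H0 is rejected
```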

Hypothesis Tests for Comparing Means of Two Populations: Exercises

• Fifty specimens of a new computer chip were tested for speed in a certain
application, along with 50 specimens of chips with the old design. The average

speed, in MHz, for the new chips was 495.6, and the average speed for the old
chips was 481.2. It is assumed that in the whole population of new chips, the
standard deviation of speed is 19.4, while in the whole population of old chips,
the standard deviation of speed is 14.3. Can you conclude that the mean speed
for the new chips is greater than that of the old chips? Conduct a hypothesis
test at 10% significance level to answer this, using the p-value approach.

• Two methods are being considered for a paint manufacturing process, in order
to increase production. In a random sample of 100 days, the mean daily
production using the first method was 625 tonnes and the standard deviation
was 40 tonnes. In a random sample of 64 days, the mean daily production
using the second method was 640 tonnes and the standard deviation was 50
tonnes. Assume the standard deviations of the two populations are equal. Do
we have evidence at the 5% significance level that the first method has slower
production than the second method?

• The manufacturers of a dietary supplement claim that people who take it will lose weight. A random sample of nine women are weighed before taking the supplement and again after 30 days of taking the supplement. Their before and after weights (in kg) are shown in the table below. Test at the 5% significance level whether the mean weight of women is less after taking the supplement than before.

Weight Before   Weight After
72              68
56              49
93              87
84              85
66              63
69              72
74              67
58              53
60              58

6.3 Hypothesis Test for Comparing Proportions of Two Populations

• Just as we developed a confidence interval for the difference between two pro-
portions, we can also use hypothesis testing to make inferences about the
difference between proportions of two populations


• In this case our null hypothesis will be H0 : p1 = p2 against the alternative HA : p1 ≠ p2 (or p1 < p2 or p1 > p2 if we are doing a one-tailed test)
• The test statistic is
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat{p}_1 = X/n_1$ is the sample proportion for the first sample, $\hat{p}_2 = Y/n_2$ is the sample proportion from the second sample, and $\hat{p} = (X + Y)/(n_1 + n_2)$ is a pooled estimator of p = p1 = p2, the equal proportion value assumed under the null hypothesis

• Under H0 , the statistic Z approximately follows a standard normal distribution


provided n1 and n2 are sufficiently large (each greater than 30)

Hypothesis Test for Comparing Proportions of Two Populations: Example

• The table below presents survey data on whether consumers are ‘label users’ who pay attention to details on the label when buying a garment. Are men and women equally likely to be label users? Test the hypothesis that the proportion of women who are label users is the same as the proportion of men who are label users. Use α = 0.05.

Population   Sample Size   # of Label Users
1 (Women)    296           63
2 (Men)      251           27

1. Hypotheses:

H0 : p1 = p2
HA : p1 ≠ p2

2. Significance Level: α = 0.05


3. Test Statistic: Under H0, $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ has a standard normal distribution
4. Rejection Rule: We will reject H0 if |Zobserved | > zα/2 = z0.025 = 1.96

5. Calculation:
$$\hat{p}_1 = 63/296 = 0.2128 \qquad \hat{p}_2 = 27/251 = 0.1076 \qquad \hat{p} = \frac{63 + 27}{296 + 251} = \frac{90}{547} = 0.1645$$
$$Z_{observed} = \frac{0.2128 - 0.1076}{\sqrt{0.1645(1 - 0.1645)\left(\dfrac{1}{296} + \dfrac{1}{251}\right)}} = 3.307$$

6. Decision: |Zobserved | = 3.307 > 1.96 therefore we reject H0 at the 0.05


significance level
7. We conclude that women and men are not equally likely to be ‘label users’
when buying clothes.
• Using the p-value approach:
4. Rejection Rule: We will reject H0 if p-value < 0.05
5. Calculation: We have already calculated that Zobserved = 3.307; now we
need to determine the p-value
$$\begin{aligned}
\text{p-value} &= \Pr\left(|Z| > |Z_{observed}| \mid H_0 \text{ is true}\right) \\
&= \Pr(Z > 3.31) + \Pr(Z < -3.31) \quad \text{(from the Z table: under } H_0, Z \text{ is standard normal)} \\
&= 2\left[1 - \Pr(Z < 3.31)\right] = 2(1 - 0.9995) = 0.001
\end{aligned}$$

6. Decision: p-value= 0.001 < 0.05 thus we reject H0 at the 0.05 significance
level
7. Conclusion: same as before
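• For reference, a minimal sketch in Python of the same two-proportion test, computed directly from the formula above (the variable names are our own):

```python
# Two-proportion Z test for the label-user example above.
import numpy as np
from scipy.stats import norm

x, n1 = 63, 296   # label users among women
y, n2 = 27, 251   # label users among men

p1_hat, p2_hat = x / n1, y / n2
p_hat = (x + y) / (n1 + n2)              # pooled proportion under H0

z = (p1_hat - p2_hat) / np.sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2))
p_value = 2 * (1 - norm.cdf(abs(z)))     # two-tailed p-value

print(f"Z = {z:.3f}, p-value = {p_value:.4f}")
# Z = 3.307, p-value = 0.0009 (0.001 with table rounding)
```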
Hypothesis Test for Comparing Proportions of Two Populations: Exercises
• It’s difficult to persuade consumers to abandon a product with which they
are familiar. One experiment gave consumers free samples of a new washing
powder and also of a standard washing powder. After some time, subjects
were asked which washing powder they prefer. Among the 48 customers who
normally use the standard product, 19 preferred the new product. Among the
56 customers who did not previously use the standard product, 29 preferred
the new product. Are current users of the standard washing powder less likely
than nonusers to prefer the new washing powder? Summarize the data and
conduct a hypothesis test at 0.01 significance level.

• Two extrusion machines that manufacture steel rods are being compared. In
a sample of 1000 rods taken from machine 1, 960 met specifications regarding
length and diameter. In a sample of 600 rods taken from machine 2, 582 met
the specifications. Are the two machines equally effective at producing rods
that meet the specifications? Conduct a hypothesis test at 0.05 significance
level to reach a conclusion.

6.4 Hypothesis Tests concerning Population Variance(s)


Hypothesis Test for a Population Variance (Optional)

• Although not as widely used as some other tests, one can conduct a hypothesis
test to infer whether the variance of a normally distributed population is equal
to some specified null value

• The null hypothesis H0 is σ² = σ0² and the alternative HA is either HA : σ² ≠ σ0², σ² < σ0² or σ² > σ0² (depending whether we are conducting a two-tailed, left-tailed or right-tailed test)
• The test statistic is
$$T = \frac{(n-1)S^2}{\sigma_0^2}$$
• Under H0 , the statistic T follows a χ2 distribution (pronounced ‘kie-squared’)
with n − 1 degrees of freedom

• This is a new distribution we have not seen before!

• A random variable with a χ2 distribution can take values from 0 to ∞

• (Actually, if you take a random variable with a standard normal distribution


and square it, you will get a random variable with a χ2 distribution with 1
degree of freedom)

• Another important feature of the χ2 distribution is that it is not symmetrical

• Thus, when doing a two-tailed test, we cannot just look up one critical value
in the table and consider positive and negative cases. We must look up two
critical values: one for the left tail and one for the right tail

• The distribution is shown in the graph

• Hence with a two-tailed test we will reject H0 if Tobserved < χ²1−α/2,n−1 or Tobserved > χ²α/2,n−1
• For a left-tailed test we will reject H0 if Tobserved < χ²1−α,n−1
• For a right-tailed test we will reject H0 if Tobserved > χ²α,n−1

Hypothesis Test for a Population Variance: Example
• A company produces machined engine parts that are supposed to have a di-
ameter variance no larger than 0.0002 (diameters measured in cm). A random
sample of 10 parts gave a sample variance of 0.0003. Test, at the 5% level, for
evidence that the variance exceeds 0.0002.
1. Hypotheses:
H0 : σ 2 = 0.0002
HA : σ 2 > 0.0002
2. Significance Level: α = 0.05
3. Test Statistic: Under H0, $T = \dfrac{(n-1)S^2}{\sigma_0^2}$ has a χ² distribution with n − 1 degrees of freedom
4. Rejection Rule: We will reject H0 if Tobserved > χ²α,n−1 = χ²0.05,9 = 16.919
5. Calculation:
$$T = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(10-1)(0.0003)}{0.0002} = 13.5$$
6. Decision: Tobserved = 13.5 < 16.919 therefore we fail to reject H0 at the
0.05 significance level
7. There is insufficient evidence to conclude that the variance is larger than
0.0002.
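• For reference, a minimal sketch in Python of this variance test; scipy.stats.chi2 supplies the critical value and p-value that would otherwise come from the χ² table:

```python
# Chi-squared test for a single population variance (engine-part example).
from scipy.stats import chi2

n, s2, sigma0_sq, alpha = 10, 0.0003, 0.0002, 0.05

T = (n - 1) * s2 / sigma0_sq              # chi2(n-1) under H0
critical = chi2.ppf(1 - alpha, df=n - 1)  # right-tail critical value
p_value = chi2.sf(T, df=n - 1)            # right-tailed p-value

print(f"T = {T:.1f}, critical value = {critical:.3f}, p-value = {p_value:.3f}")
# T = 13.5 < 16.919 (p-value ~ 0.141), so H0 is not rejected
```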

Hypothesis Test for Comparing Variances of Two Populations (Optional)

• It is also possible to conduct a hypothesis test to compare the variances of two


normally distributed populations using data from two independent random
samples

• This is not really an optional topic: this type of test will become very important in second year Statistics, so it is important to understand the basics now

• Suppose we have an independent random sample X1 , X2 , . . . , Xn1 from nor-


mally distributed population 1 and an independent random sample Y1 , Y2 , . . . , Yn2
from normally distributed population 2. Suppose further that the variance of
population 1 is σ12 and the variance of population 2 is σ22
• Our null hypothesis is H0 : σ1² = σ2², which we can also write as σ1²/σ2² = 1
• The reason why we write it like this is that the test statistic we are going to use is related to the ratio of two variances rather than the difference
• We will use the right-tailed alternative hypothesis HA : σ1²/σ2² > 1
σ22
• Note that it is possible to do a two-tailed or left-tailed alternative hypothesis,
but we will focus only on the right-tailed alternative because that is all we are
going to use in second year

• It can be shown that if S1² is the sample variance of an independent random sample of size n1 from a normally distributed population with variance σ1², and S2² is the sample variance of an independent random sample of size n2 from a normally distributed population with variance σ2², then the statistic
$$F = \frac{S_1^2\,\sigma_2^2}{S_2^2\,\sigma_1^2}$$
has an F distribution with numerator degrees of freedom n1 − 1 and denominator degrees of freedom n2 − 1

• Basically, the F distribution has two separate degrees of freedom parameters.


We will see shortly how to use the table to get a critical value
• If the null hypothesis is true then the statistic F reduces to S1²/S2², since σ1²/σ2² = 1 (which means also σ2²/σ1² = 1)
• Hence our test statistic in this case is
$$F = \frac{S_1^2}{S_2^2}$$
• We reject H0 if Fobserved > fα,n1 −1,n2 −1 , as visualized in the following graph:

Hypothesis Test for Comparing Variances of Two Populations: Exam-
ple
• The manager of a dairy is in the process of deciding which of two new carton-
filling machines to use. The most important attribute is the consistency of the
fills (i.e. the fills should have a small variance). She takes a random sample of
ten cartons filled by machine 1 and a random sample of eleven cartons filled
by machine 2 and measures the fill volume of each carton. The results are
displayed below. Can we infer that the second machine is more consistent
than the first (i.e. it has a smaller variance)? Use a hypothesis test with 5%
significance level.
1. Hypotheses:
H0 : σ1²/σ2² = 1
HA : σ1²/σ2² > 1

Machine 1 0.998 0.997 1.003 1.000 0.999


1.000 0.998 1.003 1.004 1.000
Machine 2 1.003 1.004 0.997 0.996 0.999 1.003
1.000 1.005 1.002 1.004 0.996

2. Significance Level: α = 0.05
3. Test Statistic: Under H0, $F = \dfrac{S_1^2}{S_2^2}$ has an F distribution with n1 − 1 numerator degrees of freedom and n2 − 1 denominator degrees of freedom
4. Rejection Rule: We will reject H0 if Fobserved > fα,n1 −1,n2 −1 = f0.05,9,10 =
3.020
◦ Note: there is a separate table for α = 0.05 and α = 0.01 because
we need the columns of the table to cover the numerator degrees of
freedom and the rows of the table to cover the denominator degrees
of freedom
◦ Hence we go to the table entitled ‘Critical Values of the F -Distribution:
α = 0.05’, look up n1 −1 in the columns (Numerator Degrees of Free-
dom) and n2 − 1 in the rows (Denominator Degrees of Freedom)
5. Calculation:
$$\bar{X} = \frac{1}{10}(0.998 + 0.997 + \cdots + 1.000) = 1.0002$$
$$S_1^2 = \frac{1}{10-1}\left([0.998 - 1.0002]^2 + [0.997 - 1.0002]^2 + \cdots + [1.000 - 1.0002]^2\right) = 5.7333 \times 10^{-6}$$
$$\bar{Y} = \frac{1}{11}(1.003 + 1.004 + \cdots + 0.996) = 1.000818$$
$$S_2^2 = \frac{1}{11-1}\left([1.003 - 1.000818]^2 + [1.004 - 1.000818]^2 + \cdots + [0.996 - 1.000818]^2\right) = 1.1364 \times 10^{-5}$$
$$F_{observed} = \frac{S_1^2}{S_2^2} = \frac{5.7333 \times 10^{-6}}{1.1364 \times 10^{-5}} = 0.5045$$

6. Decision: Fobserved = 0.5045 < 3.020 therefore we fail to reject H0 at the


0.05 significance level
7. There is insufficient evidence to conclude that the variance of filling vol-
ume for the first machine is greater than for the second machine.
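• For reference, a minimal sketch in Python of this variance-ratio test on the raw fill volumes, with the critical value taken from scipy.stats.f instead of the table:

```python
# F test comparing the variances of the two carton-filling machines.
import numpy as np
from scipy.stats import f

machine1 = [0.998, 0.997, 1.003, 1.000, 0.999,
            1.000, 0.998, 1.003, 1.004, 1.000]
machine2 = [1.003, 1.004, 0.997, 0.996, 0.999, 1.003,
            1.000, 1.005, 1.002, 1.004, 0.996]

s1_sq = np.var(machine1, ddof=1)   # sample variance of machine 1
s2_sq = np.var(machine2, ddof=1)   # sample variance of machine 2

F = s1_sq / s2_sq                  # F(n1-1, n2-1) under H0
critical = f.ppf(0.95, dfn=len(machine1) - 1, dfd=len(machine2) - 1)

print(f"F = {F:.4f}, critical value = {critical:.3f}")
# F = 0.5045 < 3.020, so H0 is not rejected
```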

Hypothesis Test for Comparing Variances of Two Populations: Exercises
• A broth used to manufacture a pharmaceutical product has its sugar content,
in mg/mL, measured several times on three successive days.
◦ Can you conclude (at 1% significance level) that the variability of the
process is greater on the second day than on the first day?
◦ Can you conclude (at 1% significance level) that the variability of the
process is greater on the third day than on the second day?

Day 1: 5.0 4.8 5.1 5.1 4.8 5.1 4.8
4.8 5.0 5.2 4.9 4.9 5.0
Day 2: 5.8 4.7 4.7 4.9 5.1 4.9 5.4
5.3 5.3 4.8 5.7 5.1 5.7
Day 3: 6.3 4.7 5.1 5.9 5.1 5.9 4.7
6.0 5.3 4.9 5.7 5.3 5.6

Assumptions for Hypothesis Tests concerning Means and Variances

• It is important to remember that all of the confidence intervals and hypothesis


tests we have done so far, with the exception of those concerning proportions,
rest on the assumption that the population(s) are normally distributed

• All the hypothesis tests rely on the assumption that the data consists of inde-
pendent random samples

• If these assumptions are not met, our hypothesis test is not valid.

• In the computer lab we will learn some basic techniques for checking the va-
lidity of the ‘normality’ assumption, which can be done graphically or using
normality tests

• More sophisticated techniques for checking our test assumptions will wait until
second and third year subjects

7 Pearson Chi-Squared Tests for Categorical Data


7.1 Chi-Squared Goodness of Fit Test
Chi-Squared Goodness of Fit Test: Background

• This section describes two tests used to analyze categorical data, i.e. data
measured on the nominal scale of measurement

• These tests are named after the British statistician Karl Pearson

• Both tests use the χ2 distribution, which we already encountered for testing
the variance of one population

• We have already considered tests based on a binomial experiment, which has


n trials and two possible outcomes in each trial, each having a probability of
success p and a probability of failure q = 1 − p

• Now consider a multinomial experiment

• It is similar to a binomial experiment: we have n independent trials in which


an outcome is observed

• The difference is that instead of two possible outcomes for each trial (‘success’
and ‘failure’) we have k possible outcomes, which we can call outcome 1,
outcome 2, outcome 3, etc. up to outcome k

• In each trial, the probability of outcome 1 is π1, the probability of outcome 2 is π2, the probability of outcome 3 is π3, and so on, up to the probability of outcome k, which is πk

• Obviously π1 + π2 + π3 + · · · + πk = 1 (one of the outcomes must happen in


each trial)

• Instead of just one random variable Y , in this case we define k random variables
O1 , O2 , O3 , . . . , Ok where Oj is the observed number of occurrences of outcome
j

• Of course O1 + O2 + O3 + · · · + Ok = n (the number of occurrences of all the


outcomes together must add up to the number of trials)

• Here is a way to visualize the multinomial experiment: suppose we have k boxes. We toss n balls at these boxes such that each ball must fall into exactly one of the boxes. The probability that a ball lands in a given box can vary from box to box but remains the same for a particular box on each toss. Oj will be the number of balls that land in box j

• Notice that in the special case where k = 2, this reduces to a binomial exper-
iment (we could say if the ball lands in box 1 it is a ‘success’ and if the ball
lands in box 2 it is a ‘failure’)

Chi-Squared Goodness of Fit Test: Constructing the Test

• Suppose we have k boxes and n = 100 balls are tossed at them. Suppose
further that π1 = 0.1, i.e. for each toss there is a 10% chance that the ball
will land in box 1. How many balls would we expect to find in box 1 after 100
trials?
E (O1 ) = nπ1 = (100)(0.1) = 10

• In general, the number of balls expected to land in box j is E (Oj ) = nπj for
j = 1, 2, . . . , k

• Suppose we want to test the null hypothesis that π1 = π1*, π2 = π2*, . . . , πk = πk* against the alternative that these are not all the correct probabilities

• This could be called a ‘goodness of fit’ test because we are basically testing
whether the data fit a particular probability distribution

• If the null hypothesis is true then E (Oj ) = nπj* for j = 1, 2, . . . , k

• Let us call these expected values under the null hypothesis Ej for j = 1, 2, . . . , k

• Intuitively, the observed number of balls in box j, Oj , should be close to its


null-hypothesis expected value, Ej

• If we throw 100 balls at k boxes and the probability of a ball landing in box 1 is
10%, the expected number of balls in box 1 is 10 and we will be very surprised
if we find 40 balls in box 1; we will doubt whether our 10% probability is
correct

• Hence, a statistic that could measure how well the observed data fit the null hypothesis expected values is $\sum_{j=1}^{k}(O_j - E_j)^2$
• The further apart the observed counts Oj are from the expected counts Ej, the larger this statistic will be; hence if $\sum_{j=1}^{k}(O_j - E_j)^2$ is very large we will reject H0

• But how large must the statistic be before we reject H0 ? We need a probability
distribution for the statistic.

• There is no nice probability distribution for this statistic, but if we divide each term by Ej there is a nice result:
$$\chi^2 = \sum_{j=1}^{k}\frac{(O_j - E_j)^2}{E_j}$$

• This statistic is called χ2 because it approximately follows a χ2 distribution


with k − 1 degrees of freedom

• Hence we will reject the null hypothesis (the null probability distribution) if
χ2observed > χ2α,k−1

• The key assumptions of the chi-squared goodness of fit test are as follows:

1. Expected frequencies (Ej ) should not be too small, otherwise the χ2 dis-
tribution is not a good approximation. A standard rule of thumb is to
avoid using this test if any Ej < 1 or if more than 20% of the categories
have Ej < 5
2. The ‘trials’ in the underlying multinomial experiment should be indepen-
dent (just as in a binomial experiment)

Chi-Squared Goodness of Fit Test: Example 1

• A group of rats, one by one, proceed down a ramp to one of three doors. We wish to test the hypothesis that the rats have no preference as to which door they choose, which would mean that π1 = π2 = π3 = 1/3, where πj is the probability that a rat will choose door j for j = 1, 2, 3

• Suppose we send 90 rats down the ramp and observe that 23 rats choose Door
1, 36 rats choose Door 2, and 31 rats choose Door 3. At the 5% significance
level, test the null hypothesis that the rats have no preference for which door
they choose

1. Hypotheses:
H0 : π1 = π2 = π3 = 1/3
HA : This is not the probability distribution

2. Significance Level: α = 0.05


3. Test Statistic: Under H0, $\chi^2 = \sum_{j=1}^{k}\dfrac{(O_j - E_j)^2}{E_j}$ has a χ² distribution with k − 1 degrees of freedom
4. Rejection Rule: We will reject H0 if χ²observed > χ²α,k−1 = χ²0.05,2 = 5.991

5. Calculation: with a chi-squared goodness of fit test it is a good idea to set up a table to organize the calculations. We will use the formula Ej = nπj* to calculate the Ej values, and in this case πj* = 1/3 for j = 1, 2, 3

Door    Observed Frequency Oj   Expected Frequency Ej   (Oj − Ej)²/Ej
1       23                      90 × 1/3 = 30           1.6333
2       36                      30                      1.2
3       31                      30                      0.0333
Total   90                      90                      χ²observed = 2.867
6. Decision: χ²observed = 2.867 < 5.991 therefore we fail to reject H0 at the 0.05 significance level
7. There is insufficient evidence to conclude that the rats have different preferences for the different doors. In other words, it is reasonable to conclude that the rats have no preference but choose each door with equal probability.
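• For reference, a minimal sketch in Python: scipy.stats.chisquare carries out the same goodness of fit test and also returns the exact p-value:

```python
# Chi-squared goodness of fit test for the rat-door example above.
from scipy.stats import chisquare

observed = [23, 36, 31]
expected = [30, 30, 30]    # E_j = n * pi_j* = 90 * (1/3)

stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print(f"chi-squared = {stat:.3f}, p-value = {p_value:.3f}")
# chi-squared = 2.867, p-value ~ 0.238 > 0.05, so H0 is not rejected
```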

Chi-Squared Goodness of Fit Test: Example 2

• Laptop computers have become relatively much less expensive since they first
entered the market in the late 1980s. Has this changed the profile of laptop
customers? A 1988 survey found that 69% of laptop customers were busi-
nesses, 21% were government agencies, 7% were educational institutions, and
only 3% were for private use at home. A more recent survey of 150 buyers of
laptops from a particular vendor found that 76 were businesses, 25 were gov-
ernment agencies, 17 were educational institutions and 32 were for personal
home use. Do these data fit the 1988 distribution of laptop customers or has
the distribution changed? Use α = 0.1 level for this test.

1. Hypotheses:

H0 : π1 = 0.69, π2 = 0.21, π3 = 0.07, π4 = 0.03


HA : This is not the probability distribution

2. Significance Level: α = 0.1


3. Test Statistic: Under H0, $\chi^2 = \sum_{j=1}^{k}\dfrac{(O_j - E_j)^2}{E_j}$ has a χ² distribution with k − 1 degrees of freedom
4. Rejection Rule: We will reject H0 if χ²observed > χ²α,k−1 = χ²0.1,3 = 6.251
5. Calculation: We will use the formula Ej = nπj* to calculate the Ej values, and in this case π1* = 0.69, π2* = 0.21, π3* = 0.07, π4* = 0.03

Customer Type   Observed Frequency Oj   Expected Frequency Ej   (Oj − Ej)²/Ej
Business        76                      150(0.69) = 103.5       7.307
Government      25                      150(0.21) = 31.5        1.341
Education       17                      150(0.07) = 10.5        4.024
Home            32                      150(0.03) = 4.5         168.056
Total           150                     150                     χ²observed = 180.73

6. Decision: χ²observed = 180.73 > 6.251 therefore we reject H0 at the 0.1 significance level
7. There is evidence that the distribution of laptop customers among these
four categories is no longer the same as it was in 1988.

7.2 Chi-Squared Test for Independence


Chi-Squared Test for Independence: Background

• A similar test developed by Pearson based on the χ² distribution is the chi-squared test for independence

• This method is used to test for an association between two categorical (nomi-
nal) variables

• The data for such a test are assembled into a ‘contingency table’ (sometimes
called a cross-tabulation table) in which the rows represent categories of one
variable and the columns represent categories of a second variable

• This is usually referred to as an r × c (r by c) contingency table where r is the


number of rows and c is the number of columns

• For example, consider the following 2 × 5 contingency table showing the rela-
tionship between shift and day of the week for absenteeism. Each value in the
table represents the number of absences in one month at a factory, according
to the shift (day or evening) and day of the week (Monday to Friday)

Day of Week
Shift Mon Tues Wed Thurs Fri Total
Day 52 28 37 31 33 181
Evening 35 34 34 37 41 181
Total 87 62 71 68 74 362

• The table gives us an idea of whether there is an association between shift
and day of the week for absenteeism. For instance, maybe the day shift has
more absenteeism on Mondays and the evening shift has more absenteeism on
Fridays

• We could also display the data graphically:

• This helps us to visualize the possible relationship between shift and day of the
week, but it doesn’t give us an objective way of deciding whether a relationship
actually exists (as opposed to just random variation)

• Hence we need a hypothesis test

Chi-Squared Test for Independence: Constructing the Test

• The null hypothesis in this case is that the row frequencies are independent of
the column frequencies

• The alternative hypothesis is that the row frequencies and the column frequen-
cies have an association

• (There is no easy way to express these hypotheses in mathematical notation, so we can just write them in words)

• A shorter way to express the hypotheses is:

◦ H0 : The two variables are independent
◦ HA : The two variables are dependent

• In a particular example we should be more specific by indicating what the two


variables are, e.g. ‘Shift and day of week are independent when it comes to
employee absenteeism’

• Once again we are going to test the null hypothesis by comparing the Observed
frequencies with the Expected frequencies; but the method for determining the
Expected frequencies is now different

• Under H0 , i.e. assuming H0 is true, what will be the expected frequency of a


particular cell of the table?

• Let’s first define a few variables:

◦ Let n be the total number of observations, i.e. the total frequency of all
the cells in the table
◦ Let ri be the number of observations in the ith row of the table, i.e. the
sum of row i
◦ Let cj be the number of observations in the jth column of the table, i.e.
the sum of column j
• The estimated probability that an observation falls in row i is equal to ri/n; for example, in the table above, Pr (Shift = Evening) = 181/362 = 0.5
• Similarly, the estimated probability that an observation falls in column j is equal to cj/n; for example, Pr (Weekday = Monday) = 87/362 = 0.2403
• What is the probability that an observation falls in row i and column j? (Let
us call this πij ). For example, what is Pr (Shift = Evening ∩ Weekday = Monday)?

• If the row variable and the column variable are independent (as the null hypothesis claims), then we can apply the multiplication rule for independent events: $\pi_{ij} = \dfrac{r_i}{n} \cdot \dfrac{c_j}{n}$
• For example, if Shift and Weekday are independent, then

Pr (Shift = Evening ∩ Weekday = Monday)


= Pr (Shift = Evening) × Pr (Weekday = Monday)
= 0.5 × 0.2403 = 0.1202

• In fact, if the null hypothesis is true then the observed contingency table
frequencies Oij follow a multinomial distribution with k = rc categories and
n trials (in our example, 362 trials)

• The expected frequency in row i, column j under the null hypothesis is thus
$$E_{ij} = n\pi_{ij} = n \cdot \frac{r_i}{n} \cdot \frac{c_j}{n} = \frac{r_i c_j}{n}$$
• If the observed frequencies tend to be far from these expected frequencies,
it will be evidence that the row variable and the column variable are not
independent since the probabilities calculated using the multiplication rule for
independent events do not fit the data

• We can thus use the following test statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

• Under H0 , this statistic approximately follows a χ2 distribution with


(r − 1)(c − 1) degrees of freedom

• Thus we reject the null hypothesis if χ2observed > χ2α,(r−1)(c−1)

• Note that the assumptions of the chi-squared test for independence are ba-
sically the same as those of the chi-squared goodness of fit test: every cell in
the contingency table should have an expected frequency of at least 1 and at
least 80% of the cells should have an expected frequency of at least 5.

Chi-Squared Test for Independence: Example 1

• Let us apply the test to our absenteeism example at the 5% significance level

1. Hypotheses:

H0 : Shift and Weekday are independent when it comes to absenteeism


HA : Shift and Weekday are dependent when it comes to absenteeism

2. Significance Level: α = 0.05


3. Test Statistic: Under H0, $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ has a χ² distribution with (r − 1)(c − 1) degrees of freedom


4. Rejection Rule: We will reject H0 if χ²observed > χ²α,(r−1)(c−1) = χ²0.05,(2−1)(5−1) = χ²0.05,4 = 9.488
5. Calculation: We will use the formula Eij = ri cj/n to calculate the expected frequencies. These are shown in brackets in the table below, next to the observed frequencies

Day of Week
Shift     Mon          Tues       Wed          Thurs      Fri        Total
Day       52 (43.5)    28 (31)    37 (35.5)    31 (34)    33 (37)    181
Evening   35 (43.5)    34 (31)    34 (35.5)    37 (34)    41 (37)    181
Total     87           62         71           68         74         362

Each expected frequency (in brackets) is (row total)(column total)/362; for example, (181)(87)/362 = 43.5.

$$\begin{aligned}
\chi^2_{observed} ={}& \sum_{i=1}^{2}\sum_{j=1}^{5}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \\
={}& \frac{(52-43.5)^2}{43.5} + \frac{(28-31)^2}{31} + \frac{(37-35.5)^2}{35.5} + \frac{(31-34)^2}{34} + \frac{(33-37)^2}{37} \\
&+ \frac{(35-43.5)^2}{43.5} + \frac{(34-31)^2}{31} + \frac{(34-35.5)^2}{35.5} + \frac{(37-34)^2}{34} + \frac{(41-37)^2}{37} \\
={}& 5.424
\end{aligned}$$

6. Decision: χ2observed = 5.424 < 9.488 therefore we fail to reject H0 at the


0.05 significance level
7. There is not enough evidence to conclude that an association exists be-
tween shift and weekday in absenteeism.
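• For reference, a minimal sketch in Python using scipy.stats.chi2_contingency, which computes the expected frequencies, the statistic and the p-value in one call (correction=False because Yates' correction is only used for 2 × 2 tables, discussed next):

```python
# Chi-squared test of independence for the absenteeism example above.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[52, 28, 37, 31, 33],    # day shift, Mon-Fri
                  [35, 34, 34, 37, 41]])   # evening shift, Mon-Fri

stat, p_value, dof, expected = chi2_contingency(table, correction=False)

print(f"chi-squared = {stat:.3f}, df = {dof}, p-value = {p_value:.3f}")
# chi-squared = 5.424, df = 4, p-value ~ 0.246, so H0 is not rejected
```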

Yates Continuity Correction for 2 × 2 Contingency Table, with Example

• In the case where a table has r = 2 rows and c = 2 columns, we need to perform a continuity correction, whereby our test statistic changes to:
$$\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}}$$

• Consider the following example. The Department of Correctional Services


runs a skills training programme with prison inmates. They run a study to
determine whether there is an association between participation in the pro-
gramme and reintegration into society after release from prison. They consider
two categorical variables: whether a prisoner participated in the skills training
programme, and whether the prisoner re-offended within one year of his/her
release. The data are shown in the following contingency table. At the 0.01
significance level, test whether there is an association between participation in
the programme and re-offending

1. Hypotheses:
H0 : Participation in Programme and Behaviour after Release are independent
HA : Participation in Programme and Behaviour after Release are related

                              Re-offends   Does not re-offend   Total
Completes programme           3            57                   60
Does not complete programme   27           13                   40
Total                         30           70                   100

2. Significance Level: α = 0.01


3. Test Statistic: Under H0, $\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{2}\dfrac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$ has a χ² distribution with (r − 1)(c − 1) degrees of freedom
4. Rejection Rule: We will reject H0 if χ²observed > χ²α,(r−1)(c−1) = χ²0.01,(2−1)(2−1) = χ²0.01,1 = 6.635
5. Calculation: We will use the formula Eij = ri cj/n to calculate the expected frequencies. These are shown in brackets in the table below, next to the observed frequencies

Behaviour after Release
Skills Programme      Re-offended               Did not re-offend         Total
Participated          3 ((60)(30)/100 = 18)     57 ((60)(70)/100 = 42)    60
Did not participate   27 ((40)(30)/100 = 12)    13 ((40)(70)/100 = 28)    40
Total                 30                        70                        100

$$\begin{aligned}
\chi^2_{observed} &= \sum_{i=1}^{2}\sum_{j=1}^{2}\frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}} \\
&= \frac{(|3-18| - 0.5)^2}{18} + \frac{(|57-42| - 0.5)^2}{42} + \frac{(|27-12| - 0.5)^2}{12} + \frac{(|13-28| - 0.5)^2}{28} \\
&= \frac{210.25}{18} + \frac{210.25}{42} + \frac{210.25}{12} + \frac{210.25}{28} = 41.716
\end{aligned}$$

6. Decision: χ²observed = 41.716 > 6.635 therefore we reject H0 at the 0.01 significance level
7. There is evidence to conclude that an association exists between par-
ticipation in the skills training programme and behaviour after release
from prison. (Specifically, it appears that those who participate in the
programme are less likely to re-offend.)
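• For reference, a minimal sketch in Python: with correction=True (the default for 2 × 2 tables), scipy.stats.chi2_contingency applies exactly the Yates-corrected statistic used above:

```python
# Yates-corrected chi-squared test for the 2x2 skills-programme example.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[ 3, 57],    # completed programme: re-offends / does not
                  [27, 13]])   # did not complete:    re-offends / does not

stat, p_value, dof, expected = chi2_contingency(table, correction=True)

print(f"chi-squared = {stat:.3f}, df = {dof}, p-value = {p_value:.2e}")
# chi-squared = 41.716 > 6.635, so H0 is rejected at the 0.01 level
```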

What to do when expected cell frequencies are too small

• A study is conducted to determine whether an association exists between ed-


ucation level and employment status

• The following contingency table is produced (with expected counts already


calculated in brackets)

Education Level
Employment Status Primary Some Secondary Matric Tertiary Qualification Total
Employed 3 (6.5) 36 (37.7) 44 (40.95) 8 (5.85) 91
Unemployed 7 (3.5) 22 (20.3) 19 (22.05) 1 (3.15) 49
Total 10 58 63 9 140

• We can see that two cells (25%) have an expected count less than 5; this
violates the rule of thumb

• What we can do in this case is to merge one or two of the small categories into
an adjacent, larger category

• For instance, we might combine ‘Primary’ and ‘Some Secondary’ into a single
category, ‘Primary or Some Secondary’

• Or, we might combine ‘Matric’ and ‘Tertiary Qualification’ into a single cate-
gory, ‘Matric or Tertiary Qualification’

• Or we might do both; if we do both, the new table will look like this:

Education Level
Employment Status Primary or Some Secondary Matric or Tertiary Qualification Total
Employed 39 (44.2) 52 (46.8) 91
Unemployed 29 (23.8) 20 (25.2) 49
Total 68 72 140

• Now we have no problem with our model assumptions and we can continue
with the test


Chi-Squared Tests for Goodness of Fit and Independence: Exercises

• A six-sided die is rolled 600 times and the frequency of each number from 1 to 6 is observed (shown in the table below). Test whether this die is fair (i.e. all six numbers are equally likely to be rolled). Use 0.1 significance level.

Value   Observed
1       115
2       97
3       91
4       101
5       110
6       86
Total   600

• The Mendelian theory of genetics states that the number of a certain type of
peas falling into the classifications ‘round and yellow’, ‘wrinkled and yellow’,
‘round and green’ and ‘wrinkled and green’ should be in the ratio 9:3:3:1
(meaning that 9/16 of the peas should be in the first category, 3/16 in the
second, etc.) Suppose that 100 randomly sampled peas of this type revealed
56, 19, 17 and 8 in the respective categories. Do these data fit the model at
the 0.05 significance level?

• Specifications for the dimensions of a roller are 2.10-2.11 cm. Rollers that are too thick can be reground, while those that are too thin must be scrapped. Three machinists grind these rollers. Samples of rollers were collected from each machine, and their diameters were measured. The results are as follows. Can you conclude that the proportions of rollers in the three categories differ according to which machinist produced them? Use a hypothesis test at 1% significance level.

Machinist   Good   Regrind   Scrap   Total
A           328    58        14      400
B           231    48        21      300
C           409    73        18      500
Total       968    179       53      1200

• Complete the hypothesis test for the education vs. employment status example
above after the categories were combined, using α = 0.05. Don’t forget the
Yates continuity correction since this is now a 2 × 2 table!

Statistical Techniques for Categorical Data

• The following table can help you determine which method to use to solve a
given problem

Problem Objective         Number of Categories   Statistical Method
Describe a population     2                      z test for p, or chi-squared goodness of fit test with k = 2
Describe a population     More than 2            Chi-squared goodness of fit test with k categories
Compare two populations   2                      z test for p1 = p2 , or chi-squared test of independence (2 × 2 table)
Compare two populations   More than 2            Chi-squared test of independence (r × c table)

8 Introduction to Nonparametric Methods
Introduction: Parametric vs. Nonparametric Statistical Methods

• A parametric statistical method makes specific assumptions with regard to


one or more of the population parameters that characterise the underlying
distribution(s) for which the method is used

◦ For example, the ANOVA models introduced in the next chapter assume that the random errors are normally distributed random variables with a mean of 0 and a fixed variance σ²

• A nonparametric statistical method (sometimes called a distribution-free


method) does not make such assumptions about the parameters; it makes
fewer assumptions overall and is more flexible

• As a general rule, hypothesis tests which evaluate nominal or ordinal data are
nonparametric, while tests that evaluate interval or ratio data are parametric

• The advantage of parametric methods is that they are more powerful (when
their assumptions hold): they have a lower type II error than their correspond-
ing nonparametric method

• The disadvantage of parametric methods is that they may give incorrect con-
clusions if the model assumptions do not hold: the type I error and/or type
II error may be much higher than it is supposed to be; thus nonparametric
methods can be used in a wider set of circumstances than parametric methods

Overview of Nonparametric Methods to be covered

• In this brief chapter we will introduce just two nonparametric methods, though
there are many more

• (In fact, we have already learned two others, since the chi-squared tests covered
in the previous chapter are nonparametric or distribution-free)

• We will first look at a basic nonparametric test called the Sign Test which is
used to compare paired observations to see whether they tend to be equal

◦ It could be used as a nonparametric analogue to the one-sample t test

• We will then look at a nonparametric test called the Mann-Whitney Test


(also called Wilcoxon Test) which compares the probability distribution of
two independent samples

◦ It could be used as a nonparametric analogue to the two-sample t test

8.1 The Sign Test
Sign Test: Data

• The data consist of observations on a bivariate random sample where there


are n0 pairs of observations

(X1 , Y1 ), (X2 , Y2 ), . . . , (Xn0 , Yn0 )

• The observations within a pair need not be independent; in fact, they should
not be independent, because if they were, the Mann-Whitney Test would be
a more powerful method to use

• There should thus be some natural basis for pairing the observations, such as
weight loss of a single person using two different diet types, or weight of a
single person at two different points in time

• Within each pair a comparison is made, and the pair is classified as “+” (‘plus’)
if Xi < Yi , as “-” (‘minus’) if Xi > Yi , or as “0” (‘tie’) if Xi = Yi

Sign Test: Assumptions

• Nonparametric tests are sometimes called assumption-free methods but this is


incorrect: there are still assumptions, but they are not as strict as those of a
parametric test

• The Sign Test relies on the following assumptions:

1. The bivariate random variables (Xi , Yi ), i = 1, 2, . . . , n0 are mutually independent; that is, for any i ≠ j, (Xi , Yi ) is independent of (Xj , Yj )
2. The measurement scale of the data is at least ordinal within each pair (i.e. it cannot be nominal); this allows us to classify each pair as ‘plus’, ‘minus’ or ‘tie’
3. The pairs (Xi , Yi ) are internally consistent, meaning that if Pr (+) > Pr (−) for one pair (Xi , Yi ) then Pr (+) > Pr (−) for all pairs (the same goes for Pr (+) < Pr (−) and Pr (+) = Pr (−))

Sign Test: Hypotheses

• The hypotheses (in the two-tailed case) are as follows:

H0 : Pr (+) = Pr (−)
H1 : Pr (+) ≠ Pr (−)

• In words, the null hypothesis means that a ‘plus’ and a ‘minus’ are equally
likely to occur; the populations are equal in location

Sign Test: Test Statistic
• The test statistic is T, which equals the number of ‘plus’ pairs; that is, T = total number of +’s
• Under the null hypothesis, T follows a binomial distribution with p = 0.5 and n = the number of non-tied pairs, i.e. n0 minus the number of ties
Sign Test: Rejection Rule for Two-Tailed Test
• For n ≤ 20, we construct the critical region (rejection rule) using the binomial distribution table
• We are only interested in the case where p = 1/2 (given in a table in the appendix)
• We look for a value of x in the table for which the probability is close to α/2
◦ We denote this value of x by t, and denote the probability value by α1
◦ We say that the significance level of our test is 2α1 (which will not usually be exactly equal to α)
• We reject H0 if T ≤ t or T ≥ n − t
• If n ≥ 20 we can use a normal approximation: $t = \frac{1}{2}\left(n - z_{\alpha/2}\sqrt{n}\right)$
• We would again reject H0 if T ≤ t or T ≥ n − t, and this time our significance level is (approximately) equal to α
Sign Test: Rejection Rule for One-Tailed Test
• In the lower-tailed case, we have the hypotheses
H0 : Pr (+) ≥ Pr (−)
H1 : Pr (+) < Pr (−)
• For n ≤ 20, again we use the binomial distribution table with p = 1/2
• We look for a value of x in the table for which the probability is close to α and call it t; the probability is called α1
• We reject H0 if T ≤ t (with significance level α1)
• If n ≥ 20 we use the approximation $t = \frac{1}{2}\left(n - z_{\alpha}\sqrt{n}\right)$
• In the upper-tailed case, we have the hypotheses
H0 : Pr (+) ≤ Pr (−)
H1 : Pr (+) > Pr (−)

• We again find t in the same way as in the lower-tailed case, but we reject H0
if T ≥ n − t

Sign Test: Example 1
• Twenty-two customers in a grocery store were asked to taste each of two types
of cheese (cheddar and gouda) and declare their preference. Seven customers
preferred cheddar, twelve preferred gouda, and three had no preference. Does
this indicate a significant difference in preference?
• We define an observation to be “+” if a customer preferred cheddar, “-” if the
customer preferred gouda, and “0” if there was no preference
1. Hypotheses:
H0 : Pr (+) = Pr (−)
H1 : Pr (+) 6= Pr (−)

2. Significance level: α = 0.05 (we can use this level since n = 19 non-tied pairs is close enough to 20 to apply the normal approximation)
3. Test statistic: T = total number of +’s
4. Rejection Rule: $t = \frac{1}{2}\left(n - z_{\alpha/2}\sqrt{n}\right) = \frac{1}{2}\left(19 - 1.96\sqrt{19}\right) = 5.23$
◦ Hence n − t = 13.77
◦ Thus we reject H0 if T ≤ 5.23 or T ≥ 13.77
5. Calculation: Tobserved = 7 (the number of customers who preferred ched-
dar)
6. Decision: 5.23 < Tobserved < 13.77 thus we fail to reject H0 at 0.05 signif-
icance level
7. We conclude that we cannot claim a difference in preference between the
two types of cheese
Sign Test: Example 2
• Six athletes went on a diet in an attempt to lose weight, with the following
results:
Name Abdul John Senzo Frank Lerato Simon
Weight Before 74 91 188 82 101 88
Weight After 65 86 83 78 103 81

• Is this diet an effective means of losing weight?


• We define a pair of observations to be a “+” if the weight after was greater
than the weight before, a “-” if the weight after was less than the weight before,
and a “0” if the weight after is the same as the weight before
1. Hypotheses: we use a lower-tailed test because we are specifically inter-
ested in whether the diet is associated with weight loss and not merely
with weight change
H0 : Pr (+) ≥ Pr (−)
H1 : Pr (+) < Pr (−)

2. Target Significance Level: α = 0.05 (see below)
3. Test statistic: T = total number of +’s
4. Rejection Rule: We have n0 = 6 and n = 6 since there are no ties
◦ From our binomial table, with n = 6 and p = 0.5, the closest we
can get to α without going over is α1 = 0.0156; this becomes our
actual significance level
◦ Thus we reject H0 if T ≤ 0, with a significance level of 0.0156
5. Calculation: Tobserved = 1 (the number of people whose weight was greater after than before)
6. Decision: Tobserved > 0 thus we fail to reject H0 at 0.0156 significance level
7. Conclusion: We cannot claim that a person’s weight while on the diet is
more likely to decrease than to increase; we have not proven that the diet
is effective

• Note that we reached this conclusion even though five out of six participants
did actually lose weight. Probably we should collect more data; the main
problem here is that we have low statistical power (high probability of a Type
II error) due to low sample size
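• For reference, a minimal sketch in Python of the sign test for the cheese example, using the exact binomial distribution (scipy.stats.binomtest, SciPy 1.7+) rather than the normal approximation; both lead to the same decision here:

```python
# Exact sign test for the cheese-preference example above.
from scipy.stats import binomtest

n_plus, n_minus = 7, 12    # preferred cheddar / preferred gouda
n = n_plus + n_minus       # 19 non-tied pairs (the 3 ties are discarded)

# Under H0, T = number of +'s is Binomial(n, 0.5)
result = binomtest(n_plus, n=n, p=0.5, alternative='two-sided')

print(f"T = {n_plus}, p-value = {result.pvalue:.3f}")
# p-value ~ 0.359 > 0.05, so H0 is not rejected
```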

8.2 The Mann-Whitney Test


Mann-Whitney Test: Data

• The data used for a Mann-Whitney Test (sometimes called a Wilcoxon Test)
consist of two random samples

• Let X1 , X2 , . . . , Xn1 denote the random sample of size n1 from population 1


and let Y1 , Y2 , . . . , Yn2 denote the random sample of size n2 from population 2

• Assign the overall ranks 1 to n1 + n2 to all the observations and let R(Xi ) and
R(Yj ) denote the rank assigned to Xi and Yj for all i and j

• For convenience, let N = n1 + n2

• If several sample values are exactly equal to each other (tied), assign to each
value the average of the ranks that would have been assigned to them had
there been no ties

Mann-Whitney Test: Assumptions

1. Both samples are random samples from their respective populations

2. There is independence within each sample as well as between the two samples
(the second part of this assumption differs from the Sign Test)

3. The measurement scale is at least ordinal (i.e. it is not nominal)

Mann-Whitney Test: Hypotheses

• The hypotheses (in the two-tailed case) are as follows

H0 : Pr (X < x) = Pr (Y < x) for all x


H1 : Pr (X < x) ≠ Pr (Y < x) for some x

• As a special case the hypotheses can be stated in terms of means, as follows:

H0 : E (X) = E (Y )
H1 : E (X) ≠ E (Y )

Mann-Whitney Test: Test Statistic

• If there are no ties, or just a few ties, the sum of the ranks in the first sample can be used as a test statistic:
$$T = \sum_{i=1}^{n_1} R(X_i)$$

• If there are many ties, the following test statistic is preferable:
$$T_1 = \frac{T - n_1\dfrac{N+1}{2}}{\sqrt{\dfrac{n_1 n_2}{N(N-1)}\displaystyle\sum_{i=1}^{N} R_i^2 - \dfrac{n_1 n_2(N+1)^2}{4(N-1)}}}$$
• Here, $\sum_{i=1}^{N} R_i^2$ refers to the sum of the squares of all N of the ranks or average ranks actually used in the samples (after adjusting for ties)

Mann-Whitney Test: Rejection Rule for Two-Tailed Test

• Under the null hypothesis, T follows a probability distribution whose lower


quantiles are given in the appendix of notes under Mann-Whitney lower quan-
tiles

• The upper quantiles for the distribution of T are given by the relation w1−p = n1 (n1 + n2 + 1) − wp, where wp is the lower quantile from the table
• When n1 and n2 are both greater than 20 and there are no ties, we can use the approximation $w_p \approx \dfrac{n_1(N+1)}{2} + z_p\sqrt{\dfrac{n_1 n_2(N+1)}{12}}$, where zp is the standard normal quantile

• Thus we would reject H0 at the level of significance α if T < wα/2 or T > w1−α/2
(if using T ) or if |T1 | > zα/2 (if using T1 )

Mann-Whitney Test: Rejection Rule for One-Tailed Tests

• In the case of a lower-tailed test, the hypotheses may be stated as follows:

H0 : E (X) = E (Y )
H1 : E (X) < E (Y )

• In this case, if we are using T , we would reject H0 if T < wα , where wα is


taken from the Mann-Whitney table in appendix (if n1 and n2 are small) and
from the normal approximation (if n1 and n2 are ≥ 20)
• If there are many ties and we use T1 , we reject H0 if T1 < zα
• In the case of an upper-tailed test, the hypotheses may be stated as follows:

H0 : E (X) = E (Y )
H1 : E (X) > E (Y )

• Here, if we are using T then we reject H0 if T > w1−α where w1−α is taken
from the Mann-Whitney table in appendix (if n1 and n2 are small) and from
the normal approximation (if n1 and n2 are ≥ 20)
• If there are many ties and we use T1 , we reject H0 if T1 > z1−α

Mann-Whitney Test: Example 1

• The matric class in a particular high school had 48 boys. 12 boys lived on farms and
the other 36 lived in urban areas. A test was devised to see if farm boys in general
were more physically fit than city boys. Each boy in the class was given a physical
fitness test in which a low score indicates poor physical condition. The scores of the
farm boys (Xi ) and the city boys (Yj ) are as follows:

Xi : Farm Boys Yj : City Boys


14.8 10.6 12.7 16.9 7.6 2.4 6.2 9.9
7.3 12.5 14.2 7.9 11.3 6.4 6.1 10.6
5.6 12.9 12.6 16.0 8.3 9.1 15.3 14.8
6.3 16.1 2.1 10.6 6.7 6.7 10.6 5.0
9.0 11.4 17.7 5.6 3.6 18.6 1.8 2.6
4.2 2.7 11.8 5.6 1.0 3.2 5.9 4.0

• Although these two groups are not true random samples from the populations
of farm boys and city boys, it seems reasonable to assume that they would
resemble random samples from the populations of farm boys and city boys of
that age group. The independence assumption also seems reasonable.
• The hypotheses to be tested can be stated in words as follows:

H0 : Farm boys do not tend to be more fit, physically, than city boys
H1 : Farm boys tend to be more fit than city boys

• Mathematically, the null hypothesis could be stated as E (X) = E (Y ) and the
alternative hypothesis as E (X) > E (Y ) (so it is an upper tailed test)
• Note that we actually need to do the ranking before we begin our seven-step
hypothesis testing procedure because the test statistic depends on whether we
have ties
Mann-Whitney Test Example 1: Ranks

• We first rank all 48 observations together, marking each value according to whether it comes from the farm sample (X) or the city sample (Y):

Value  Sample  Rank     Value  Sample  Rank     Value  Sample  Rank
1.0    Y       1        6.2    Y       17       11.3   Y       33
1.8    Y       2        6.3    X       18       11.4   X       34
2.1    Y       3        6.4    Y       19       11.8   Y       35
2.4    Y       4        6.7    Y       20.5     12.5   X       36
2.6    Y       5        6.7    Y       20.5     12.6   Y       37
2.7    X       6        7.3    X       22       12.7   Y       38
3.2    Y       7        7.6    Y       23       12.9   X       39
3.6    Y       8        7.9    Y       24       14.2   Y       40
4.0    Y       9        8.3    Y       25       14.8   X       41.5
4.2    X       10       9.0    X       26       14.8   Y       41.5
5.0    Y       11       9.1    Y       27       15.3   Y       43
5.6    X       13       9.9    Y       28       16.0   Y       44
5.6    Y       13       10.6   X       30.5     16.1   X       45
5.6    Y       13       10.6   Y       30.5     16.9   Y       46
5.9    Y       15       10.6   Y       30.5     17.7   Y       47
6.1    Y       16       10.6   Y       30.5     18.6   Y       48

• Tied values receive the average of the ranks they span. For instance, because the 12th, 13th and 14th values in order were all equal (5.6), each receives a rank of (12 + 13 + 14)/3 = 13
• Because we have a number of ties here, it is better to use T1 as our test statistic
Mann-Whitney Test Example 1: Hypothesis Test

1. Hypotheses:
H0 : E (X) = E (Y )
H1 : E (X) > E (Y )
2. Significance Level: α = 0.05
3. Test statistic: Under H0,
$$T_1 = \frac{T - n_1\dfrac{N+1}{2}}{\sqrt{\dfrac{n_1 n_2}{N(N-1)}\displaystyle\sum_{i=1}^{N} R_i^2 - \dfrac{n_1 n_2(N+1)^2}{4(N-1)}}}$$
approximately follows a standard normal distribution

4. Rejection Rule: We will reject H0 if T1 observed > z0.05 = 1.645
5. Calculations: we first need $\sum_{i=1}^{N} R_i^2$, the sum of the squares of the ranks actually used (with tied ranks replaced by their averages):
$$\sum_{i=1}^{N} R_i^2 = 1^2 + 2^2 + 3^2 + \cdots + 48^2 = 38016$$
Next we calculate the rank sum of the farm sample:
$$T = \sum_{i=1}^{n_1} R(X_i) = 6 + 10 + 13 + 18 + 22 + 26 + 30.5 + 34 + 36 + 39 + 41.5 + 45 = 321$$
$$T_1 = \frac{T - n_1\dfrac{N+1}{2}}{\sqrt{\dfrac{n_1 n_2}{N(N-1)}\displaystyle\sum_{i=1}^{N} R_i^2 - \dfrac{n_1 n_2(N+1)^2}{4(N-1)}}} = \frac{321 - 12\left(\dfrac{48+1}{2}\right)}{\sqrt{\dfrac{(12)(36)}{48(48-1)}(38016) - \dfrac{(12)(36)(48+1)^2}{4(48-1)}}} = 0.6431$$

6. Decision: T1observed = 0.6431 < 1.645 thus we fail to reject H0 at 0.05 signifi-
cance level

7. Conclusion: We conclude that there is insufficient evidence to support the


claim that farm boys are more physically fit than city boys
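• For reference, a minimal sketch in Python using scipy.stats.mannwhitneyu on the raw scores. Note that SciPy reports the U statistic rather than the rank sum T; the two are related by U = T − n1(n1 + 1)/2, and with ties the asymptotic method uses a normal approximation comparable to T1:

```python
# Mann-Whitney test for the farm vs. city fitness example above.
from scipy.stats import mannwhitneyu

farm = [14.8, 10.6, 7.3, 12.5, 5.6, 12.9,
        6.3, 16.1, 9.0, 11.4, 4.2, 2.7]
city = [12.7, 16.9, 7.6, 2.4, 6.2, 9.9, 14.2, 7.9, 11.3,
        6.4, 6.1, 10.6, 12.6, 16.0, 8.3, 9.1, 15.3, 14.8,
        2.1, 10.6, 6.7, 6.7, 10.6, 5.0, 17.7, 5.6, 3.6,
        18.6, 1.8, 2.6, 11.8, 5.6, 1.0, 3.2, 5.9, 4.0]

# Upper-tailed test: HA is that farm boys tend to score higher
U, p_value = mannwhitneyu(farm, city, alternative='greater',
                          method='asymptotic')

print(f"U = {U}, p-value = {p_value:.3f}")
# U = 321 - 12*13/2 = 243; p-value ~ 0.26 > 0.05, so H0 is not rejected
```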

Mann-Whitney Test: Example 2

• A simple experiment was designed to see if flint in area A tended to have the
same degree of hardness as flint in area B. Four sample pieces of flint were
collected in area A and five pieces in area B. To determine which of two pieces
of flint was harder, the two pieces were rubbed against each other. The piece
sustaining less damage was judged the harder of the two. In this manner all
nine pieces of flint were ordered according to hardness. The rank of 1 was
assigned to the softest piece, rank 2 to the next softest, and so on

• The results are shown in the table below:
Origin of Piece Rank
A 1
A 2
A 3
B 4
A 5
B 6
B 7
B 8
B 9

Mann-Whitney Test Example 2: Hypothesis Test

1. Hypotheses:

H0 : E (X) = E (Y )
H1 : E (X) ≠ E (Y )

In words, the null hypothesis states that the flints from areas A and B are
of equal hardness, whereas the alternative states that they are not of equal
hardness

2. Significance Level: α = 0.05

3. Test Statistic: Under H0 , T = sum of ranks of flints from area A follows the
Mann-Whitney distribution

4. Rejection Rule: from the Mann-Whitney table with n1 = 4 and n2 = 5, our lower quantile is w0.025 = 12. We calculate our upper quantile as w1−0.025 = n1 (n1 + n2 + 1) − w0.025 = 4(4 + 5 + 1) − 12 = 28. Thus we reject H0 if Tobserved < 12 or Tobserved > 28

5. Calculation: Tobserved = 1 + 2 + 3 + 5 = 11.

6. Decision: Tobserved = 11 < 12, thus we reject H0 at the 0.05 significance level

7. Conclusion: We conclude that the flints from the two areas differ in degree of
hardness
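
• For small samples without ties, software can also run the exact test. A minimal Python sketch (an illustration, not part of the original example) using scipy.stats.mannwhitneyu on the flint ranks reaches the same decision:

    from scipy.stats import mannwhitneyu

    # Only the ordering was observed, so the ranks themselves serve as the data
    area_A = [1, 2, 3, 5]
    area_B = [4, 6, 7, 8, 9]

    # SciPy reports the U statistic, related to our T by U = T - n1(n1 + 1)/2
    res = mannwhitneyu(area_A, area_B, alternative='two-sided')
    print(res.statistic)  # U = 11 - 4(5)/2 = 1.0
    print(res.pvalue)     # exact p-value, approximately 0.032 < 0.05: reject H0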

9 Single-Factor Analysis of Variance
Comparing means of more than two populations

• We have learned how to use a two-sample test to compare the means of two
independent populations

• However, what if we have more than two populations to compare?

• For instance, suppose we want to compare monthly sales at McDonalds restaurants from three provinces: Western Cape, Eastern Cape and KwaZulu-Natal. We would like to know if there is any statistically significant difference in means between these three provinces. Suppose we collect data on monthly sales during the previous month from random samples of McDonalds restaurants in each of the three provinces

• One option would be to use several two-sample z or t tests. We could conduct a two-sample test to compare mean sales in Western Cape and Eastern Cape, another two-sample test to compare mean sales in Western Cape and KwaZulu-Natal, and another two-sample test to compare mean sales in Eastern Cape and KwaZulu-Natal

• This approach has some disadvantages. In general, if we have a populations to compare, we would need $\binom{a}{2}$ two-sample tests; so if we were comparing monthly McDonalds sales across all nine provinces we would need $\binom{9}{2} = 36$ two-sample tests!

• This is not only tedious, but also introduces a statistical problem: when we do multiple hypothesis tests at the same time, the overall probability of a type I error increases
• For instance, using the additive probability rule, if we have two hypothesis tests, we have

Pr (type I error in test 1 ∪ type I error in test 2)
= Pr (type I error in test 1) + Pr (type I error in test 2) − Pr (type I error in test 1 ∩ type I error in test 2)

• If the two tests are independent and each have a type I error probability of
0.05, this becomes

Pr (type I error in test 1 ∪ type I error in test 2) = 0,05 + 0,05 − (0,05)(0,05) = 0,0975

• Thus if we have two independent hypothesis tests, each with a type I error probability of 0,05, the probability of making a type I error in at least one of the two tests is nearly 0,1. If we have three hypothesis tests (or 36) the overall type I error will be even greater, as the sketch below illustrates
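
• A minimal Python sketch (illustrative only) of how quickly the overall type I error grows with the number k of independent tests, using Pr(at least one type I error) = 1 − (1 − α)^k:

    alpha = 0.05
    for k in (1, 2, 3, 36):
        overall = 1 - (1 - alpha) ** k  # Pr(at least one type I error in k tests)
        print(k, round(overall, 4))     # 0.05, 0.0975, 0.1426, 0.8422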

• Hence we want to know if there is a type of hypothesis test that will allow us
to compare means of three or more populations all at once

• It turns out that there is, and it is called ANOVA (Analysis of Variance)

Analysis of Variance (ANOVA)

• Suppose you have a independent populations (where a denotes the number of populations) and have drawn a random sample from each; the sample sizes are n1 , n2 , n3 , . . . , na

• Assume the (unknown) population means are µ1 , µ2 , µ3 , . . ., µa

• ANOVA is a method for testing the null hypothesis H0 : µ1 = µ2 = µ3 = · · · = µa against the alternative HA : µi ≠ µj for at least one pair i, j

• In words, the null hypothesis says that the means of all the a populations are
the same; the alternative hypothesis says that at least two means differ

• It seems strange that we would analyse the variance in order to test for dif-
ferences between means

• The logic of ANOVA comes from breaking the total variance of the data into
two pieces

• Let us consider a practical example

ANOVA Example

• A franchise restaurant has locations in three cities: Johannesburg, Cape Town and Durban. The company's Board of Directors wants to know whether mean profits per location are equal across the three cities. Data for profits in the last quarter (in millions of Rand) are available for six restaurant locations in Johannesburg, four locations in Cape Town and five locations in Durban. The data are displayed in the table below:

City Quarterly Profits Total


Johannesburg 10,8 11,4 13,5 11,1 13,0 14,4 74,2
Cape Town 11,0 9,5 10,5 7,7 38,7
Durban 10,3 7,5 5,2 10,6 10,1 43,7

• In this case we are comparing a = 3 populations: restaurant locations in Johannesburg, Cape Town and Durban

• In the above table, the rows are i = 1, 2, 3 and the columns are j = 1, 2, . . . , ni
where n1 = 6, n2 = 4, n3 = 5

• We will denote the observation in the ith row and the jth column as yij

• We will denote the sum of observations in the ith row as yi• and the sum of
all observations, the grand total, as y••

• A basic way to compare the profits across the three cities would be to calculate
the sample mean for each city, ȳi• :
ȳ1• = 74,2/6 = 12,3667
ȳ2• = 38,7/4 = 9,675
ȳ3• = 43,7/5 = 8,74

• We could use graphs such as a box-and-whisker plot or a means plot to compare profits in the three cities (means plot not reproduced here).

• This graph may suggest that mean profits are highest in Johannesburg, fol-
lowed by Cape Town and then Durban, but as statisticians we must ask the
question, ‘Are the differences in means between the cities statistically signifi-
cant or do they merely reflect random variation in the data?’
• Our method for answering this question involves comparing variation in profits
between cities to variation in profits within each city. Variation in profits
within each city would be understood as random. Thus, if the variation in
profits between cities is similar to variation in profits within each city, we will
conclude that the apparent difference in profits between cities can be explained
as random variation rather than actual differences in mean profits. However,
if the variation in profits between cities is much larger than the variation in
profits within each city, this would imply that there are actual, non-random
differences in mean profits between cities.
• The overall variance in the data we can call

$MS_{Total} = \dfrac{SS_{Total}}{N-1}$

where $N = \sum_{i=1}^{a} n_i$ is the total number of observations and

$SS_{Total} = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \dfrac{y_{\bullet\bullet}^2}{N}$

• Here, ‘MS’ stands for ‘Mean Sum of Squares’ while ‘SS’ stands for ‘Sum of
Squares’

• Notice that MSTotal is basically the formula for a sample variance S² that you learned in Statistics 1A.

• SSTotal can be broken into two pieces that we call SSTreatment and SSError .
SSTreatment measures the amount of variation between treatments or between
populations (in our example, between cities) while SSError measures the amount
of variation within treatments or within populations (in our example, within
each city)

• We have

$MS_{Treatment} = \dfrac{SS_{Treatment}}{a-1}$ where $SS_{Treatment} = \sum_{i=1}^{a} \dfrac{y_{i\bullet}^2}{n_i} - \dfrac{y_{\bullet\bullet}^2}{N}$

$MS_{Error} = \dfrac{SS_{Error}}{N-a}$ where $SS_{Error} = SS_{Total} - SS_{Treatment}$

• Note that, when the number of observations from each population (or treat-
ment) is unequal, as in our example, this is called an unbalanced ANOVA.
If the number of observations from each population is equal, this is called a
balanced ANOVA and we can replace ni in the above formulas with n, the
number of observations drawn from each population

• Now, under the null hypothesis µ1 = µ2 = · · · = µa , it can be proven that the ratio of the between-populations variance MSTreatment to the within-populations variance MSError follows an F distribution with a − 1 and N − a degrees of freedom.

• For this reason, we can use this ratio as a test statistic. We reject the null hypothesis if the observed value of the test statistic $F = \dfrac{MS_{Treatment}}{MS_{Error}}$ is much larger than the values that an F distribution with a − 1 and N − a degrees of freedom is likely to produce

• If we reject the null hypothesis we will conclude that the means are not equal
across all populations

• Let us implement our seven-step hypothesis testing procedure to perform an
ANOVA on the restaurant profits data at 5% significance level

1. Hypotheses:

H 0 : µ1 = µ2 = µ3
HA : At least one µi ≠ µj

2. Significance Level: α = 0.05


3. Test Statistic: Under H0 , $F = \dfrac{MS_{Treatment}}{MS_{Error}}$ has an F distribution with a − 1 = 2 and N − a = 15 − 3 = 12 degrees of freedom
4. Rejection Rule: We will reject H0 if Fobserved > fα,a−1,N−a = f0.05,2,12 = 3,885
5. Calculation:

N = 6 + 4 + 5 = 15
y•• = 74,2 + 38,7 + 43,7 = 156,6

$SS_{Total} = \sum_{i=1}^{3}\sum_{j=1}^{n_i} y_{ij}^2 - \dfrac{y_{\bullet\bullet}^2}{N} = \left(10,8^2 + 11,4^2 + 13,5^2 + \cdots + 5,2^2 + 10,6^2 + 10,1^2\right) - \dfrac{156,6^2}{15} = 1712,96 - 1634,904 = 78,056$

$SS_{Treatment} = \sum_{i=1}^{3} \dfrac{y_{i\bullet}^2}{n_i} - \dfrac{y_{\bullet\bullet}^2}{N} = \left(\dfrac{74,2^2}{6} + \dfrac{38,7^2}{4} + \dfrac{43,7^2}{5}\right) - \dfrac{156,6^2}{15} = 1673,967 - 1634,904 = 39,06317$

$SS_{Error} = SS_{Total} - SS_{Treatment} = 78,056 - 39,06317 = 38,99283$

$F_{observed} = \dfrac{MS_{Treatment}}{MS_{Error}} = \dfrac{SS_{Treatment}/(a-1)}{SS_{Error}/(N-a)} = \dfrac{39,06317/2}{38,99283/12} = 6,011$

6. Decision: Fobserved = 6,011 > 3,885, therefore we reject H0 at the 0,05 significance level

7. Conclusion: We conclude that mean quarterly profits are not equal across all three cities.

• Note: we cannot apply the p-value approach by hand for ANOVA because we
cannot calculate the p-values for the F distribution by hand. However, we can
use the p-value approach within SAS since SAS calculates the p-values for us.
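
• Purely as an illustration (SAS is the package used in this module), here is a minimal Python sketch, assuming NumPy and SciPy, that computes the sums of squares from the formulas above and cross-checks the F statistic and p-value with scipy.stats.f_oneway:

    import numpy as np
    from scipy.stats import f_oneway

    groups = [np.array([10.8, 11.4, 13.5, 11.1, 13.0, 14.4]),  # Johannesburg
              np.array([11.0, 9.5, 10.5, 7.7]),                # Cape Town
              np.array([10.3, 7.5, 5.2, 10.6, 10.1])]          # Durban

    a = len(groups)
    N = sum(len(g) for g in groups)             # 15
    grand_total = sum(g.sum() for g in groups)  # y.. = 156.6

    ss_total = sum((g ** 2).sum() for g in groups) - grand_total ** 2 / N
    ss_treat = sum(g.sum() ** 2 / len(g) for g in groups) - grand_total ** 2 / N
    ss_error = ss_total - ss_treat
    F = (ss_treat / (a - 1)) / (ss_error / (N - a))
    print(round(F, 3))                          # 6.011

    res = f_oneway(*groups)                     # SciPy's one-way ANOVA
    print(res.statistic, res.pvalue)            # F = 6.011, p ≈ 0.016 < 0.05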

ANOVA and Experimental Design

• One of the most common settings in which ANOVA is used is the design
of experiments. In our example above, we were analysing pre-existing data
taken from three different ‘populations’. In designed experiments, however,
we choose an ‘independent variable’ or ‘factor’, some categorical or discrete
variable whose effect on a continuous ‘dependent variable’ we want to deter-
mine. We choose two or more levels of this factor (treatments) and conduct an
experiment where we observe the value of the dependent variable repeatedly at
each value of the independent variable. We then use ANOVA to test whether
the value of the independent variable makes a difference to the outcome of the
experiment (the value of the dependent variable).

• Experimental design will be covered in much more detail in Statistics 2A.

• Consider the following example

• A sports medicine researcher collects data on the effectiveness of three methods of repairing a torn meniscus in the knees of athletes. The dependent variable of interest is the stiffness of the joint measured in N/mm. Each of the three surgical methods is applied to six randomly selected athletes with a torn meniscus and the stiffness is measured after a healing interval. The data are displayed below:

Method Stiffness (N/mm) Total


1 8,3 7,2 6,3 7,3 8,7 8,7
2 4,7 6,1 5,0 5,8 6,6 8,4
3 8,0 8,3 7,6 6,4 8,2 7,7

• Answer the following questions:

◦ Determine the values of y1• , y2• , y3• and y•• .


◦ Use single-factor ANOVA at 5% significance level to test whether stiffness
differs among the three methods of repairing a torn meniscus.
◦ Which of the three methods do you think is most effective? Explain.
(Hint: remember that stiffness is a bad thing!)

• Solution:

Method Stiffness (N/mm) Total


1 8,3 7,2 6,3 7,3 8,7 8,7 y1• = 46, 5
2 4,7 6,1 5,0 5,8 6,6 8,4 y2• = 36, 6
3 8,0 8,3 7,6 6,4 8,2 7,7 y3• = 46, 2
y•• = 129, 3

1. Hypotheses:

H 0 : µ1 = µ2 = µ3
HA : At least one µi ≠ µj

2. Significance Level: α = 0.05


3. Test Statistic: Under H0 , $F = \dfrac{MS_{Treatment}}{MS_{Error}}$ has an F distribution with a − 1 = 2 and N − a = 18 − 3 = 15 degrees of freedom
4. Rejection Rule: We will reject H0 if Fobserved > fα,a−1,N−a = f0.05,2,15 = 3,682
5. Calculation:

N = 6 + 6 + 6 = 18 (this is a balanced design)

$SS_{Total} = \sum_{i=1}^{3}\sum_{j=1}^{n} y_{ij}^2 - \dfrac{y_{\bullet\bullet}^2}{N} = \left(8,3^2 + 7,2^2 + 6,3^2 + \cdots + 6,4^2 + 8,2^2 + 7,7^2\right) - \dfrac{129,3^2}{18} = 955,29 - 928,805 = 26,485$

$SS_{Treatment} = \sum_{i=1}^{3} \dfrac{y_{i\bullet}^2}{n} - \dfrac{y_{\bullet\bullet}^2}{N} = \left(\dfrac{46,5^2}{6} + \dfrac{36,6^2}{6} + \dfrac{46,2^2}{6}\right) - \dfrac{129,3^2}{18} = 939,375 - 928,805 = 10,570$

$SS_{Error} = SS_{Total} - SS_{Treatment} = 26,485 - 10,570 = 15,915$

$F_{observed} = \dfrac{MS_{Treatment}}{MS_{Error}} = \dfrac{SS_{Treatment}/(a-1)}{SS_{Error}/(N-a)} = \dfrac{10,570/2}{15,915/15} = 4,981$

6. Decision: Fobserved = 4,981 > 3,682, therefore we reject H0 at the 0,05 significance level

7. Conclusion: We conclude that mean stiffness is not equal across all three methods.

• We can see that the mean stiffness is lowest for method 2: ȳ2• = 36,6/6 = 6,1, whereas ȳ1• = 46,5/6 = 7,75 and ȳ3• = 46,2/6 = 7,7. Since the ANOVA found statistically significant differences in mean stiffness between the three methods, and method 2 produces the lowest (i.e. best) mean stiffness, the results point to method 2 as the most effective. (Strictly speaking, determining which pairs of means differ requires follow-up pairwise comparisons.)
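
• Again purely as an illustration (not part of the original solution), a quick Python check of this balanced ANOVA with scipy.stats.f_oneway gives the same F statistic:

    from scipy.stats import f_oneway

    method1 = [8.3, 7.2, 6.3, 7.3, 8.7, 8.7]
    method2 = [4.7, 6.1, 5.0, 5.8, 6.6, 8.4]
    method3 = [8.0, 8.3, 7.6, 6.4, 8.2, 7.7]

    res = f_oneway(method1, method2, method3)
    print(res.statistic)  # approximately 4.981, matching F_observed above
    print(res.pvalue)     # roughly 0.02 < 0.05, so we reject H0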

Single-factor ANOVA Exercises

• The following table reports stress at 600% elongation for pieces of a certain type of rubber tested at five different laboratories. At the 1% significance level, test whether the mean stress differs among the five laboratories.

Laboratory Stress Total


1 53 53 65 57
2 70 62 76 62
3 45 41 39 36
4 60 56 55 55
5 49 53 53 55

• The following table gives the weight gains of three groups of rats that were
involved in a scientific experiment. The first group of rats were given the
hormone thyroxin, the second group were given the hormone thiouracil and the
third group, a ‘control group’, were given no hormones. Test at 5% significance
level for a difference in mean weight gain between the three groups. Comment
on the possible effects of thiouracil on rat weight gain.

Hormone Weight Gain (g) Total


Thyroxin 132 84 133 118 87 88 119
Thiouracil 68 68 63 52 80 80 63 61 89 69
Control 107 115 90 117 91 133 91 115 112 95


Table entry for p and C is the critical value t∗ with probability p lying to its right and probability C lying between −t∗ and t∗.

TABLE D
t distribution critical values
Upper-tail probability p

df .25 .20 .15 .10 .05 .025 .02 .01 .005 .0025 .001 .0005

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6
2 0.816 1.061 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60
3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92
4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610
5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869
6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959
7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408
8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587
11 0.697 0.876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437
12 0.695 0.873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318
13 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221
14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140
15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073
16 0.690 0.865 1.071 1.337 1.746 2.120 2.235 2.583 2.921 3.252 3.686 4.015
17 0.689 0.863 1.069 1.333 1.740 2.110 2.224 2.567 2.898 3.222 3.646 3.965
18 0.688 0.862 1.067 1.330 1.734 2.101 2.214 2.552 2.878 3.197 3.611 3.922
19 0.688 0.861 1.066 1.328 1.729 2.093 2.205 2.539 2.861 3.174 3.579 3.883
20 0.687 0.860 1.064 1.325 1.725 2.086 2.197 2.528 2.845 3.153 3.552 3.850
21 0.686 0.859 1.063 1.323 1.721 2.080 2.189 2.518 2.831 3.135 3.527 3.819
22 0.686 0.858 1.061 1.321 1.717 2.074 2.183 2.508 2.819 3.119 3.505 3.792
23 0.685 0.858 1.060 1.319 1.714 2.069 2.177 2.500 2.807 3.104 3.485 3.768
24 0.685 0.857 1.059 1.318 1.711 2.064 2.172 2.492 2.797 3.091 3.467 3.745
25 0.684 0.856 1.058 1.316 1.708 2.060 2.167 2.485 2.787 3.078 3.450 3.725
26 0.684 0.856 1.058 1.315 1.706 2.056 2.162 2.479 2.779 3.067 3.435 3.707
27 0.684 0.855 1.057 1.314 1.703 2.052 2.158 2.473 2.771 3.057 3.421 3.690
28 0.683 0.855 1.056 1.313 1.701 2.048 2.154 2.467 2.763 3.047 3.408 3.674
29 0.683 0.854 1.055 1.311 1.699 2.045 2.150 2.462 2.756 3.038 3.396 3.659
30 0.683 0.854 1.055 1.310 1.697 2.042 2.147 2.457 2.750 3.030 3.385 3.646
40 0.681 0.851 1.050 1.303 1.684 2.021 2.123 2.423 2.704 2.971 3.307 3.551
50 0.679 0.849 1.047 1.299 1.676 2.009 2.109 2.403 2.678 2.937 3.261 3.496
60 0.679 0.848 1.045 1.296 1.671 2.000 2.099 2.390 2.660 2.915 3.232 3.460
80 0.678 0.846 1.043 1.292 1.664 1.990 2.088 2.374 2.639 2.887 3.195 3.416
100 0.677 0.845 1.042 1.290 1.660 1.984 2.081 2.364 2.626 2.871 3.174 3.390
1000 0.675 0.842 1.037 1.282 1.646 1.962 2.056 2.330 2.581 2.813 3.098 3.300
z∗ 0.674 0.841 1.036 1.282 1.645 1.960 2.054 2.326 2.576 2.807 3.091 3.291

50% 60% 70% 80% 90% 95% 96% 98% 99% 99.5% 99.8% 99.9%

Confidence level C

Chi-Square Distribution Table

The table entry χ²α is the critical value with probability α lying to its right.

df χ2.995 χ2.990 χ2.975 χ2.950 χ2.900 χ2.100 χ2.050 χ2.025 χ2.010 χ2.005
1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169

Upper Tail Critical Values of F Distribution for α = 0.1

Denominator Numerator Degrees of Freedom


D.F. 1 2 3 4 5 6 7 8 9 10
1 39.863 49.500 53.593 55.833 57.240 58.204 58.906 59.439 59.858 60.195
2 8.526 9.000 9.162 9.243 9.293 9.326 9.349 9.367 9.381 9.392
3 5.538 5.462 5.391 5.343 5.309 5.285 5.266 5.252 5.240 5.230
4 4.545 4.325 4.191 4.107 4.051 4.010 3.979 3.955 3.936 3.920
5 4.060 3.780 3.619 3.520 3.453 3.405 3.368 3.339 3.316 3.297
6 3.776 3.463 3.289 3.181 3.108 3.055 3.014 2.983 2.958 2.937
7 3.589 3.257 3.074 2.961 2.883 2.827 2.785 2.752 2.725 2.703
8 3.458 3.113 2.924 2.806 2.726 2.668 2.624 2.589 2.561 2.538
9 3.360 3.006 2.813 2.693 2.611 2.551 2.505 2.469 2.440 2.416
10 3.285 2.924 2.728 2.605 2.522 2.461 2.414 2.377 2.347 2.323
11 3.225 2.860 2.660 2.536 2.451 2.389 2.342 2.304 2.274 2.248
12 3.177 2.807 2.606 2.480 2.394 2.331 2.283 2.245 2.214 2.188
13 3.136 2.763 2.560 2.434 2.347 2.283 2.234 2.195 2.164 2.138
14 3.102 2.726 2.522 2.395 2.307 2.243 2.193 2.154 2.122 2.095
15 3.073 2.695 2.490 2.361 2.273 2.208 2.158 2.119 2.086 2.059
16 3.048 2.668 2.462 2.333 2.244 2.178 2.128 2.088 2.055 2.028
17 3.026 2.645 2.437 2.308 2.218 2.152 2.102 2.061 2.028 2.001
18 3.007 2.624 2.416 2.286 2.196 2.130 2.079 2.038 2.005 1.977
19 2.990 2.606 2.397 2.266 2.176 2.109 2.058 2.017 1.984 1.956
20 2.975 2.589 2.380 2.249 2.158 2.091 2.040 1.999 1.965 1.937
21 2.961 2.575 2.365 2.233 2.142 2.075 2.023 1.982 1.948 1.920
22 2.949 2.561 2.351 2.219 2.128 2.060 2.008 1.967 1.933 1.904
23 2.937 2.549 2.339 2.207 2.115 2.047 1.995 1.953 1.919 1.890
24 2.927 2.538 2.327 2.195 2.103 2.035 1.983 1.941 1.906 1.877
25 2.918 2.528 2.317 2.184 2.092 2.024 1.971 1.929 1.895 1.866
26 2.909 2.519 2.307 2.174 2.082 2.014 1.961 1.919 1.884 1.855
27 2.901 2.511 2.299 2.165 2.073 2.005 1.952 1.909 1.874 1.845
28 2.894 2.503 2.291 2.157 2.064 1.996 1.943 1.900 1.865 1.836
29 2.887 2.495 2.283 2.149 2.057 1.988 1.935 1.892 1.857 1.827
30 2.881 2.489 2.276 2.142 2.049 1.980 1.927 1.884 1.849 1.819
31 2.875 2.482 2.270 2.136 2.042 1.973 1.920 1.877 1.842 1.812
32 2.869 2.477 2.263 2.129 2.036 1.967 1.913 1.870 1.835 1.805
33 2.864 2.471 2.258 2.123 2.030 1.961 1.907 1.864 1.828 1.799
34 2.859 2.466 2.252 2.118 2.024 1.955 1.901 1.858 1.822 1.793
35 2.855 2.461 2.247 2.113 2.019 1.950 1.896 1.852 1.817 1.787
36 2.850 2.456 2.243 2.108 2.014 1.945 1.891 1.847 1.811 1.781
37 2.846 2.452 2.238 2.103 2.009 1.940 1.886 1.842 1.806 1.776
38 2.842 2.448 2.234 2.099 2.005 1.935 1.881 1.838 1.802 1.772
39 2.839 2.444 2.230 2.095 2.001 1.931 1.877 1.833 1.797 1.767
40 2.835 2.440 2.226 2.091 1.997 1.927 1.873 1.829 1.793 1.763
41 2.832 2.437 2.222 2.087 1.993 1.923 1.869 1.825 1.789 1.759
42 2.829 2.434 2.219 2.084 1.989 1.919 1.865 1.821 1.785 1.755
43 2.826 2.430 2.216 2.080 1.986 1.916 1.861 1.817 1.781 1.751
44 2.823 2.427 2.213 2.077 1.983 1.913 1.858 1.814 1.778 1.747
45 2.820 2.425 2.210 2.074 1.980 1.909 1.855 1.811 1.774 1.744
46 2.818 2.422 2.207 2.071 1.977 1.906 1.852 1.808 1.771 1.741
47 2.815 2.419 2.204 2.068 1.974 1.903 1.849 1.805 1.768 1.738
48 2.813 2.417 2.202 2.066 1.971 1.901 1.846 1.802 1.765 1.735
49 2.811 2.414 2.199 2.063 1.968 1.898 1.843 1.799 1.763 1.732
50 2.809 2.412 2.197 2.061 1.966 1.895 1.840 1.796 1.760 1.729
60 2.791 2.393 2.177 2.041 1.946 1.875 1.819 1.775 1.738 1.707
70 2.779 2.380 2.164 2.027 1.931 1.860 1.804 1.760 1.723 1.691
80 2.769 2.370 2.154 2.016 1.921 1.849 1.793 1.748 1.711 1.680
90 2.762 2.363 2.146 2.008 1.912 1.841 1.785 1.739 1.702 1.670
100 2.756 2.356 2.139 2.002 1.906 1.834 1.778 1.732 1.695 1.663
120 2.748 2.347 2.130 1.992 1.896 1.824 1.767 1.722 1.684 1.652
140 2.742 2.341 2.123 1.985 1.889 1.817 1.760 1.714 1.677 1.645
160 2.737 2.336 2.118 1.980 1.884 1.811 1.755 1.709 1.671 1.639
180 2.734 2.332 2.114 1.976 1.880 1.807 1.750 1.705 1.667 1.634
200 2.731 2.329 2.111 1.973 1.876 1.804 1.747 1.701 1.663 1.631
Infinity 2.706 2.303 2.084 1.945 1.847 1.774 1.717 1.670 1.632 1.599

Upper Tail Critical Values of F Distribution for α = 0.05

Denominator Numerator Degrees of Freedom


D.F. 1 2 3 4 5 6 7 8 9 10
1 161.448 199.500 215.707 224.583 230.162 233.986 236.768 238.883 240.543 241.882
2 18.513 19.000 19.164 19.247 19.296 19.330 19.353 19.371 19.385 19.396
3 10.128 9.552 9.277 9.117 9.013 8.941 8.887 8.845 8.812 8.786
4 7.709 6.944 6.591 6.388 6.256 6.163 6.094 6.041 5.999 5.964
5 6.608 5.786 5.409 5.192 5.050 4.950 4.876 4.818 4.772 4.735
6 5.987 5.143 4.757 4.534 4.387 4.284 4.207 4.147 4.099 4.060
7 5.591 4.737 4.347 4.120 3.972 3.866 3.787 3.726 3.677 3.637
8 5.318 4.459 4.066 3.838 3.687 3.581 3.500 3.438 3.388 3.347
9 5.117 4.256 3.863 3.633 3.482 3.374 3.293 3.230 3.179 3.137
10 4.965 4.103 3.708 3.478 3.326 3.217 3.135 3.072 3.020 2.978
11 4.844 3.982 3.587 3.357 3.204 3.095 3.012 2.948 2.896 2.854
12 4.747 3.885 3.490 3.259 3.106 2.996 2.913 2.849 2.796 2.753
13 4.667 3.806 3.411 3.179 3.025 2.915 2.832 2.767 2.714 2.671
14 4.600 3.739 3.344 3.112 2.958 2.848 2.764 2.699 2.646 2.602
15 4.543 3.682 3.287 3.056 2.901 2.790 2.707 2.641 2.588 2.544
16 4.494 3.634 3.239 3.007 2.852 2.741 2.657 2.591 2.538 2.494
17 4.451 3.592 3.197 2.965 2.810 2.699 2.614 2.548 2.494 2.450
18 4.414 3.555 3.160 2.928 2.773 2.661 2.577 2.510 2.456 2.412
19 4.381 3.522 3.127 2.895 2.740 2.628 2.544 2.477 2.423 2.378
20 4.351 3.493 3.098 2.866 2.711 2.599 2.514 2.447 2.393 2.348
21 4.325 3.467 3.072 2.840 2.685 2.573 2.488 2.420 2.366 2.321
22 4.301 3.443 3.049 2.817 2.661 2.549 2.464 2.397 2.342 2.297
23 4.279 3.422 3.028 2.796 2.640 2.528 2.442 2.375 2.320 2.275
24 4.260 3.403 3.009 2.776 2.621 2.508 2.423 2.355 2.300 2.255
25 4.242 3.385 2.991 2.759 2.603 2.490 2.405 2.337 2.282 2.236
26 4.225 3.369 2.975 2.743 2.587 2.474 2.388 2.321 2.265 2.220
27 4.210 3.354 2.960 2.728 2.572 2.459 2.373 2.305 2.250 2.204
28 4.196 3.340 2.947 2.714 2.558 2.445 2.359 2.291 2.236 2.190
29 4.183 3.328 2.934 2.701 2.545 2.432 2.346 2.278 2.223 2.177
30 4.171 3.316 2.922 2.690 2.534 2.421 2.334 2.266 2.211 2.165
31 4.160 3.305 2.911 2.679 2.523 2.409 2.323 2.255 2.199 2.153
32 4.149 3.295 2.901 2.668 2.512 2.399 2.313 2.244 2.189 2.142
33 4.139 3.285 2.892 2.659 2.503 2.389 2.303 2.235 2.179 2.133
34 4.130 3.276 2.883 2.650 2.494 2.380 2.294 2.225 2.170 2.123
35 4.121 3.267 2.874 2.641 2.485 2.372 2.285 2.217 2.161 2.114
36 4.113 3.259 2.866 2.634 2.477 2.364 2.277 2.209 2.153 2.106
37 4.105 3.252 2.859 2.626 2.470 2.356 2.270 2.201 2.145 2.098
38 4.098 3.245 2.852 2.619 2.463 2.349 2.262 2.194 2.138 2.091
39 4.091 3.238 2.845 2.612 2.456 2.342 2.255 2.187 2.131 2.084
40 4.085 3.232 2.839 2.606 2.449 2.336 2.249 2.180 2.124 2.077
41 4.079 3.226 2.833 2.600 2.443 2.330 2.243 2.174 2.118 2.071
42 4.073 3.220 2.827 2.594 2.438 2.324 2.237 2.168 2.112 2.065
43 4.067 3.214 2.822 2.589 2.432 2.318 2.232 2.163 2.106 2.059
44 4.062 3.209 2.816 2.584 2.427 2.313 2.226 2.157 2.101 2.054
45 4.057 3.204 2.812 2.579 2.422 2.308 2.221 2.152 2.096 2.049
46 4.052 3.200 2.807 2.574 2.417 2.304 2.216 2.147 2.091 2.044
47 4.047 3.195 2.802 2.570 2.413 2.299 2.212 2.143 2.086 2.039
48 4.043 3.191 2.798 2.565 2.409 2.295 2.207 2.138 2.082 2.035
49 4.038 3.187 2.794 2.561 2.404 2.290 2.203 2.134 2.077 2.030
50 4.034 3.183 2.790 2.557 2.400 2.286 2.199 2.130 2.073 2.026
60 4.001 3.150 2.758 2.525 2.368 2.254 2.167 2.097 2.040 1.993
70 3.978 3.128 2.736 2.503 2.346 2.231 2.143 2.074 2.017 1.969
80 3.960 3.111 2.719 2.486 2.329 2.214 2.126 2.056 1.999 1.951
90 3.947 3.098 2.706 2.473 2.316 2.201 2.113 2.043 1.986 1.938
100 3.936 3.087 2.696 2.463 2.305 2.191 2.103 2.032 1.975 1.927
120 3.920 3.072 2.680 2.447 2.290 2.175 2.087 2.016 1.959 1.910
140 3.909 3.061 2.669 2.436 2.279 2.164 2.076 2.005 1.947 1.899
160 3.900 3.053 2.661 2.428 2.271 2.156 2.067 1.997 1.939 1.890
180 3.894 3.046 2.655 2.422 2.264 2.149 2.061 1.990 1.932 1.884
200 3.888 3.041 2.650 2.417 2.259 2.144 2.056 1.985 1.927 1.878
Infinity 3.841 2.996 2.605 2.372 2.214 2.099 2.010 1.938 1.880 1.831

Upper Tail Critical Values of F Distribution for α = 0.01

Denominator Numerator Degrees of Freedom


D.F. 1 2 3 4 5 6 7 8 9 10
1 4052.181 4999.500 5403.352 5624.583 5763.650 5858.986 5928.356 5981.070 6022.473 6055.847
2 98.503 99.000 99.166 99.249 99.299 99.333 99.356 99.374 99.388 99.399
3 34.116 30.817 29.457 28.710 28.237 27.911 27.672 27.489 27.345 27.229
4 21.198 18.000 16.694 15.977 15.522 15.207 14.976 14.799 14.659 14.546
5 16.258 13.274 12.060 11.392 10.967 10.672 10.456 10.289 10.158 10.051
6 13.745 10.925 9.780 9.148 8.746 8.466 8.260 8.102 7.976 7.874
7 12.246 9.547 8.451 7.847 7.460 7.191 6.993 6.840 6.719 6.620
8 11.259 8.649 7.591 7.006 6.632 6.371 6.178 6.029 5.911 5.814
9 10.561 8.022 6.992 6.422 6.057 5.802 5.613 5.467 5.351 5.257
10 10.044 7.559 6.552 5.994 5.636 5.386 5.200 5.057 4.942 4.849
11 9.646 7.206 6.217 5.668 5.316 5.069 4.886 4.744 4.632 4.539
12 9.330 6.927 5.953 5.412 5.064 4.821 4.640 4.499 4.388 4.296
13 9.074 6.701 5.739 5.205 4.862 4.620 4.441 4.302 4.191 4.100
14 8.862 6.515 5.564 5.035 4.695 4.456 4.278 4.140 4.030 3.939
15 8.683 6.359 5.417 4.893 4.556 4.318 4.142 4.004 3.895 3.805
16 8.531 6.226 5.292 4.773 4.437 4.202 4.026 3.890 3.780 3.691
17 8.400 6.112 5.185 4.669 4.336 4.102 3.927 3.791 3.682 3.593
18 8.285 6.013 5.092 4.579 4.248 4.015 3.841 3.705 3.597 3.508
19 8.185 5.926 5.010 4.500 4.171 3.939 3.765 3.631 3.523 3.434
20 8.096 5.849 4.938 4.431 4.103 3.871 3.699 3.564 3.457 3.368
21 8.017 5.780 4.874 4.369 4.042 3.812 3.640 3.506 3.398 3.310
22 7.945 5.719 4.817 4.313 3.988 3.758 3.587 3.453 3.346 3.258
23 7.881 5.664 4.765 4.264 3.939 3.710 3.539 3.406 3.299 3.211
24 7.823 5.614 4.718 4.218 3.895 3.667 3.496 3.363 3.256 3.168
25 7.770 5.568 4.675 4.177 3.855 3.627 3.457 3.324 3.217 3.129
26 7.721 5.526 4.637 4.140 3.818 3.591 3.421 3.288 3.182 3.094
27 7.677 5.488 4.601 4.106 3.785 3.558 3.388 3.256 3.149 3.062
28 7.636 5.453 4.568 4.074 3.754 3.528 3.358 3.226 3.120 3.032
29 7.598 5.420 4.538 4.045 3.725 3.499 3.330 3.198 3.092 3.005
30 7.562 5.390 4.510 4.018 3.699 3.473 3.304 3.173 3.067 2.979
31 7.530 5.362 4.484 3.993 3.675 3.449 3.281 3.149 3.043 2.955
32 7.499 5.336 4.459 3.969 3.652 3.427 3.258 3.127 3.021 2.934
33 7.471 5.312 4.437 3.948 3.630 3.406 3.238 3.106 3.000 2.913
34 7.444 5.289 4.416 3.927 3.611 3.386 3.218 3.087 2.981 2.894
35 7.419 5.268 4.396 3.908 3.592 3.368 3.200 3.069 2.963 2.876
36 7.396 5.248 4.377 3.890 3.574 3.351 3.183 3.052 2.946 2.859
37 7.373 5.229 4.360 3.873 3.558 3.334 3.167 3.036 2.930 2.843
38 7.353 5.211 4.343 3.858 3.542 3.319 3.152 3.021 2.915 2.828
39 7.333 5.194 4.327 3.843 3.528 3.305 3.137 3.006 2.901 2.814
40 7.314 5.179 4.313 3.828 3.514 3.291 3.124 2.993 2.888 2.801
41 7.296 5.163 4.299 3.815 3.501 3.278 3.111 2.980 2.875 2.788
42 7.280 5.149 4.285 3.802 3.488 3.266 3.099 2.968 2.863 2.776
43 7.264 5.136 4.273 3.790 3.476 3.254 3.087 2.957 2.851 2.764
44 7.248 5.123 4.261 3.778 3.465 3.243 3.076 2.946 2.840 2.754
45 7.234 5.110 4.249 3.767 3.454 3.232 3.066 2.935 2.830 2.743
46 7.220 5.099 4.238 3.757 3.444 3.222 3.056 2.925 2.820 2.733
47 7.207 5.087 4.228 3.747 3.434 3.213 3.046 2.916 2.811 2.724
48 7.194 5.077 4.218 3.737 3.425 3.204 3.037 2.907 2.802 2.715
49 7.182 5.066 4.208 3.728 3.416 3.195 3.028 2.898 2.793 2.706
50 7.171 5.057 4.199 3.720 3.408 3.186 3.020 2.890 2.785 2.698
60 7.077 4.977 4.126 3.649 3.339 3.119 2.953 2.823 2.718 2.632
70 7.011 4.922 4.074 3.600 3.291 3.071 2.906 2.777 2.672 2.585
80 6.963 4.881 4.036 3.563 3.255 3.036 2.871 2.742 2.637 2.551
90 6.925 4.849 4.007 3.535 3.228 3.009 2.845 2.715 2.611 2.524
100 6.895 4.824 3.984 3.513 3.206 2.988 2.823 2.694 2.590 2.503
120 6.851 4.787 3.949 3.480 3.174 2.956 2.792 2.663 2.559 2.472
140 6.819 4.760 3.925 3.456 3.151 2.933 2.769 2.641 2.536 2.450
160 6.796 4.740 3.906 3.439 3.134 2.917 2.753 2.624 2.520 2.434
180 6.778 4.725 3.892 3.425 3.120 2.904 2.740 2.611 2.507 2.421
200 6.763 4.713 3.881 3.414 3.110 2.893 2.730 2.601 2.497 2.411
Infinity 6.635 4.605 3.782 3.319 3.017 2.802 2.639 2.511 2.407 2.321

Binomial Cumulative Distribution Function for p=1/2

nk
 n  1   1 
x k

     
k  0  k  2   2 
n
x 1 2 3 4 5 6 7 8 9 10
0 0.500 0.250 0.125 0.063 0.031 0.016 0.008 0.004 0.002 0.001
1 1.000 0.750 0.500 0.313 0.188 0.109 0.063 0.035 0.020 0.011
2 1.000 0.875 0.688 0.500 0.344 0.227 0.145 0.090 0.055
3 1.000 0.938 0.813 0.656 0.500 0.363 0.254 0.172
4 1.000 0.969 0.891 0.773 0.637 0.500 0.377
5 1.000 0.984 0.938 0.855 0.746 0.623
6 1.000 0.992 0.965 0.910 0.828
7 1.000 0.996 0.980 0.945
8 1.000 0.998 0.989
9 1.000 0.999
10 1.000

x \ n    11     12     13     14     15     16     17     18     19     20
0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1 0.006 0.003 0.002 0.001 0.000 0.000 0.000 0.000 0.000 0.000
2 0.033 0.019 0.011 0.006 0.004 0.002 0.001 0.001 0.000 0.000
3 0.113 0.073 0.046 0.029 0.018 0.011 0.006 0.004 0.002 0.001
4 0.274 0.194 0.133 0.090 0.059 0.038 0.025 0.015 0.010 0.006
5 0.500 0.387 0.291 0.212 0.151 0.105 0.072 0.048 0.032 0.021
6 0.726 0.613 0.500 0.395 0.304 0.227 0.166 0.119 0.084 0.058
7 0.887 0.806 0.709 0.605 0.500 0.402 0.315 0.240 0.180 0.132
8 0.967 0.927 0.867 0.788 0.696 0.598 0.500 0.407 0.324 0.252
9 0.994 0.981 0.954 0.910 0.849 0.773 0.685 0.593 0.500 0.412
10 1.000 0.997 0.989 0.971 0.941 0.895 0.834 0.760 0.676 0.588
11 1.000 1.000 0.998 0.994 0.982 0.962 0.928 0.881 0.820 0.748
12 1.000 1.000 0.999 0.996 0.989 0.975 0.952 0.916 0.868
13 1.000 1.000 1.000 0.998 0.994 0.985 0.968 0.942
14 1.000 1.000 1.000 0.999 0.996 0.990 0.979
15 1.000 1.000 1.000 0.999 0.998 0.994
16 1.000 1.000 1.000 1.000 0.999
17 1.000 1.000 1.000 1.000
18 1.000 1.000 1.000
19 1.000 1.000
20 1.000

Mann-Whitney lower quantiles wp
To get w1−p , use the formula w1−p = n1 (n1 + n2 + 1) − wp

n1 prob n2=2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 0.001 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
2 0.005 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4
2 0.01 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 5 5
2 0.025 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6
2 0.05 3 3 3 4 4 4 5 5 5 5 6 6 6 7 7 7 8 8 8
2 0.1 3 3 4 5 5 5 6 6 7 7 8 8 8 9 9 10 10 11 11
3 0.001 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7
3 0.005 6 6 6 6 6 6 6 7 7 7 8 8 8 9 9 9 9 10 10
3 0.01 6 6 6 6 6 7 7 8 8 8 9 9 9 10 10 11 11 11 12
3 0.025 6 6 6 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15
3 0.05 6 6 7 8 9 9 10 10 11 12 12 13 14 14 15 16 16 17 18
3 0.1 6 7 8 9 10 11 12 12 13 14 15 16 17 17 18 19 20 21 22
4 0.001 10 10 10 10 10 10 10 10 11 11 11 12 12 12 13 13 14 14 14
4 0.005 10 10 10 10 11 11 12 12 13 13 14 14 15 16 16 17 17 18 19
4 0.01 10 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19 20 20 21
4 0.025 10 10 11 12 13 14 15 15 16 17 18 19 20 21 22 22 23 24 25
4 0.05 10 11 12 13 14 15 16 17 18 19 20 21 22 23 25 26 27 28 29
4 0.1 11 12 13 15 16 17 18 20 21 22 23 24 26 27 28 29 31 32 33
5 0.001 15 15 15 15 15 15 16 17 17 18 18 19 19 20 21 21 22 23 23
5 0.005 15 15 15 16 17 17 18 19 20 21 22 23 23 24 25 26 27 28 29
5 0.01 15 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
5 0.025 15 16 17 18 19 21 22 23 24 25 27 28 29 30 31 33 34 35 36
5 0.05 16 17 18 20 21 22 24 25 27 28 29 31 32 34 35 36 38 39 41
5 0.1 17 18 20 21 23 24 26 28 29 31 33 34 36 38 39 41 43 44 46
6 0.001 21 21 21 21 21 22 23 24 25 26 26 27 28 29 30 31 32 33 34
6 0.005 21 21 22 23 24 25 26 27 28 29 31 32 33 34 35 37 38 39 40
6 0.01 21 21 23 24 25 26 28 29 30 31 33 34 35 37 38 40 41 42 44
6 0.025 21 23 24 25 27 28 30 32 33 35 36 38 39 41 43 44 46 47 49
6 0.05 22 24 25 27 29 30 32 34 36 38 39 41 43 45 47 48 50 52 54
6 0.1 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 56 58 60
7 0.001 28 28 28 28 29 30 31 32 34 35 36 37 38 39 40 42 43 44 45
7 0.005 28 28 29 30 32 33 35 36 38 39 41 42 44 45 47 48 50 51 53
7 0.01 28 29 30 32 33 35 36 38 40 41 43 45 46 48 50 52 53 55 57
7 0.025 28 30 32 34 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63
7 0.05 29 31 33 35 37 40 42 44 46 48 50 53 55 57 59 62 64 66 68
7 0.1 30 33 35 37 40 42 45 47 50 52 55 57 60 62 65 67 70 72 75
8 0.001 36 36 36 37 38 39 41 42 43 45 46 48 49 51 52 54 55 57 58
8 0.005 36 36 38 39 41 43 44 46 48 50 52 54 55 57 59 61 63 65 67
8 0.01 36 37 39 41 43 44 46 48 50 52 54 57 59 61 63 65 67 69 71
8 0.025 37 39 41 43 45 47 50 52 54 56 59 61 63 66 68 71 73 75 78
8 0.05 38 40 42 45 47 50 52 55 57 60 63 65 68 70 73 76 78 81 84
8 0.1 39 42 44 47 50 53 56 59 61 64 67 70 73 76 79 82 85 88 91
9 0.001 45 45 45 47 48 49 51 53 54 56 58 60 61 63 65 67 69 71 72
9 0.005 45 46 47 49 51 53 55 57 59 62 64 66 68 70 73 75 77 79 82
9 0.01 45 47 49 51 53 55 57 60 62 64 67 69 72 74 77 79 82 84 86
9 0.025 46 48 50 53 56 58 61 63 66 69 72 74 77 80 83 85 88 91 94
9 0.05 47 49 52 55 58 61 64 67 70 73 76 79 82 85 88 91 94 97 100
9 0.1 48 51 55 58 61 64 68 71 74 77 81 84 87 91 94 98 101 104 108
10 0.001 55 55 56 57 59 61 62 64 66 68 70 73 75 77 79 81 83 85 88
10 0.005 55 56 58 60 62 65 67 69 72 74 77 80 82 85 87 90 93 95 98
10 0.01 55 57 59 62 64 67 69 72 75 78 80 83 86 89 92 94 97 100 103
10 0.025 56 59 61 64 67 70 73 76 79 82 85 89 92 95 98 101 104 108 111
10 0.05 57 60 63 67 70 73 76 80 83 87 90 93 97 100 104 107 111 114 118
10 0.1 59 62 66 69 73 77 80 84 88 92 95 99 103 107 110 114 118 122 126
11 0.001 66 66 67 69 71 73 75 77 79 82 84 87 89 91 94 96 99 101 104
11 0.005 66 67 69 72 74 77 80 83 85 88 91 94 97 100 103 106 109 112 115
11 0.01 66 68 71 74 76 79 82 85 89 92 95 98 101 104 108 111 114 117 120
11 0.025 67 70 73 76 80 83 86 90 93 97 100 104 107 111 114 118 122 125 129
11 0.05 68 72 75 79 83 86 90 94 98 101 105 109 113 117 121 124 128 132 136
11 0.1 70 74 78 82 86 90 94 98 103 107 111 115 119 124 128 132 136 140 145

Mann-Whitney lower quantiles wp
To get w1-p, use formula w1-p=n1(n1+n2+1)-wp

n1 prob n2=2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
12 0.001 78 78 79 81 83 86 88 91 93 96 99 102 104 107 110 113 116 119 121
12 0.005 78 80 82 85 88 91 94 97 100 103 106 110 113 116 120 123 126 130 133
12 0.01 78 81 84 87 90 93 96 100 103 107 110 114 117 121 125 128 132 135 139
12 0.025 80 83 86 90 93 97 101 105 108 112 116 120 124 128 132 136 140 144 148
12 0.05 81 84 88 92 96 100 105 109 113 117 121 126 130 134 139 143 147 151 156
12 0.1 83 87 91 96 100 105 109 114 118 123 128 132 137 142 146 151 156 160 165
13 0.001 91 91 93 95 97 100 103 106 109 112 115 118 121 124 127 130 134 137 140
13 0.005 91 93 95 99 102 105 109 112 116 119 123 126 130 134 137 141 145 149 152
13 0.01 92 94 97 101 104 108 112 115 119 123 127 131 135 139 143 147 151 155 159
13 0.025 93 96 100 104 108 112 116 120 125 129 133 137 142 146 151 155 159 164 168
13 0.05 94 98 102 107 111 116 120 125 129 134 139 143 148 153 157 162 167 172 176
13 0.1 96 101 105 110 115 120 125 130 135 140 145 150 155 160 166 171 176 181 186
14 0.001 105 105 107 109 112 115 118 121 125 128 131 135 138 142 145 149 152 156 160
14 0.005 105 107 110 113 117 121 124 128 132 136 140 144 148 152 156 160 164 169 173
14 0.01 106 108 112 116 119 123 128 132 136 140 144 149 153 157 162 166 171 175 179
14 0.025 107 111 115 119 123 128 132 137 142 146 151 156 161 165 170 175 180 184 189
14 0.05 108 113 117 122 127 132 137 142 147 152 157 162 167 172 177 183 188 193 198
14 0.1 110 116 121 126 131 137 142 147 153 158 164 169 175 180 186 191 197 203 208
15 0.001 120 120 122 125 128 131 135 138 142 145 149 153 157 161 164 168 172 176 180
15 0.005 120 123 126 129 133 137 141 145 150 154 158 163 167 172 176 181 185 190 194
15 0.01 121 124 128 132 136 140 145 149 154 158 163 168 172 177 182 187 191 196 201
15 0.025 122 126 131 135 140 145 150 155 160 165 170 175 180 185 191 196 201 206 211
15 0.05 124 128 133 139 144 149 154 160 165 171 176 182 187 193 198 204 209 215 221
15 0.1 126 131 137 143 148 154 160 166 172 178 184 189 195 201 207 213 219 225 231
16 0.001 136 136 139 142 145 148 152 156 160 164 168 172 176 180 185 189 193 197 202
16 0.005 136 139 142 146 150 155 159 164 168 173 178 182 187 192 197 202 207 211 216
16 0.01 137 140 144 149 153 158 163 168 173 178 183 188 193 198 203 208 213 219 224
16 0.025 138 143 148 152 158 163 168 174 179 184 190 196 201 207 212 218 223 229 235
16 0.05 140 145 151 156 162 167 173 179 185 191 197 202 208 214 220 226 232 238 244
16 0.1 142 148 154 160 166 173 179 185 191 198 204 211 217 223 230 236 243 249 256
17 0.001 153 154 156 159 163 167 171 175 179 183 188 192 197 201 206 211 215 220 224
17 0.005 153 156 160 164 169 173 178 183 188 193 198 203 208 214 219 224 229 235 240
17 0.01 154 158 162 167 172 177 182 187 192 198 203 209 214 220 225 231 236 242 247
17 0.025 156 160 165 171 176 182 188 193 199 205 211 217 223 229 235 241 247 253 259
17 0.05 157 163 169 174 180 187 193 199 205 211 218 224 231 237 243 250 256 263 269
17 0.1 160 166 172 179 185 192 199 206 212 219 226 233 239 246 253 260 267 274 281
18 0.001 171 172 175 178 182 186 190 195 199 204 209 214 218 223 228 233 238 243 248
18 0.005 171 174 178 183 188 193 198 203 209 214 219 225 230 236 242 247 253 259 264
18 0.01 172 176 181 186 191 196 202 208 213 219 225 231 237 242 248 254 260 266 272
18 0.025 174 179 184 190 196 202 208 214 220 227 233 239 246 252 258 265 271 278 284
18 0.05 176 181 188 194 200 207 213 220 227 233 240 247 254 260 267 274 281 288 295
18 0.1 178 185 192 199 206 213 220 227 234 241 249 256 263 270 278 285 292 300 307
19 0.001 190 191 194 198 202 206 211 216 220 225 231 236 241 246 251 257 262 268 273
19 0.005 191 194 198 203 208 213 219 224 230 236 242 248 254 260 265 272 278 284 290
19 0.01 192 195 200 206 211 217 223 229 235 241 247 254 260 266 273 279 285 292 298
19 0.025 193 198 204 210 216 223 229 236 243 249 256 263 269 276 283 290 297 304 310
19 0.05 195 201 208 214 221 228 235 242 249 256 263 271 278 285 292 300 307 314 321
19 0.1 198 205 212 219 227 234 242 249 257 264 272 280 288 295 303 311 319 326 334
20 0.001 210 211 214 218 223 227 232 237 243 248 253 259 265 270 276 281 287 293 299
20 0.005 211 214 219 224 229 235 241 247 253 259 265 271 278 284 290 297 303 310 316
20 0.01 212 216 221 227 233 239 245 251 258 264 271 278 284 291 298 304 311 318 325
20 0.025 213 219 225 231 238 245 252 259 266 273 280 287 294 301 309 316 323 330 338
20 0.05 215 222 229 236 243 250 258 265 273 280 288 295 303 311 318 326 334 341 349
20 0.1 218 226 233 241 249 257 265 273 281 289 297 305 313 321 330 338 346 354 362

References
Keller, G. (2012), Managerial Statistics, 9th edn, South-Western Cengage Learning,
Victoria.

Moore, D. (2000), The Basic Practice of Statistics, 2nd edn, W.H. Freeman and
Company, New York.

Navidi, W. (2006), Statistics for Engineers and Scientists, McGraw-Hill, New York.

Wackerly, D., Mendenhall, W. and Scheaffer, R. (2002), Mathematical Statistics with Applications, 6th edn, Duxbury, Pacific Grove.

Wonnacott, T. and Wonnacott, R. (1990), Introductory Statistics for Business and Economics, Wiley, New York.
