Objectives
Test hypotheses and construct confidence intervals about the difference in two population means using the z statistic.
Test hypotheses and construct confidence intervals about the difference in two population means using the t statistic.
Test hypotheses and construct confidence intervals about the difference in two related populations.
Test hypotheses and construct confidence intervals about the difference in two population proportions.
Test hypotheses and construct confidence intervals
about two population variances using the F statistic.
Hypothesis Testing and Confidence Intervals:
Difference in Means Using the z Statistic (Population Variances Known)
Two sample means are calculated, and the difference between them is used to test the difference between the two population means.
The central limit theorem states that the difference in two sample means is approximately normally distributed for large sample sizes ($n_1 > 30$ and $n_2 > 30$), regardless of the shape of the populations.
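The central limit theorem claim can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it draws repeated samples from two strongly skewed (exponential) populations with hypothetical means 10 and 6, records $\bar{x}_1 - \bar{x}_2$ each time, and compares the spread of those differences with the theoretical standard error $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. The function name `simulate_mean_differences` and all population parameters are invented for this example.

```python
import random
import statistics

def simulate_mean_differences(n1, n2, reps, seed=0):
    """Return `reps` simulated values of x̄1 - x̄2 from two skewed populations."""
    rng = random.Random(seed)
    mu1, mu2 = 10.0, 6.0  # hypothetical exponential population means
    diffs = []
    for _ in range(reps):
        sample1 = [rng.expovariate(1 / mu1) for _ in range(n1)]
        sample2 = [rng.expovariate(1 / mu2) for _ in range(n2)]
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return diffs

n1, n2 = 40, 35
diffs = simulate_mean_differences(n1, n2, reps=5000)

# An exponential population with mean mu has variance mu**2, so theory gives
# E[x̄1 - x̄2] = 10 - 6 = 4 and SE = sqrt(10**2 / n1 + 6**2 / n2).
theoretical_se = (10.0**2 / n1 + 6.0**2 / n2) ** 0.5

# Under approximate normality, about 95% of differences fall within ±1.96 SE of 4.
within = sum(abs(d - 4.0) <= 1.96 * theoretical_se for d in diffs) / len(diffs)
print(f"mean {statistics.fmean(diffs):.2f}, sd {statistics.stdev(diffs):.2f}, "
      f"theoretical SE {theoretical_se:.2f}, share within ±1.96 SE {within:.3f}")
```

Even though each population is heavily right-skewed, the simulated differences center near 4 and roughly 95% of them fall within 1.96 standard errors, which is what the normal approximation predicts.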
Hypothesis Testing for Differences Between Means: The Growth Example
As a specific example, suppose we want to conduct a hypothesis test to determine whether the average annual growth μ1 of one animal species is different from the average annual growth μ2 of another species. Because we are testing whether the means are different, the null and alternative hypotheses can be written as
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
Hypothesis Testing for Differences Between Means
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$
$\alpha = 0.05$, $\alpha/2 = 0.025$, $z_{0.025} = 1.96$
The hypotheses can also be expressed as:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$
The analysis tests whether there is a difference in annual growth between the two species. This is a two-tailed test.
Hypothesis Testing for Differences Between Means
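If $z < -1.96$ or $z > 1.96$, reject $H_0$.
If $-1.96 \le z \le 1.96$, do not reject $H_0$.
[Figure: standard normal curve centered at 0 with a rejection region of $\alpha/2 = .025$ in each tail beyond the critical values $z_c = -1.96$ and $z_c = 1.96$; the nonrejection region lies between them.]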
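Hypothesis Testing for Differences Between Means
Species 1: $n_1 = 32$, $\bar{x}_1 = 70.700$, $\sigma_1 = 16.253$ ($\sigma_1^2 = 264.164$)
Species 2: $n_2 = 34$, $\bar{x}_2 = 62.187$, $\sigma_2^2 = 166.411$
Selected sample observations: 74.256, 57.791, 71.115, 69.962, 77.136, 43.649, 96.234, 65.145, 67.574, 55.052, 66.035, 63.369, 89.807, 96.767, 59.621, 57.828, 54.335, 59.676, 93.261, 77.242, 62.483, 63.362, 42.494, 54.449
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$
[Figure: sampling distribution of $\bar{x}_1 - \bar{x}_2$ centered at $\mu_1 - \mu_2 = 0$, with a rejection region of .025 in each tail beyond the critical values and a nonrejection region between them.]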
Hypothesis Testing for Differences Between Means
If $z < -1.96$ or $z > 1.96$, reject $H_0$.
If $-1.96 \le z \le 1.96$, do not reject $H_0$.
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(70.700 - 62.187) - 0}{\sqrt{\dfrac{264.164}{32} + \dfrac{166.411}{34}}} = 2.35$$
[Figure: standard normal curve with a rejection region of .025 in each tail beyond the critical values $z_c = \pm 1.96$.]
Since $z = 2.35 > 1.96$, reject $H_0$.
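The same arithmetic can be scripted. This is a minimal sketch using only the Python standard library; `two_sample_z` is a helper name chosen for this example, and the critical value is computed with `statistics.NormalDist` instead of being read from a z table.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """z statistic for x̄1 - x̄2 when the population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error

# Summary values for the growth example, taken from the slides.
z = two_sample_z(xbar1=70.700, xbar2=62.187, var1=264.164, var2=166.411, n1=32, n2=34)

alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-tailed test
decision = "reject H0" if abs(z) > z_critical else "do not reject H0"
print(f"z = {z:.2f}, critical value = ±{z_critical:.2f}: {decision}")
```

Running it prints `z = 2.35, critical value = ±1.96: reject H0`, matching the hand calculation.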
Demonstration Problem
A sample of 87 men showed that the average calcium depletion per year is 3352 µg. The population standard deviation is 1100 µg. A sample of 76 women showed that the average calcium depletion per year is 5727 µg, with a population standard deviation of 1700 µg. A researcher wants to “prove” that women lose more calcium. If they use α = .001 and these sample data, will they be able to reject a null hypothesis that women annually lose the same amount of calcium as men do, or less?
Demonstration Problem
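Group 1 = men, group 2 = women.
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$
$\alpha = .001$
[Figure: one-tailed test with a rejection region of $\alpha = .001$ in the lower tail beyond the critical value $z_c = -3.08$; the nonrejection region lies to its right.]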
Demonstration Problem
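Men: $n_1 = 87$, $\bar{x}_1 = 3352$ µg, $\sigma_1 = 1100$ µg
Women: $n_2 = 76$, $\bar{x}_2 = 5727$ µg, $\sigma_2 = 1700$ µg
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(3352 - 5727) - 0}{\sqrt{\dfrac{1100^2}{87} + \dfrac{1700^2}{76}}} = -10.42$$
If $z < -3.08$, reject $H_0$. Since $z = -10.42 < -3.08$, reject $H_0$.
The evidence is substantial that women, on average, lose more calcium than men.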
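A short Python sketch of this one-tailed test, assuming only the standard library. The helper name `two_sample_z` is our own. The lower-tail critical value is computed with `statistics.NormalDist`; it comes out near −3.09, slightly different from the table value −3.08 used on the slide, but the decision is the same either way.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic for x̄1 - x̄2 with known population standard deviations."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error

# Group 1 = men, group 2 = women (µg of calcium depleted per year).
z = two_sample_z(xbar1=3352, xbar2=5727, sigma1=1100, sigma2=1700, n1=87, n2=76)

alpha = 0.001
z_critical = NormalDist().inv_cdf(alpha)  # lower-tail critical value, about -3.09
reject = z < z_critical  # Ha: mu1 - mu2 < 0, so only the lower tail rejects
print(f"z = {z:.2f}, critical value = {z_critical:.2f}, reject H0: {reject}")
```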
Confidence Interval
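Sometimes the goal is to estimate the difference between two population means rather than to test it. The approach is to take a random sample from each of the two populations and study the difference between the two sample means.
Formula for the confidence interval to estimate $\mu_1 - \mu_2$:
$$(\bar{x}_1 - \bar{x}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Designating one group as group one and the other as group two is an arbitrary decision.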
Demonstration Problem
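Men: $n_1 = 87$, $\bar{x}_1 = 3352$ µg, $\sigma_1 = 1100$ µg
Women: $n_2 = 76$, $\bar{x}_2 = 5727$ µg, $\sigma_2 = 1700$ µg
95% confidence: $z = 1.96$
$$(\bar{x}_1 - \bar{x}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Calculate it!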
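One way to check the calculation is with a few lines of Python. This is a sketch with a helper name (`z_interval`) invented for the example; it simply evaluates the interval formula with the calcium data and z = 1.96.

```python
import math

def z_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, z):
    """Confidence interval for mu1 - mu2 when population standard deviations are known."""
    margin = z * math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Men (group 1) and women (group 2), calcium depletion in µg per year.
lower, upper = z_interval(3352, 5727, 1100, 1700, 87, 76, z=1.96)
print(f"{lower:.1f} <= mu1 - mu2 <= {upper:.1f}")
```

The interval comes out at roughly −2821.7 ≤ μ1 − μ2 ≤ −1928.3 µg; because both limits are negative, the interval is consistent with the earlier conclusion that women lose more calcium on average.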
Hypothesis Testing
The t test compares the means of two samples to determine whether the two population means from which the samples come are different.
It is used when the population variances ($\sigma_1^2$ and $\sigma_2^2$) are unknown and the samples are independent.
It assumes that the measurement is normally distributed in each population.
Hypothesis Testing
If σ is unknown, it can be estimated by pooling the two sample variances and computing a pooled sample standard deviation:
$$s_p = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}$$
t Test for Differences in Population Means
Each of the two populations is normally distributed.
The two samples are independent.
The values of the population variances are unknown.
The variances of the two populations are equal: $\sigma_1^2 = \sigma_2^2$.
t Formula to Test the Difference in Means Assuming $\sigma_1^2 = \sigma_2^2$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Shrimp weights
Hatching method A        Hatching method B
56 50 52 44 52           59 54 55 65
47 47 53 45 48           52 57 64 53
42 51 42 43 44           53 56 53 57
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$
$\alpha = .05$, $\alpha/2 = .025$
$df = n_1 + n_2 - 2 = 15 + 12 - 2 = 25$
$t_{.025,25} = 2.060$
If $t < -2.060$ or $t > 2.060$, reject $H_0$.
If $-2.060 \le t \le 2.060$, do not reject $H_0$.
[Figure: t distribution with a rejection region of .025 in each tail beyond the critical values $t_{.025,25} = \pm 2.060$; the nonrejection region lies between them.]
Shrimp Hatching Methods
Hatching method A: $n_1 = 15$, $\bar{x}_1 = 47.73$, $s_1^2 = 19.495$
Hatching method B: $n_2 = 12$, $\bar{x}_2 = 56.50$, $s_2^2 = 18.273$
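Shrimp Hatching Methods
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(47.73 - 56.50) - 0}{\sqrt{\dfrac{19.495(14) + 18.273(11)}{15 + 12 - 2}}\sqrt{\dfrac{1}{15} + \dfrac{1}{12}}} = -5.20$$
If the population variances are not assumed to be equal, the denominator is replaced by $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and the degrees of freedom are
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$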
Since $t = -5.20 < -2.060$, reject $H_0$. The conclusion is that there is a significant difference in the effectiveness of the hatching methods.
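The pooled-variance t test can be reproduced from the raw shrimp weights. In this sketch, method A is taken as the first five values in each row of the shrimp table and method B as the last four; that grouping is our reading of the flattened table, and it reproduces the slides' $n$, $\bar{x}$, and $s^2$ for both methods. The helper names `pooled_t` and `welch_df` are our own, and the critical value 2.060 is copied from the slide.

```python
import math
import statistics

method_a = [56, 50, 52, 44, 52, 47, 47, 53, 45, 48, 42, 51, 42, 43, 44]
method_b = [59, 54, 55, 65, 52, 57, 64, 53, 53, 56, 53, 57]

def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    return (diff - hypothesized_diff) / standard_error, n1 + n2 - 2

def welch_df(sample1, sample2):
    """Approximate df when the population variances are not assumed equal."""
    a = statistics.variance(sample1) / len(sample1)
    b = statistics.variance(sample2) / len(sample2)
    return (a + b) ** 2 / (a**2 / (len(sample1) - 1) + b**2 / (len(sample2) - 1))

t, df = pooled_t(method_a, method_b)
t_critical = 2.060  # t_{.025,25} from the t table
decision = "reject H0" if abs(t) > t_critical else "do not reject H0"
print(f"t = {t:.2f}, df = {df}: {decision}")
print(f"df if variances are not assumed equal: {welch_df(method_a, method_b):.1f}")
```

The pooled statistic is t ≈ −5.20 with 25 degrees of freedom. The unequal-variance degrees of freedom come out close to 24 here because the two sample variances are similar.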
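Confidence Interval to Estimate $\mu_1 - \mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ Are Unknown and $\sigma_1^2 = \sigma_2^2$
$$(\bar{x}_1 - \bar{x}_2) \pm t\sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where $df = n_1 + n_2 - 2$.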
Demonstration Problem
A coffee manufacturer is interested in estimating the difference between the average daily coffee consumption of regular-coffee drinkers and that of decaffeinated-coffee drinkers. Its researcher randomly selects 13 regular-coffee drinkers and asks how many cups of coffee per day they drink. He randomly locates 15 decaffeinated-coffee drinkers and asks how many cups of coffee per day they drink. The average for the regular-coffee drinkers is 4.35 cups, with a standard deviation of 1.20 cups. The average for the decaffeinated-coffee drinkers is 6.84 cups, with a standard deviation of 1.42 cups. The researcher assumes that daily consumption is normally distributed in each population, and he constructs a 95% confidence interval to estimate the difference in the averages of the two populations.
Demonstration Problem
$n_1 = 13$, $n_2 = 15$
$\bar{x}_1 = 4.35$, $\bar{x}_2 = 6.84$
$s_1 = 1.20$, $s_2 = 1.42$
$\alpha = 0.05$, $t_{0.025,26} = 2.056$
Demonstration Problem
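$(4.35 - 6.84) \pm 1.03 = -2.49 \pm 1.03$
$-3.52 \le \mu_1 - \mu_2 \le -1.46$
The researcher is 95% confident that the difference in population average daily consumption of cups of coffee between regular- and decaffeinated-coffee drinkers ($\mu_1 - \mu_2$) is between −3.52 cups and −1.46 cups; that is, decaffeinated-coffee drinkers drink, on average, between 1.46 and 3.52 more cups per day.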
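A minimal Python sketch of the pooled-variance interval, using the summary statistics above; `pooled_t_interval` is an illustrative helper name, and the critical value 2.056 is taken from the slide rather than computed.

```python
import math

def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_critical):
    """Confidence interval for mu1 - mu2 assuming equal population variances."""
    pooled_var = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)
    margin = t_critical * math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Regular-coffee drinkers (group 1) vs decaffeinated-coffee drinkers (group 2).
low, high = pooled_t_interval(4.35, 6.84, 1.20, 1.42, 13, 15, t_critical=2.056)
print(f"{low:.2f} <= mu1 - mu2 <= {high:.2f}")
```

The margin of error is about 1.03 cups, giving the interval −3.52 to −1.46.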
Statistical Inferences for Two Related Populations
Dependent samples
Used in before-and-after studies
The after measurement is not independent of the before measurement
Hypothesis Testing
The researcher must determine whether the two samples are related to each other.
The technique for related samples is different from the technique used to analyze independent samples.
The matched-pairs test requires the two samples to be the same size.
Dependent Samples
Before-and-after measurements on the same individual
Studies of twins
Studies of spouses

Individual   Before   After
1            32       39
2            11       15
3            21       35
4            17       13
5            30       41
6            38       39
7            14       22
Hypothesis Testing
The following t test for dependent measures uses the sample difference, d, between individual matched samples as the basic measurement of analysis.
An analysis of d converts the problem from a two-sample problem into a single sample of differences.
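Formulas for Dependent Samples
$$t = \frac{\bar{d} - D}{s_d / \sqrt{n}}$$
$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n - 1}}$$
$df = n - 1$
where $n$ = number of pairs, $d$ = sample difference in pairs, $\bar{d}$ = mean sample difference, and $D$ = mean population difference.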
Analysis of data by this method involves calculating a t value and comparing it with a critical value obtained from the t table.
The n in the degrees of freedom (n − 1) is the number of matched pairs of scores.
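The dependent-samples formulas can be sketched in Python with the before/after table from the Dependent Samples slide. The slides do not run a test on those seven individuals, so this only illustrates the mechanics; the helper name `paired_t` is our own. It uses the computational form of $s_d$ and returns $df = n - 1$.

```python
import math
import statistics

def paired_t(first, second, hypothesized_mean_diff=0.0):
    """Matched-pairs t statistic using d = first - second for each pair."""
    if len(first) != len(second):
        raise ValueError("a matched-pairs test needs two samples of the same size")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = statistics.fmean(d)
    # Computational form of s_d: sqrt((sum d^2 - (sum d)^2 / n) / (n - 1))
    s_d = math.sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1))
    t = (d_bar - hypothesized_mean_diff) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

before = [32, 11, 21, 17, 30, 38, 14]
after = [39, 15, 35, 13, 41, 39, 22]
d_bar, s_d, t, df = paired_t(before, after)
print(f"d̄ = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {df}")
```

The two-sample problem collapses to a single list of seven differences, and the computational form of $s_d$ agrees with the ordinary sample standard deviation of those differences.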
W/H Ratios for Nine Randomly Selected Ethnic Groups
Suppose a stock market investor is interested in determining whether there is a significant difference in the W/H (weight-to-height) ratio for 2-year-old children of different ethnic groups in Vietnam. In an effort to study this question, the investor randomly samples nine ethnic groups from Vietnam and records the W/H ratios for each of these groups at the end of year 1 and at the end of year 2.
W/H Ratios for Nine Randomly Selected Groups
Group   Year 1 W/H Ratio   Year 2 W/H Ratio
1        8.9                12.7
2       38.1                45.4
3       43.0                10.0
4       34.0                27.2
5       34.5                22.8
6       15.2                24.1
7       20.3                32.3
8       19.9                40.1
9       61.9               106.5
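Hypothesis Testing with Dependent Samples: W/H Ratios for Nine Groups
$H_0: D = 0$
$H_a: D \neq 0$
$\alpha = .01$, $\alpha/2 = .005$
$df = n - 1 = 9 - 1 = 8$
$t_{.005,8} = 3.355$
[Figure: t distribution with a rejection region of .005 in each tail beyond the critical values $\pm 3.355$; the nonrejection region lies between them.]
$\bar{d} = -5.033$, $s_d = 21.599$
$$t = \frac{-5.033 - 0}{21.599/\sqrt{9}} = -0.70$$
Since $-3.355 \le t = -0.70 \le 3.355$, do not reject $H_0$.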
                               Year 1        Year 2
                               W/H Ratio     W/H Ratio
Mean                           30.64         35.68
Variance                       268.1         837.5
Observations                   9             9
Pearson Correlation            0.674
Hypothesized Mean Difference   0
df                             8
t Stat                         -0.7
P(T<=t) one-tail               0.252
t Critical one-tail            1.86
P(T<=t) two-tail               0.504
t Critical two-tail            2.306
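The W/H test can be reproduced directly from the table of nine groups. This is a standard-library sketch; it defines d = Year 1 − Year 2 (the sign convention that matches the negative t Stat in the output above) and uses the table value $t_{.005,8} = 3.355$ from the slide.

```python
import math
import statistics

year1 = [8.9, 38.1, 43.0, 34.0, 34.5, 15.2, 20.3, 19.9, 61.9]
year2 = [12.7, 45.4, 10.0, 27.2, 22.8, 24.1, 32.3, 40.1, 106.5]

d = [a - b for a, b in zip(year1, year2)]  # d = Year 1 - Year 2
n = len(d)
d_bar = statistics.fmean(d)
s_d = statistics.stdev(d)  # sample standard deviation, n - 1 in the denominator
t = (d_bar - 0) / (s_d / math.sqrt(n))

t_critical = 3.355  # t_{.005,8} from the t table (alpha = .01, two-tailed)
decision = "reject H0" if abs(t) > t_critical else "do not reject H0"
print(f"d̄ = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {n - 1}: {decision}")
```

It recovers $\bar{d} = -5.033$, $s_d = 21.599$ and $t = -0.70$, well inside the nonrejection region.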
Confidence Intervals
A researcher may be interested in estimating the mean difference in two populations for related samples.
This requires constructing a confidence interval for D (the mean population difference of two related samples).
Confidence Intervals for Mean Difference for Related Samples
$$\bar{d} - t\frac{s_d}{\sqrt{n}} \le D \le \bar{d} + t\frac{s_d}{\sqrt{n}}$$
$df = n - 1$
Difference in Number of Bacteria Colonies
Strain   Without treatment   With treatment   d
1        8                   11               -3
2        19                  30               -11
3        5                   6                -1
4        9                   13               -4
5        3                   5                -2
6        0                   4                -4
7        13                  15               -2
8        11                  17               -6
9        9                   12               -3
10       5                   12               -7
11       8                   6                2
12       2                   5                -3
13       11                  10               1
14       14                  22               -8
15       7                   8                -1
16       12                  15               -3
17       6                   12               -6
18       10                  10               0
$\bar{d} = -3.39$, $s_d = 3.27$
Confidence Interval for Mean Difference in Number of Bacteria Colonies
$df = n - 1 = 18 - 1 = 17$, $t_{.005,17} = 2.898$
$$\bar{d} - t\frac{s_d}{\sqrt{n}} \le D \le \bar{d} + t\frac{s_d}{\sqrt{n}}$$
$$-3.39 - 2.898\frac{3.27}{\sqrt{18}} \le D \le -3.39 + 2.898\frac{3.27}{\sqrt{18}}$$
$$-3.39 - 2.23 \le D \le -3.39 + 2.23$$
$$-5.62 \le D \le -1.16$$
The analyst estimates with a 99% level of confidence that the average difference in the number of bacteria colonies (without treatment minus with treatment) is between −5.62 and −1.16 colonies.
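A Python sketch of the related-samples interval, computed from the 18 strains in the table with d = without treatment − with treatment. The critical value $t_{.005,17} = 2.898$ is copied from the slide; everything else uses the standard `statistics` module.

```python
import math
import statistics

without_treatment = [8, 19, 5, 9, 3, 0, 13, 11, 9, 5, 8, 2, 11, 14, 7, 12, 6, 10]
with_treatment = [11, 30, 6, 13, 5, 4, 15, 17, 12, 12, 6, 5, 10, 22, 8, 15, 12, 10]

d = [a - b for a, b in zip(without_treatment, with_treatment)]  # d = without - with
n = len(d)
d_bar = statistics.fmean(d)
s_d = statistics.stdev(d)
t_critical = 2.898  # t_{.005,17} for 99% confidence
margin = t_critical * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(f"d̄ = {d_bar:.2f}, s_d = {s_d:.2f}, {lower:.2f} <= D <= {upper:.2f}")
```

With unrounded $\bar{d}$ and $s_d$ the limits are about −5.63 and −1.15; the slide's −5.62 and −1.16 come from rounding $\bar{d}$ and $s_d$ to two decimals first.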
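Statistical Inference about Two Population Proportions ($\hat{p}_1 - \hat{p}_2$)
The statistic used is the difference in sample proportions, $\hat{p}_1 - \hat{p}_2$, where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$\hat{q}_1 = 1 - \hat{p}_1$
$\hat{q}_2 = 1 - \hat{p}_2$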
Hypothesis Testing
Because the population proportions are unknown, the standard deviation of the difference in two sample proportions is estimated by using the sample proportions as point estimates of the population proportions.
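Z Formula to Test the Difference in Population Proportions
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$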
Testing the Difference in Population Proportions
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$
$\alpha = .01$, $\alpha/2 = .005$, $z_{.005} = 2.575$
If $z < -2.575$ or $z > 2.575$, reject $H_0$.
If $-2.575 \le z \le 2.575$, do not reject $H_0$.
[Figure: standard normal curve with a rejection region of .005 in each tail beyond the critical values $z_c = \pm 2.575$; the nonrejection region lies between them.]
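Testing the Difference in Population Proportions
$n_1 = 100$, $x_1 = 24$, $\hat{p}_1 = \dfrac{24}{100} = .24$
$n_2 = 95$, $x_2 = 39$, $\hat{p}_2 = \dfrac{39}{95} = .41$
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{24 + 39}{100 + 95} = .323, \qquad \bar{q} = .677$$
$$z = \frac{(.24 - .41) - 0}{\sqrt{(.323)(.677)\left(\dfrac{1}{100} + \dfrac{1}{95}\right)}} = \frac{-.17}{.067} = -2.54$$
Since $-2.575 \le z = -2.54 \le 2.575$, do not reject $H_0$.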
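The pooled two-proportion z test fits in a few lines of Python. This sketch (helper name `two_proportion_z` chosen for the example) works from the raw counts, so $\hat{p}_2 = 39/95$ is not rounded to .41 first.

```python
import math

def two_proportion_z(x1, n1, x2, n2, hypothesized_diff=0.0):
    """z statistic for p̂1 - p̂2 using the pooled proportion p̄."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    standard_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return ((p_hat1 - p_hat2) - hypothesized_diff) / standard_error

z = two_proportion_z(x1=24, n1=100, x2=39, n2=95)
z_critical = 2.575  # z_{.005} for alpha = .01, two-tailed
decision = "reject H0" if abs(z) > z_critical else "do not reject H0"
print(f"z = {z:.3f}: {decision}")
```

With unrounded proportions z is about −2.545 rather than −2.54, still inside ±2.575, so the decision does not change.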
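The difference in sample proportions is approximately normally distributed when
1. $n_1\hat{p}_1 > 5$,
2. $n_1\hat{q}_1 > 5$,
3. $n_2\hat{p}_2 > 5$, and
4. $n_2\hat{q}_2 > 5$, where $\hat{q} = 1 - \hat{p}$.
Its standard deviation is
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$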
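Confidence Interval to Estimate $p_1 - p_2$
$$(\hat{p}_1 - \hat{p}_2) - z\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$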
Example Problem:
$n_1 = 400$, $x_1 = 48$, $\hat{p}_1 = \dfrac{48}{400} = .12$, $\hat{q}_1 = 1 - \hat{p}_1 = .88$
$n_2 = 480$, $x_2 = 187$, $\hat{p}_2 = \dfrac{187}{480} = .39$, $\hat{q}_2 = 1 - \hat{p}_2 = .61$
For a 98% level of confidence, $z = 2.33$.
$$(.12 - .39) - 2.33\sqrt{\frac{(.12)(.88)}{400} + \frac{(.39)(.61)}{480}} \le p_1 - p_2 \le (.12 - .39) + 2.33\sqrt{\frac{(.12)(.88)}{400} + \frac{(.39)(.61)}{480}}$$
$$-.27 - .064 \le p_1 - p_2 \le -.27 + .064$$
$$-.334 \le p_1 - p_2 \le -.206$$
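The interval can be checked in Python with the unpooled standard error from the formula above. `two_proportion_interval` is an illustrative helper name, and z = 2.33 is the slide's value for 98% confidence.

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

lower, upper = two_proportion_interval(x1=48, n1=400, x2=187, n2=480, z=2.33)
print(f"{lower:.3f} <= p1 - p2 <= {upper:.3f}")
```

Unlike the hypothesis test, the interval does not pool the proportions, because no value of $p_1 - p_2$ is assumed.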
F Test for Two Population Variances
$$F = \frac{s_1^2}{s_2^2}$$
$df_{numerator} = \nu_1 = n_1 - 1$
$df_{denominator} = \nu_2 = n_2 - 1$
[Figure: F distribution]
Sheet Metal Example
Suppose a machine produces metal sheets that are specified to be 22 millimeters thick. Because of the machine, the operator, the raw material, the manufacturing environment, and other factors, there is variability in the thickness. Two machines produce these sheets. Operators are concerned about the consistency of the two machines. To test consistency, they randomly sample 10 sheets produced by machine 1 and 12 sheets produced by machine 2. The thickness measurements of sheets from each machine are given in the table on the following page. Assume sheet thickness is normally distributed in the population.
How can we test to determine whether the variance from each sample comes from the same population variance (population variances are equal) or from different population variances (population variances are not equal)?
Sheet Metal Example: Hypothesis Test for Equality of Two Population Variances
$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$
$\alpha = 0.05$, $n_1 = 10$, $n_2 = 12$
$df_{numerator} = \nu_1 = n_1 - 1 = 9$, $df_{denominator} = \nu_2 = n_2 - 1 = 11$
Upper critical value: $F_{.025,9,11} = 3.59$
Lower critical value: $F_{.975,9,11} = \dfrac{1}{F_{.025,11,9}} = \dfrac{1}{3.91} = 0.26$
If $F < 0.26$ or $F > 3.59$, reject $H_0$.
If $0.26 \le F \le 3.59$, do not reject $H_0$.
Sheet Metal Example
Machine 1 ($n_1 = 10$): 22.3, 21.8, 21.8, 21.9, 22.3, 22.4, 21.6, 22.5, 22.2, 21.6
Machine 2 ($n_2 = 12$): 22.0, 22.2, 22.0, 22.1, 22.0, 22.1, 21.8, 21.7, 21.9, 21.9, 21.9, 22.1
$s_1^2 = 0.1138$, $s_2^2 = 0.0202$
$$F = \frac{s_1^2}{s_2^2} = \frac{0.1138}{0.0202} = 5.63$$
Since $F = 5.63 > 3.59$, reject $H_0$.
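A standard-library sketch of the F test for the sheet metal data. The split of the 22 measurements into machine 1 and machine 2 is reconstructed from the flattened table, and it reproduces the slide's $s_1^2 = 0.1138$ and $s_2^2 = 0.0202$. The upper critical value 3.59 comes from the slide. The lower critical value uses the reciprocal rule $F_{.975,9,11} = 1/F_{.025,11,9}$, with $F_{.025,11,9} \approx 3.91$ taken from a standard F table; treat that figure as an assumption to verify against your own table.

```python
import statistics

machine1 = [22.3, 21.8, 21.8, 21.9, 22.3, 22.4, 21.6, 22.5, 22.2, 21.6]
machine2 = [22.0, 22.2, 22.0, 22.1, 22.0, 22.1, 21.8, 21.7, 21.9, 21.9, 21.9, 22.1]

s1_sq = statistics.variance(machine1)  # sample variance, n - 1 denominator
s2_sq = statistics.variance(machine2)
F = s1_sq / s2_sq
df_num, df_den = len(machine1) - 1, len(machine2) - 1

upper_critical = 3.59      # F_{.025,9,11} from an F table
lower_critical = 1 / 3.91  # F_{.975,9,11} = 1 / F_{.025,11,9} (table value assumed ≈ 3.91)
reject = F < lower_critical or F > upper_critical
print(f"s1^2 = {s1_sq:.4f}, s2^2 = {s2_sq:.4f}, F = {F:.2f} "
      f"(df {df_num}, {df_den}), reject H0: {reject}")
```

With unrounded variances F is about 5.62; the slide's 5.63 comes from dividing the rounded variances. Either value lies far above 3.59, so H0 is rejected.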