Lecture 5 PDF

Learning Objectives
Test hypotheses and construct confidence intervals about the difference in two population means using the Z statistic.
Test hypotheses and construct confidence intervals about the difference in two population means using the t statistic.
Test hypotheses and construct confidence intervals about the difference in two related populations.
Test hypotheses and construct confidence intervals about the differences in two population proportions.
Test hypotheses and construct confidence intervals about two population variances using the F statistic.
Hypothesis Testing; Confidence Intervals - Difference in Means using z Statistic (Population Variances Known)
The two sample means are calculated, and the difference between them is used to test the difference in the population means.
The central limit theorem states that the difference in two sample means is normally distributed for large sample sizes (both n1 and n2 > 30) regardless of the shape of the population.

Hypothesis Testing for Differences Between Means: The Growth Example
As a specific example, suppose we want to conduct a hypothesis test to determine whether the average annual growth μ1 of one animal species is different from the average annual growth μ2 of another species. Because we are testing to determine whether the means are different, it might seem logical that the null and alternative hypotheses would be
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
Hypothesis Testing for Differences Between Means
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
The hypotheses can also be expressed as:
Ho: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
Analysis is testing whether there is a difference in the annual growth. This is a two-tailed test.
α = 0.05, α/2 = 0.025, z0.025 = 1.96
If z < -1.96 or z > 1.96, reject Ho.
If -1.96 ≤ z ≤ 1.96, do not reject Ho.
[Figure: normal curve with a rejection region of area α/2 = .025 in each tail, the non-rejection region between them, and critical values Zc = -1.96 and Zc = 1.96.]
Hypothesis Testing for Differences Between Means
Sample values of annual growth (Species 1 and Species 2 combined, 66 observations):
74.256 57.791 71.115 69.962 77.136 43.649
96.234 65.145 67.574 55.052 66.035 63.369
89.807 96.767 59.621 57.828 54.335 59.676
93.261 77.242 62.483 63.362 42.494 54.449
103.030 67.056 69.319 37.194 83.849 46.394
74.195 64.276 35.394 99.198 67.160 71.804
61.254 37.386 72.401 75.932 74.194 86.741
73.065 59.505 56.470 80.742 65.360 57.351
48.036 72.790 67.814 39.672 73.904 60.053
71.351 71.492 45.652 54.270 66.359 58.653
93.083 59.045 61.261 63.508 63.384 68.508
Species 1: n1 = 32, x̄1 = 70.700, σ1 = 16.253, σ1² = 264.164
Species 2: n2 = 34, x̄2 = 62.187, σ2 = 12.900, σ2² = 166.411
\[ z = \frac{(70.700 - 62.187) - (0)}{\sqrt{\frac{264.164}{32} + \frac{166.411}{34}}} = 2.35 \]
Since the observed value of 2.35 is greater than 1.96, reject the null hypothesis. That is, there is a significant difference between the average annual growth of species 1 and the average annual growth of species 2.
Hypothesis Testing for Differences Between Means
Ho: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
If z < -1.96 or z > 1.96, reject Ho.
If -1.96 ≤ z ≤ 1.96, do not reject Ho.
[Figure: normal curve with a rejection region of area α/2 = .025 in each tail, the non-rejection region between the two critical values, and the observed z marked in the upper tail.]
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(70.700 - 62.187) - (0)}{\sqrt{\frac{264.164}{32} + \frac{166.411}{34}}} = 2.35 \]
Since z = 2.35 > 1.96, reject Ho.
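The growth example can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `two_sample_z` and its parameter names are illustrative choices, and the inputs are the summary statistics quoted above.

```python
import math

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """z statistic for the difference in two means when the
    population variances are known (hypothetical helper)."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error

# Summary statistics from the growth example
z = two_sample_z(70.700, 62.187, 264.164, 166.411, 32, 34)

# Two-tailed test at alpha = 0.05 uses the critical value 1.96
reject_null = abs(z) > 1.96
print(round(z, 2), reject_null)
```

Rounded to two decimals, the statistic agrees with the 2.35 computed on the slide, so the null hypothesis is rejected.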
Demonstration Problem
A sample of 87 men showed that the average calcium depletion per year is 3352 µg. The population standard deviation is 1100 µg. A sample of 76 women showed that the average calcium depletion per year is 5727 µg, with a population standard deviation of 1700 µg. A researcher wants to "prove" that women lose more calcium. If they use α = .001 and these sample data, will they be able to reject a null hypothesis that women annually lose as much (or less) calcium as men do?
Ho: μ1 - μ2 = 0
Ha: μ1 - μ2 < 0
[Figure: normal curve with a rejection region of area α = .001 in the lower tail and critical value Zc = -3.08.]
Men: x̄1 = 3,352 µg, σ1 = 1,100 µg, n1 = 87
Women: x̄2 = 5,727 µg, σ2 = 1,700 µg, n2 = 76
[Figure: normal curve with a rejection region of area α = .001 in the lower tail and critical value Zc = -3.08.]
If z < -3.08, reject Ho.
If z ≥ -3.08, do not reject Ho.
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(3352 - 5727) - (0)}{\sqrt{\frac{1100^2}{87} + \frac{1700^2}{76}}} = -10.42 \]
Since z = -10.42 < -3.08, reject Ho.
The evidence is substantial that women, on average, lose more calcium than men.
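The one-tailed calcium test can be reproduced with Python's standard library. This is a sketch under the assumptions of the problem (known population standard deviations, men as group 1); the variable names are illustrative. `NormalDist().inv_cdf` returns the exact lower-tail quantile, which is about -3.09 and so differs slightly from the table value of 3.08 used above.

```python
import math
from statistics import NormalDist

# Summary statistics from the calcium problem (men = group 1, women = group 2)
xbar_men, sigma_men, n_men = 3352, 1100, 87
xbar_women, sigma_women, n_women = 5727, 1700, 76
alpha = 0.001

standard_error = math.sqrt(sigma_men**2 / n_men + sigma_women**2 / n_women)
z = (xbar_men - xbar_women) / standard_error

# Lower-tail critical value for Ha: mu1 - mu2 < 0
z_critical = NormalDist().inv_cdf(alpha)

reject_null = z < z_critical
print(round(z, 2), round(z_critical, 2), reject_null)
```

The observed statistic lies far below the critical value, which matches the decision to reject Ho.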
Confidence Interval
Sometimes the solution is to take a random sample from each of the two populations and study the difference in the two samples.
Formula for confidence interval to estimate (µ1 - µ2).
Designating a group as group one, and another as group two is an arbitrary decision.
\[ (\bar{x}_1 - \bar{x}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

Demonstration Problem
Men: x̄1 = 3,352 µg, σ1 = 1,100 µg, n1 = 87
Women: x̄2 = 5,727 µg, σ2 = 1,700 µg, n2 = 76
95% Confidence: z = 1.96
Calculate it!
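One way to carry out the calculation is a short Python sketch. The function name `z_confidence_interval` is a hypothetical helper, not something from the slides; it applies the confidence interval formula for known population standard deviations to the calcium data.

```python
import math

def z_confidence_interval(xbar1, xbar2, sigma1, sigma2, n1, n2, z):
    """Confidence interval for mu1 - mu2 with known population
    standard deviations (hypothetical helper)."""
    diff = xbar1 - xbar2
    margin = z * math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return diff - margin, diff + margin

# Men are group 1, women are group 2; z = 1.96 for 95% confidence
lower, upper = z_confidence_interval(3352, 5727, 1100, 1700, 87, 76, 1.96)
print(round(lower, 2), round(upper, 2))
```

With these inputs the interval runs from roughly -2822 µg to -1928 µg. Both limits are negative because the men's sample mean is lower than the women's; swapping the group labels would flip the signs.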
Hypothesis Testing
Hypothesis test: compares the means of two samples to see if there is a difference in the two population means from which the samples come.
This is used when σ² is unknown and samples are independent.
Assumes that the measurement is normally distributed.

Hypothesis Testing
If σ is unknown, it can be estimated by pooling the two sample variances and computing a pooled sample standard deviation.
t Test for Differences in Population Means
Each of the two populations is normally distributed.
The two samples are independent.
The values of the population variances are unknown.
The variances of the two populations are equal: σ1² = σ2².

t Formula to Test the Difference in Means Assuming σ1² = σ2²
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
Shrimp weights
Hatching method A: 56 50 52 44 52 47 47 53 45 48 42 51 42 43 44
Hatching method B: 59 54 55 65 52 57 64 53 53 56 53 57
Ho: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
α = .05, α/2 = .025
df = n1 + n2 - 2 = 15 + 12 - 2 = 25
t0.025,25 = 2.060
If t < -2.060 or t > 2.060, reject Ho.
If -2.060 ≤ t ≤ 2.060, do not reject Ho.
[Figure: t distribution with a rejection region of area α/2 = .025 in each tail, the non-rejection region between them, and critical values t.025,25 = -2.060 and t.025,25 = 2.060.]
Shrimp hatching methods
Hatching Method A    Hatching Method B
56 51 45             59 57 53
47 52 43             52 56 65
42 53 52             53 55 53
50 42 48             54 64 57
47 44 44
n1 = 15, x̄1 = 47.73, s1² = 19.495
n2 = 12, x̄2 = 56.5, s2² = 18.273

Shrimp hatching methods
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(47.73 - 56.50) - 0}{\sqrt{\frac{19.495(14) + 18.273(11)}{15 + 12 - 2}} \sqrt{\frac{1}{15} + \frac{1}{12}}} = -5.20 \]
\[ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}} = 25 \]
If t < -2.060 or t > 2.060, reject Ho.
If -2.060 ≤ t ≤ 2.060, do not reject Ho.
Since t = -5.20 < -2.060, reject Ho.
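The pooled-variance t statistic for the shrimp data can be recomputed from the raw weights. The sketch below is illustrative only: `pooled_t` is a hypothetical helper name, and the two lists are the hatching method A and B weights from the slides.

```python
import math
from statistics import mean, variance

method_a = [56, 50, 52, 44, 52, 47, 47, 53, 45, 48, 42, 51, 42, 43, 44]
method_b = [59, 54, 55, 65, 52, 57, 64, 53, 53, 56, 53, 57]

def pooled_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom,
    assuming equal population variances (hypothetical helper)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # statistics.variance returns the sample variance (n - 1 denominator)
    pooled_var = (variance(sample1) * (n1 - 1) + variance(sample2) * (n2 - 1)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / standard_error, df

t_stat, df = pooled_t(method_a, method_b)
reject_null = abs(t_stat) > 2.060  # t.025,25 from the t table
print(round(t_stat, 2), df, reject_null)
```

The statistic rounds to -5.20 with 25 degrees of freedom, in agreement with the hand calculation.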
Shrimp hatching methods
The conclusion is that there is a significant difference in the effectiveness of the hatching methods.

Confidence Interval to Estimate μ1 - μ2 when σ1² and σ2² are unknown and σ1² = σ2²
\[ (\bar{x}_1 - \bar{x}_2) \pm t \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
where df = n1 + n2 - 2
A coffee manufacturer is interested in estimating the difference in the average daily coffee consumption of regular-coffee drinkers and decaffeinated-coffee drinkers. Its researcher randomly selects 13 regular-coffee drinkers and asks how many cups of coffee per day they drink. He randomly locates 15 decaffeinated-coffee drinkers and asks how many cups of coffee per day they drink. The average for the regular-coffee drinkers is 4.35 cups, with a standard deviation of 1.20 cups. The average for the decaffeinated-coffee drinkers is 6.84 cups, with a standard deviation of 1.42 cups. The researcher assumes, for each population, that the daily consumption is normally distributed, and he constructs a 95% confidence interval to estimate the difference in the averages of the two populations.
n1 = 13, n2 = 15
x̄1 = 4.35, x̄2 = 6.84
s1 = 1.20, s2 = 1.42
α = 0.05, t0.025,26 = 2.056
Demonstration Problem
\[ (4.35 - 6.84) \pm 2.056 \sqrt{\frac{(1.20)^2 (12) + (1.42)^2 (14)}{13 + 15 - 2}} \sqrt{\frac{1}{13} + \frac{1}{15}} \]
-2.49 ± 1.03
-3.52 ≤ μ1 - μ2 ≤ -1.46
The researcher is 95% confident that the difference in population average daily consumption of cups of coffee between regular- and decaffeinated-coffee drinkers is between 1.46 cups and 3.52 cups.

Statistical Inferences for Two Related Populations
Dependent samples
Used in before and after studies
After measurement is not independent of the before measurement
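The coffee interval can be verified from the summary statistics alone. This is a minimal sketch; `pooled_t_interval` is a hypothetical helper name, and the critical value 2.056 is taken from the t table for 26 degrees of freedom as stated in the problem.

```python
import math

def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_critical):
    """Confidence interval for mu1 - mu2 using the pooled sample
    variance (equal population variances assumed; hypothetical helper)."""
    pooled_var = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)
    margin = t_critical * math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Regular-coffee drinkers are group 1, decaffeinated-coffee drinkers group 2
lower, upper = pooled_t_interval(4.35, 6.84, 1.20, 1.42, 13, 15, 2.056)
print(round(lower, 2), round(upper, 2))
```

Both limits are negative because group 1 (regular) has the smaller sample mean; in absolute terms the difference lies between about 1.46 and 3.52 cups.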
Hypothesis Testing
Researcher must determine if the two samples are related to each other
The technique for related samples is different from the technique used to analyze independent samples
Matched pairs test requires the two samples be the same size

Dependent Samples
Before and after measurements on the same individual
Studies of twins
Studies of spouses
Individual  Before  After
1           32      39
2           11      15
3           21      35
4           17      13
5           30      41
6           38      39
7           14      22
Hypothesis Testing
The following t test for dependent measures uses the sample difference, d, between individual matched samples as the basic measurement of analysis
An analysis of d converts the problem from a two sample problem to a single sample of differences

Formulas for Dependent Samples
\[ t = \frac{\bar{d} - D}{s_d / \sqrt{n}}, \qquad df = n - 1 \]
\[ \bar{d} = \frac{\sum d}{n} \]
\[ s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} \]
n = number of pairs
d = sample difference in pairs
D = mean population difference
s_d = standard deviation of sample difference
d̄ = mean sample difference
Hypothesis Testing
Analysis of data by this method involves calculating a t value and comparing it with a critical value obtained from the table
n in the degrees of freedom (n - 1) is the number of matched pairs of scores

W/H Ratios for Nine Randomly Selected Ethnic Groups
Suppose a stock market investor is interested in determining whether there is a significant difference in the W/H (weight to height) ratio for 2-year-old children of different ethnic groups in Vietnam. In an effort to study this question, the investor randomly samples nine ethnic groups from Vietnam and records the W/H ratios for each of these groups at the end of year 1 and at the end of year 2.
W/H Ratios for Nine Randomly Selected Groups
Group  Year 1 W/H Ratio  Year 2 W/H Ratio
1      8.9               12.7
2      38.1              45.4
3      43.0              10.0
4      34.0              27.2
5      34.5              22.8
6      15.2              24.1
7      20.3              32.3
8      19.9              40.1
9      61.9              106.5

Hypothesis Testing with Dependent Samples: W/H Ratios for Nine groups
Ho: D = 0
Ha: D ≠ 0
α = .01, α/2 = .005
df = n - 1 = 9 - 1 = 8
t.005,8 = 3.355
If t < -3.355 or t > 3.355, reject Ho.
If -3.355 ≤ t ≤ 3.355, do not reject Ho.
[Figure: t distribution with a rejection region of area α/2 = .005 in each tail, the non-rejection region between them, and critical values t.005,8 = -3.355 and t.005,8 = 3.355.]
Hypothesis Testing with Dependent Samples: W/H Ratios for Nine groups
Ho: D = 0
Ha: D ≠ 0
d̄ = -5.033
s_d = 21.599
\[ t = \frac{-5.033 - 0}{21.599 / \sqrt{9}} = -0.70 \]
Since -3.355 ≤ t = -0.70 ≤ 3.355, do not reject Ho.

Hypothesis Testing with Dependent Samples: W/H Ratios for Nine groups
t-Test: Paired Two Sample for Means
                               Year 1 W/H Ratio   Year 2 W/H Ratio
Mean                           30.64              35.68
Variance                       268.1              837.5
Observations                   9                  9
Pearson Correlation            0.674
Hypothesized Mean Difference   0
df                             8
t Stat                         -0.7
P(T<=t) one-tail               0.252
t Critical one-tail            1.86
P(T<=t) two-tail               0.504
t Critical two-tail            2.306
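The paired t statistic can be reproduced directly from the nine pairs of W/H ratios. The sketch below is illustrative, with variable names of my own choosing; it converts the two-sample problem into a single sample of differences (year 1 minus year 2) and applies the dependent-samples formula.

```python
import math
from statistics import mean, stdev

year1 = [8.9, 38.1, 43.0, 34.0, 34.5, 15.2, 20.3, 19.9, 61.9]
year2 = [12.7, 45.4, 10.0, 27.2, 22.8, 24.1, 32.3, 40.1, 106.5]

# Single sample of paired differences
differences = [a - b for a, b in zip(year1, year2)]
n = len(differences)

d_bar = mean(differences)   # mean sample difference
s_d = stdev(differences)    # sample standard deviation (n - 1 denominator)

# Test statistic for Ho: D = 0
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
reject_null = abs(t_stat) > 3.355  # t.005,8 from the t table
print(round(d_bar, 3), round(s_d, 3), round(t_stat, 2), reject_null)
```

The mean difference, its standard deviation, and the t statistic match the values on the slide to the precision shown, and the null hypothesis is not rejected at α = .01.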
Confidence Intervals
Researcher can be interested in estimating the mean difference in two populations for related samples
This requires a confidence interval of D (the mean population difference of two related samples) to be constructed

Confidence Intervals for Mean Difference for Related Samples
\[ \bar{d} - t \frac{s_d}{\sqrt{n}} \le D \le \bar{d} + t \frac{s_d}{\sqrt{n}} \]
df = n - 1
Difference in Number of Bacteria Colonies
Strain  Without treatment  With treatment  d
1       8                  11              -3
2       19                 30              -11
3       5                  6               -1
4       9                  13              -4
5       3                  5               -2
6       0                  4               -4
7       13                 15              -2
8       11                 17              -6
9       9                  12              -3
10      5                  12              -7
11      8                  6               2
12      2                  5               -3
13      11                 10              1
14      14                 22              -8
15      7                  8               -1
16      12                 15              -3
17      6                  12              -6
18      10                 10              0

Confidence Interval for Mean Difference in Number of Bacteria Colonies
d̄ = -3.39
s_d = 3.27
df = n - 1 = 18 - 1 = 17
t.005,17 = 2.898
\[ \bar{d} - t \frac{s_d}{\sqrt{n}} \le D \le \bar{d} + t \frac{s_d}{\sqrt{n}} \]
\[ -3.39 - 2.898 \frac{3.27}{\sqrt{18}} \le D \le -3.39 + 2.898 \frac{3.27}{\sqrt{18}} \]
-3.39 - 2.23 ≤ D ≤ -3.39 + 2.23
-5.62 ≤ D ≤ -1.16
The analyst estimates with a 99% level of confidence that the average difference in the number of bacteria colonies with and without treatment is between -5.62 and -1.16 colonies.
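The 99% interval for the bacteria data can be recomputed from the 18 pairs. This is a sketch with illustrative variable names; the critical value 2.898 is the table value t.005,17 quoted in the slides. Because the code keeps full precision for d̄ and s_d, its limits differ from the hand calculation in the second decimal place.

```python
import math
from statistics import mean, stdev

without_treatment = [8, 19, 5, 9, 3, 0, 13, 11, 9, 5, 8, 2, 11, 14, 7, 12, 6, 10]
with_treatment = [11, 30, 6, 13, 5, 4, 15, 17, 12, 12, 6, 5, 10, 22, 8, 15, 12, 10]

# Paired differences: without treatment minus with treatment
differences = [a - b for a, b in zip(without_treatment, with_treatment)]
n = len(differences)

d_bar = mean(differences)
s_d = stdev(differences)     # sample standard deviation (n - 1 denominator)
t_critical = 2.898           # t.005,17 from the t table

margin = t_critical * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(round(d_bar, 2), round(s_d, 2), round(lower, 2), round(upper, 2))
```

The interval lies entirely below zero, so the data suggest more colonies with treatment than without.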
Statistical Inference about two Population Proportions (p̂1 - p̂2)
Sample proportion used is (p̂1 - p̂2)
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}} \]
p̂1 = proportion from sample 1
p̂2 = proportion from sample 2
n1 = size of sample 1
n2 = size of sample 2
p1 = proportion from population 1
p2 = proportion from population 2
q1 = 1 - p1
q2 = 1 - p2

Hypothesis Testing
Because population proportions are unknown, an estimate of the standard deviation of the difference in two sample proportions is made by using sample proportions as point estimates of the population proportions.
Z Formula to Test the Difference in Population Proportions
\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{P} \cdot \bar{Q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
\[ \bar{P} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} \]
\[ \bar{Q} = 1 - \bar{P} \]

Testing the Difference in Population Proportions
Ho: p1 - p2 = 0
Ha: p1 - p2 ≠ 0
α = .01, α/2 = .005
z.005 = 2.575
If z < -2.575 or z > 2.575, reject Ho.
If -2.575 ≤ z ≤ 2.575, do not reject Ho.
[Figure: normal curve with a rejection region of area α/2 = .005 in each tail, the non-rejection region between them, and critical values Zc = -2.575 and Zc = 2.575.]
Testing the Difference in Population Proportions
n1 = 100, x1 = 24, p̂1 = 24/100 = .24
n2 = 95, x2 = 39, p̂2 = 39/95 = .41
\[ \bar{P} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{24 + 39}{100 + 95} = .323 \]
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{P} \cdot \bar{Q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(.24 - .41) - 0}{\sqrt{(.323)(.677) \left( \frac{1}{100} + \frac{1}{95} \right)}} = \frac{-.17}{.067} = -2.54 \]
Since -2.575 ≤ z = -2.54 ≤ 2.575, do not reject Ho.

Sampling Distribution of Differences in Sample Proportions
For large samples
1. n1·p̂1 > 5,
2. n1·q̂1 > 5,
3. n2·p̂2 > 5, and
4. n2·q̂2 > 5, where q̂ = 1 - p̂,
the difference in sample proportions is normally distributed with
\[ \mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 \]
and
\[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \]
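The two-proportion test with n1 = 100, x1 = 24 and n2 = 95, x2 = 39 can be checked with a short Python sketch. The function name `two_proportion_z` is a hypothetical helper. It keeps p̂2 = 39/95 unrounded, so the statistic comes out near -2.545 rather than the -2.54 obtained with p̂2 rounded to .41; the decision is the same.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for Ho: p1 - p2 = 0 using the pooled
    proportion (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

z = two_proportion_z(24, 100, 39, 95)

# Large-sample conditions: each count of successes and failures exceeds 5
large_sample = all(count > 5 for count in (24, 100 - 24, 39, 95 - 39))

reject_null = abs(z) > 2.575  # two-tailed test at alpha = .01
print(round(z, 3), large_sample, reject_null)
```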
Confidence Interval to Estimate p1 - p2
\[ (\hat{p}_1 - \hat{p}_2) - z \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} \]

Example Problem:
n1 = 400, x1 = 48, p̂1 = 48/400 = .12, q̂1 = 1 - p̂1 = .88
n2 = 480, x2 = 187, p̂2 = 187/480 = .39, q̂2 = 1 - p̂2 = .61
For a 98% level of confidence, z = 2.33.
\[ (.12 - .39) - 2.33 \sqrt{\frac{(.12)(.88)}{400} + \frac{(.39)(.61)}{480}} \le p_1 - p_2 \le (.12 - .39) + 2.33 \sqrt{\frac{(.12)(.88)}{400} + \frac{(.39)(.61)}{480}} \]
-.27 - .064 ≤ p1 - p2 ≤ -.27 + .064
-.334 ≤ p1 - p2 ≤ -.206
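The 98% interval for p1 - p2 with n1 = 400, x1 = 48 and n2 = 480, x2 = 187 can be recomputed as follows. This is a sketch; `proportion_diff_interval` is a hypothetical helper name, and z = 2.33 is the table value used in the example. The code keeps p̂2 = 187/480 unrounded, so its limits differ from the hand calculation only in the third decimal place.

```python
import math

def proportion_diff_interval(x1, n1, x2, n2, z):
    """Confidence interval for p1 - p2 using the unpooled
    sample proportions (hypothetical helper)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    return diff - z * standard_error, diff + z * standard_error

lower, upper = proportion_diff_interval(48, 400, 187, 480, 2.33)
print(round(lower, 3), round(upper, 3))
```

Unlike the hypothesis test, the interval uses each sample's own proportion in the standard error instead of a pooled proportion, since no common value of p is being assumed.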
F Test for Two Population Variances
\[ F = \frac{s_1^2}{s_2^2} \]
df numerator = ν1 = n1 - 1
df denominator = ν2 = n2 - 1
[Figure: F distribution]

Sheet Metal Example
Suppose a machine produces metal sheets that are specified to be 22 millimeters thick. Because of the machine, the operator, the raw material, the manufacturing environment, and other factors, there is variability in the thickness. Two machines produce these sheets. Operators are concerned about the consistency of the two machines. To test consistency, they randomly sample 10 sheets produced by machine 1 and 12 sheets produced by machine 2. The thickness measurements of sheets from each machine are given in the table on the following page. Assume sheet thickness is normally distributed in the population.
How can we test to determine whether the variance from each sample comes from the same population variance (population variances are equal) or from different population variances (population variances are not equal)?
Sheet Metal Example: Hypothesis Test for Equality of Two Population Variances
Ho: σ1² = σ2²
Ha: σ1² ≠ σ2²
α = 0.05, n1 = 10, n2 = 12
df numerator = ν1 = n1 - 1 = 9
df denominator = ν2 = n2 - 1 = 11
F.025,9,11 = 3.59
Lower critical value: 1 / F.025,9,11 = 1 / 3.59 = 0.28
If F < 0.28 or F > 3.59, reject Ho.
If 0.28 ≤ F ≤ 3.59, do not reject Ho.

Sheet Metal Example
Machine 1          Machine 2
22.3 21.8 22.2     22.0 22.2 22.0
21.8 21.9 21.6     22.1 22.0 22.1
22.3 22.4          21.8 21.7 21.9
21.6 22.5          21.9 21.9 22.1
n1 = 10, s1² = 0.1138
n2 = 12, s2² = 0.0202
\[ F = \frac{s_1^2}{s_2^2} = \frac{0.1138}{0.0202} = 5.63 \]
Since F = 5.63 > Fc = 3.59, reject Ho.
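The sample variances and the F ratio can be recomputed from the thickness measurements. This is a minimal sketch with illustrative variable names; the critical values 0.28 and 3.59 are the ones quoted in the slides, not computed here.

```python
from statistics import variance

machine_1 = [22.3, 21.8, 22.2, 21.8, 21.9, 21.6, 22.3, 22.4, 21.6, 22.5]
machine_2 = [22.0, 22.2, 22.0, 22.1, 22.0, 22.1,
             21.8, 21.7, 21.9, 21.9, 21.9, 22.1]

# Sample variances (n - 1 denominator)
s1_squared = variance(machine_1)
s2_squared = variance(machine_2)

f_stat = s1_squared / s2_squared
df_numerator = len(machine_1) - 1
df_denominator = len(machine_2) - 1

# Two-tailed decision rule with the table-based critical values
reject_null = f_stat < 0.28 or f_stat > 3.59
print(round(s1_squared, 4), round(s2_squared, 4), round(f_stat, 2), reject_null)
```

The ratio exceeds the upper critical value, which supports the conclusion that the two machines do not have the same thickness variance.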