Lecture 6 Compatibility Mode PDF

Learning Objectives
Test hypotheses and construct confidence intervals:
about the difference in two population means using the z statistic.
about the difference in two population means using the t statistic.
about the difference in two related populations.
about the difference in two population proportions.
about two population variances using the F statistic.
Hypothesis Testing and Confidence Intervals: Difference in Means Using the z Statistic (Population Variances Known)
Two sample means are calculated, and the difference between them is used to test the difference between the population means.
The central limit theorem states that the difference in two sample means is approximately normally distributed for large sample sizes (both n1 and n2 > 30), regardless of the shape of the populations.
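The central limit theorem claim can be checked with a small simulation. The Python sketch below is illustrative only: the exponential populations (means 2.0 and 1.0), the sample sizes 35 and 40, and the helper name `simulate_mean_differences` are assumptions, not part of the lecture. For an exponential distribution the standard deviation equals the mean, so the theoretical standard error of x̄1 - x̄2 is sqrt(mean1²/n1 + mean2²/n2).

```python
import math
import random
import statistics


def simulate_mean_differences(mean1, mean2, n1, n2, reps, seed=0):
    """Return `reps` simulated values of x̄1 - x̄2 from two exponential populations."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        # random.expovariate takes the rate (1 / mean), not the mean
        sample1 = [rng.expovariate(1 / mean1) for _ in range(n1)]
        sample2 = [rng.expovariate(1 / mean2) for _ in range(n2)]
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return diffs


mean1, mean2, n1, n2 = 2.0, 1.0, 35, 40  # illustrative values, not from the lecture
diffs = simulate_mean_differences(mean1, mean2, n1, n2, reps=5000)

# For an exponential population, sigma equals the mean
theory_se = math.sqrt(mean1**2 / n1 + mean2**2 / n2)
true_diff = mean1 - mean2

# Share of simulated differences within ±1.96 standard errors of the true difference
share_inside = sum(abs(d - true_diff) <= 1.96 * theory_se for d in diffs) / len(diffs)

print(f"mean of differences: {statistics.fmean(diffs):.3f} (theory {true_diff:.3f})")
print(f"SD of differences:   {statistics.stdev(diffs):.3f} (theory {theory_se:.3f})")
print(f"share within ±1.96 SE: {share_inside:.3f}")
```

Even though both populations are strongly right-skewed, the simulated differences should have a mean near 1.0, a standard deviation near the theoretical standard error, and roughly 95% of their values within ±1.96 standard errors, as a normal distribution would.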
Hypothesis Testing for Differences Between Means: The Growth Example
As a specific example, suppose we want to conduct a hypothesis test to determine whether the average annual growth μ1 of one animal species is different from the average annual growth μ2 of another species. Because we are testing to determine whether the means are different, it might seem logical that the null and alternative hypotheses would be
H0: μ1 = μ2
Ha: μ1 ≠ μ2
Hypothesis Testing for Differences Between Means:
H0: μ1 = μ2
Ha: μ1 ≠ μ2
α = .05, α/2 = .025, z.025 = 1.96
The hypotheses can also be expressed as:
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
The analysis tests whether there is a difference in mean annual growth. This is a two-tailed test.
Hypothesis Testing for Differences Between Means
If z < -1.96 or z > 1.96, reject H0.
If -1.96 ≤ z ≤ 1.96, do not reject H0.
[Figure: standard normal curve with rejection regions of area α/2 = .025 in each tail beyond the critical values zc = -1.96 and zc = 1.96; the non-rejection region lies between them.]
Hypothesis Testing for Differences Between Means: Sample Data
Species 1: n1 = 32, x̄1 = 70.700, σ1 = 16.253, σ1² = 264.164
Species 2: n2 = 34, x̄2 = 62.187, σ2 = 12.900, σ2² = 166.411
Growth measurements shown in the slide's table (66 values for the two species combined):
74.256 57.791 71.115 69.962 77.136 43.649 96.234 65.145 67.574 55.052 66.035 63.369 89.807 96.767 59.621 57.828 54.335 59.676 93.261 77.242 62.483 63.362 42.494 54.449 103.030 67.056 69.319 37.194 83.849 46.394 74.195 64.276 35.394
99.198 61.254 67.160 37.386 71.804 72.401 75.932 74.194 86.741 73.065 59.505 56.470 80.742 65.360 57.351 48.036 72.790 67.814 39.672 73.904 60.053 71.351 71.492 45.652 54.270 66.359 58.653 93.083 59.045 61.261 63.508 63.384 68.508
Hypothesis Testing for Differences Between Means:
$$z = \frac{(70.700 - 62.187) - (0)}{\sqrt{\frac{264.164}{32} + \frac{166.411}{34}}} = 2.35$$
Since the observed value of 2.35 is greater than 1.96, reject the null hypothesis. That is, there is a significant difference between the average annual growth of species 1 and the average annual growth of species 2.
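The growth-example arithmetic can be reproduced in a few lines. This is a minimal Python sketch; the helper name `z_from_variances` is hypothetical, and it takes population variances (not standard deviations) because that is how the slide reports them.

```python
import math


def z_from_variances(xbar1, xbar2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """z statistic for mu1 - mu2 when the population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error


# Growth example: summary statistics from the slides
z = z_from_variances(70.700, 62.187, 264.164, 166.411, 32, 34)
z_critical = 1.96  # two-tailed test, alpha = .05

print(f"z = {z:.2f}")
if abs(z) > z_critical:
    print("Reject H0: the mean annual growth differs between the species.")
else:
    print("Do not reject H0.")
```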
Hypothesis Testing for Differences Between Means
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
If z < -1.96 or z > 1.96, reject H0.
If -1.96 ≤ z ≤ 1.96, do not reject H0.
[Figure: sampling distribution of x̄1 - x̄2 and the corresponding z distribution, with rejection regions of area α/2 = .025 in each tail beyond the critical values zc = -1.96 and zc = 1.96.]
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(70.700 - 62.187) - (0)}{\sqrt{\frac{264.164}{32} + \frac{166.411}{34}}} = 2.35$$
Since z = 2.35 > 1.96, reject H0.
Demonstration Problem
A sample of 87 men showed that the average calcium depletion per year is 3352 µg. The population standard deviation is 1100 µg. A sample of 76 women showed that the average calcium depletion per year is 5727 µg, with a population standard deviation of 1700 µg. A researcher wants to “prove” that women lose more calcium. If they use α = .001 and these sample data, will they be able to reject a null hypothesis that women annually lose as much calcium as men do, or less?
Men are population 1 and women are population 2.
H0: μ1 - μ2 ≥ 0
Ha: μ1 - μ2 < 0
α = .001, lower-tailed test; critical value zc = -3.08
[Figure: standard normal curve with a rejection region of area .001 in the lower tail, below the critical value zc = -3.08.]
Men: x̄1 = 3352 µg, σ1 = 1100 µg, n1 = 87
Women: x̄2 = 5727 µg, σ2 = 1700 µg, n2 = 76
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(3352 - 5727) - (0)}{\sqrt{\frac{1100^2}{87} + \frac{1700^2}{76}}} = -10.42$$
If z < -3.08, reject H0. If z ≥ -3.08, do not reject H0.
Since z = -10.42 < -3.08, reject H0.
The evidence is substantial that women, on average, lose more calcium than men.
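The same calculation for the one-tailed calcium test, as a minimal Python sketch. The helper name `z_from_std_devs` is hypothetical; unlike the variance-based version one might write for the growth example, it takes population standard deviations because the problem states σ directly.

```python
import math


def z_from_std_devs(xbar1, xbar2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic for mu1 - mu2 given population standard deviations."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error


# Population 1 = men, population 2 = women (calcium depletion, µg per year)
z = z_from_std_devs(3352, 5727, 1100, 1700, 87, 76)
z_critical = -3.08  # lower-tailed test, alpha = .001 (table value used on the slide)

print(f"z = {z:.2f}")
print("Reject H0" if z < z_critical else "Do not reject H0")
```

Because the alternative hypothesis is directional (μ1 - μ2 < 0), only the lower tail is compared against the critical value.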
Confidence Interval
Sometimes the goal is to estimate the difference between two population means rather than to test it. The approach is to take a random sample from each of the two populations and study the difference in the two sample means; the formula for a confidence interval to estimate μ1 - μ2 is built on that difference.
Designating one group as group 1 and the other as group 2 is an arbitrary decision.
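Men: x̄1 = 3352 µg, σ1 = 1100 µg, n1 = 87
Women: x̄2 = 5727 µg, σ2 = 1700 µg, n2 = 76
95% confidence: z = 1.96
$$(\bar{x}_1 - \bar{x}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Calculate it!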
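One way to check the "Calculate it!" exercise is the Python sketch below. The helper name `z_interval_diff_means` is hypothetical; it simply evaluates the interval formula for known population standard deviations.

```python
import math


def z_interval_diff_means(xbar1, xbar2, sigma1, sigma2, n1, n2, z):
    """Confidence interval for mu1 - mu2 when population SDs are known."""
    margin = z * math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    point = xbar1 - xbar2
    return point - margin, point + margin


# Calcium example: men = group 1, women = group 2; 95% confidence, z = 1.96
low, high = z_interval_diff_means(3352, 5727, 1100, 1700, 87, 76, z=1.96)
print(f"{low:.1f} <= mu1 - mu2 <= {high:.1f}")
```

With these inputs the interval works out to roughly -2821.7 ≤ μ1 - μ2 ≤ -1928.3 µg. Swapping the group labels would flip the signs but not the conclusion, which is why the designation of group 1 is arbitrary.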
Hypothesis Testing
A hypothesis test compares the means of two samples to see whether there is a difference in the two population means from which the samples come.
This test is used when σ² is unknown and the samples are independent.
It assumes that the measurement is normally distributed in each population.
Hypothesis Testing
If σ is unknown and the two population variances are assumed equal, σ can be estimated by pooling the two sample variances and computing a pooled sample standard deviation:
$$s_p = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$
t Test for Differences in Population Means
Each of the two populations is normally distributed.
The two samples are independent.
The values of the population variances are unknown.
The variances of the two populations are equal: σ1² = σ2².
t Formula to Test the Difference in Means Assuming σ1² = σ2²
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
df = n1 + n2 - 2
Shrimp weights
Hatching method A (n1 = 15): 56 50 52 44 52 47 47 53 45 48 42 51 42 43 44
Hatching method B (n2 = 12): 59 54 55 65 52 57 64 53 53 56 53 57
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
α = .05, α/2 = .025
df = n1 + n2 - 2 = 15 + 12 - 2 = 25
t.025,25 = 2.060
If t < -2.060 or t > 2.060, reject H0.
If -2.060 ≤ t ≤ 2.060, do not reject H0.
[Figure: t distribution with rejection regions of area .025 in each tail beyond the critical values -2.060 and 2.060; the non-rejection region lies between them.]
Shrimp hatching methods: sample statistics
Hatching method A: n1 = 15, x̄1 = 47.73, s1² = 19.495
Hatching method B: n2 = 12, x̄2 = 56.50, s2² = 18.273
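Two forms of the t statistic appear here: one that does not assume equal population variances (with degrees of freedom from the formula beside it), and the pooled form used when σ1² = σ2² is assumed.
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Assuming equal variances (df = 25):
$$t = \frac{(47.73 - 56.50) - (0)}{\sqrt{\frac{19.495(14) + 18.273(11)}{15 + 12 - 2}}\sqrt{\frac{1}{15} + \frac{1}{12}}} = -5.20$$
If t < -2.060 or t > 2.060, reject H0.
If -2.060 ≤ t ≤ 2.060, do not reject H0.
Since t = -5.20 < -2.060, reject H0.
The conclusion is that there is a significant difference in the effectiveness of the hatching methods.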
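The shrimp test can be reproduced from the raw weights with the standard library. In this Python sketch the two lists read the slide's table with the first five columns as method A and the last four as method B; that reading is an assumption, but it reproduces the slide's x̄1 = 47.73, s1² = 19.495, x̄2 = 56.50, and s2² = 18.273. The helper names `pooled_t` and `welch_t` are hypothetical; `welch_t` uses the Welch-Satterthwaite degrees of freedom for the unequal-variance form.

```python
import math
import statistics

method_a = [56, 50, 52, 44, 52, 47, 47, 53, 45, 48, 42, 51, 42, 43, 44]
method_b = [59, 54, 55, 65, 52, 57, 64, 53, 53, 56, 53, 57]


def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Pooled-variance t statistic and df (assumes equal population variances)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)  # n - 1 denominators
    pooled_var = (v1 * (n1 - 1) + v2 * (n2 - 1)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    return (diff - hypothesized_diff) / se, n1 + n2 - 2


def welch_t(sample1, sample2, hypothesized_diff=0.0):
    """Unequal-variance t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    a = statistics.variance(sample1) / n1
    b = statistics.variance(sample2) / n2
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)
    t = (diff - hypothesized_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df


t, df = pooled_t(method_a, method_b)
t_critical = 2.060  # t.025,25 from the slide
print(f"pooled t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > t_critical else "Do not reject H0")

t_w, df_w = welch_t(method_a, method_b)
print(f"Welch t = {t_w:.2f}, df = {df_w:.1f}")
```

Both forms give t ≈ -5.2 here; the Welch degrees of freedom come out just over 24 rather than 25, which does not change the conclusion.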
Confidence Interval to Estimate μ1 - μ2 when σ1² and σ2² Are Unknown and σ1² = σ2²
$$(\bar{x}_1 - \bar{x}_2) \pm t\sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where df = n1 + n2 - 2
A coffee manufacturer is interested in estimating the difference in the average daily coffee consumption of regular-coffee drinkers and decaffeinated-coffee drinkers. Its researcher randomly selects 13 regular-coffee drinkers and asks how many cups of coffee per day they drink. He randomly locates 15 decaffeinated-coffee drinkers and asks how many cups of coffee per day they drink. The average for the regular-coffee drinkers is 4.35 cups, with a standard deviation of 1.20 cups. The average for the decaffeinated-coffee drinkers is 6.84 cups, with a standard deviation of 1.42 cups. The researcher assumes, for each population, that the daily consumption is normally distributed, and he constructs a 95% confidence interval to estimate the difference in the averages of the two populations.
n1 = 13, n2 = 15
x̄1 = 4.35, x̄2 = 6.84
s1 = 1.20, s2 = 1.42
α = .05, df = 13 + 15 - 2 = 26, t.025,26 = 2.056
$$(4.35 - 6.84) \pm 2.056\sqrt{\frac{(1.20)^2(12) + (1.42)^2(14)}{13 + 15 - 2}}\sqrt{\frac{1}{13} + \frac{1}{15}} = -2.49 \pm 1.03$$
$$-3.52 \le \mu_1 - \mu_2 \le -1.46$$
The researcher is 95% confident that the difference in population average daily consumption of cups of coffee between regular- and decaffeinated-coffee drinkers is between -3.52 and -1.46 cups; that is, decaffeinated-coffee drinkers drink, on average, between 1.46 and 3.52 more cups per day than regular-coffee drinkers.
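The coffee interval can be checked from the summary statistics with a short Python sketch. The helper name `pooled_t_interval` is hypothetical, and the critical value t.025,26 = 2.056 is passed in from the slide rather than computed.

```python
import math


def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_critical):
    """CI for mu1 - mu2 assuming equal population variances (df = n1 + n2 - 2)."""
    pooled_var = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)
    margin = t_critical * math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    point = xbar1 - xbar2
    return point - margin, point + margin


# Coffee example: regular = group 1, decaffeinated = group 2
low, high = pooled_t_interval(4.35, 6.84, 1.20, 1.42, 13, 15, t_critical=2.056)
print(f"{low:.2f} <= mu1 - mu2 <= {high:.2f}")
```

Passing the critical value in keeps the sketch free of third-party dependencies; a statistics library could supply it instead.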
Statistical Inferences for Two Related Populations
Dependent samples
Used in before-and-after studies
The after measurement is not independent of the before measurement
Hypothesis Testing
The researcher must determine whether the two samples are related to each other.
The technique for related samples is different from the technique used to analyze independent samples.
A matched-pairs test requires the two samples to be the same size.
Dependent Samples
Before and after measurements on the same individual
Studies of twins
Studies of spouses

Individual   Before   After
1            32       39
2            11       15
3            21       35
4            17       13
5            30       41
6            38       39
7            14       22
Hypothesis Testing
The following t test for dependent measures uses the sample difference, d, between individual matched samples as the basic measurement of analysis.
An analysis of d converts the problem from a two-sample problem to a single sample of differences.
Formulas for Dependent Samples
$$t = \frac{\bar{d} - D}{s_d / \sqrt{n}}, \qquad df = n - 1$$
$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$$
n = number of pairs
d = sample difference in pairs
D = mean population difference
s_d = standard deviation of the sample differences
d̄ = mean sample difference
Hypothesis Testing
Analysis of data by this method involves calculating a t value and comparing it with a critical value obtained from the t table.
The n in the degrees of freedom (n - 1) is the number of matched pairs of scores.
W/H Ratios for Nine Randomly Selected Ethnic Groups
Suppose a researcher is interested in determining whether there is a significant difference in the W/H (weight-to-height) ratio for 2-year-old children of different ethnic groups in Vietnam from one year to the next. In an effort to study this question, the researcher randomly samples nine ethnic groups from Vietnam and records the W/H ratio for each of these groups at the end of year 1 and at the end of year 2.
W/H Ratios for Nine Randomly Selected Groups
Group   Year 1 W/H Ratio   Year 2 W/H Ratio
1       8.9                12.7
2       38.1               45.4
3       43.0               10.0
4       34.0               27.2
5       34.5               22.8
6       15.2               24.1
7       20.3               32.3
8       19.9               40.1
9       61.9               106.5
Hypothesis Testing with Dependent Samples: W/H Ratios for Nine Groups
H0: D = 0
Ha: D ≠ 0
α = .01, α/2 = .005
df = n - 1 = 9 - 1 = 8
t.005,8 = 3.355
If t < -3.355 or t > 3.355, reject H0.
If -3.355 ≤ t ≤ 3.355, do not reject H0.
[Figure: t distribution with rejection regions of area .005 in each tail beyond the critical values -3.355 and 3.355.]
d̄ = -5.033, s_d = 21.599
$$t = \frac{\bar{d} - D}{s_d / \sqrt{n}} = \frac{-5.033 - 0}{21.599 / \sqrt{9}} = -0.70$$
Since -3.355 ≤ t = -0.70 ≤ 3.355, do not reject H0.

t-Test: Paired Two Sample for Means
                               Year 1 W/H Ratio   Year 2 W/H Ratio
Mean                           30.64              35.68
Variance                       268.1              837.5
Observations                   9                  9
Pearson Correlation            0.674
Hypothesized Mean Difference   0
df                             8
t Stat                         -0.7
P(T<=t) one-tail               0.252
t Critical one-tail            1.86
P(T<=t) two-tail               0.504
t Critical two-tail            2.306
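The paired test is a one-sample t test on the differences d = year 1 - year 2, which this Python sketch makes explicit. The helper name `paired_t` is hypothetical; the W/H values come from the table above, and the critical value t.005,8 = 3.355 is taken from the slide.

```python
import math
import statistics

year1 = [8.9, 38.1, 43.0, 34.0, 34.5, 15.2, 20.3, 19.9, 61.9]
year2 = [12.7, 45.4, 10.0, 27.2, 22.8, 24.1, 32.3, 40.1, 106.5]


def paired_t(before, after, hypothesized_d=0.0):
    """Dependent-samples t: treat the differences d = before - after as one sample."""
    if len(before) != len(after):
        raise ValueError("matched-pairs data need samples of the same size")
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = statistics.fmean(d)
    s_d = statistics.stdev(d)  # n - 1 in the denominator
    t = (d_bar - hypothesized_d) / (s_d / math.sqrt(n))
    return t, n - 1, d_bar, s_d


t, df, d_bar, s_d = paired_t(year1, year2)
t_critical = 3.355  # t.005,8 for alpha = .01, two-tailed
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {df}")
print("Reject H0" if abs(t) > t_critical else "Do not reject H0")
```

The length check mirrors the requirement that a matched-pairs test needs two samples of the same size.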
Confidence Intervals
A researcher may be interested in estimating the mean difference in two populations for related samples.
This requires constructing a confidence interval for D, the mean population difference of two related samples.
Confidence Intervals for Mean Difference for Related Samples
$$\bar{d} - t\frac{s_d}{\sqrt{n}} \le D \le \bar{d} + t\frac{s_d}{\sqrt{n}}$$
df = n - 1
Difference in Number of Bacteria Colonies
Strain   Without treatment   With treatment   d
1        8                   11               -3
2        19                  30               -11
3        5                   6                -1
4        9                   13               -4
5        3                   5                -2
6        0                   4                -4
7        13                  15               -2
8        11                  17               -6
9        9                   12               -3
10       5                   12               -7
11       8                   6                2
12       2                   5                -3
13       11                  10               1
14       14                  22               -8
15       7                   8                -1
16       12                  15               -3
17       6                   12               -6
18       10                  10               0
d̄ = -3.39, s_d = 3.27
Confidence Interval for Mean Difference in Number of Bacteria Colonies
df = n - 1 = 18 - 1 = 17
t.005,17 = 2.898
$$\bar{d} - t\frac{s_d}{\sqrt{n}} \le D \le \bar{d} + t\frac{s_d}{\sqrt{n}}$$
$$-3.39 - 2.898\frac{3.27}{\sqrt{18}} \le D \le -3.39 + 2.898\frac{3.27}{\sqrt{18}}$$
$$-3.39 - 2.23 \le D \le -3.39 + 2.23$$
$$-5.62 \le D \le -1.16$$
The analyst estimates with a 99% level of confidence that the average difference in the number of bacteria colonies (without treatment minus with treatment) is between -5.62 and -1.16 colonies.
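The bacteria interval can be recomputed from the raw counts in the table with the Python sketch below. The helper name `paired_interval` is hypothetical, and t.005,17 = 2.898 is taken from the slide. Because the slide rounds d̄ and s_d before multiplying, the unrounded result lands at about -5.63 ≤ D ≤ -1.15, a hair different from -5.62 and -1.16.

```python
import math
import statistics

without_treatment = [8, 19, 5, 9, 3, 0, 13, 11, 9, 5, 8, 2, 11, 14, 7, 12, 6, 10]
with_treatment = [11, 30, 6, 13, 5, 4, 15, 17, 12, 12, 6, 5, 10, 22, 8, 15, 12, 10]


def paired_interval(before, after, t_critical):
    """CI for the mean population difference D of matched pairs (d = before - after)."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = statistics.fmean(d)
    margin = t_critical * statistics.stdev(d) / math.sqrt(n)
    return d_bar - margin, d_bar + margin


low, high = paired_interval(without_treatment, with_treatment, t_critical=2.898)
print(f"{low:.2f} <= D <= {high:.2f}")
```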
Statistical Inference about Two Population Proportions (p̂1 - p̂2)
The sample statistic used is the difference in sample proportions, p̂1 - p̂2.
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
p̂1 = proportion from sample 1
p̂2 = proportion from sample 2
n1 = size of sample 1
n2 = size of sample 2
p1 = proportion from population 1
p2 = proportion from population 2
q1 = 1 - p1
q2 = 1 - p2
Hypothesis Testing
Because the population proportions are unknown, the standard deviation of the difference in two sample proportions is estimated by using the sample proportions as point estimates of the population proportions.
When the null hypothesis states that p1 - p2 = 0, the two population proportions are assumed equal, so the two samples are combined into a single pooled estimate, p̄.
z Formula to Test the Difference in Population Proportions
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$
Testing the Difference in Population Proportions
H0: p1 - p2 = 0
Ha: p1 - p2 ≠ 0
α = .01, α/2 = .005
z.005 = 2.575
If z < -2.575 or z > 2.575, reject H0.
If -2.575 ≤ z ≤ 2.575, do not reject H0.
[Figure: standard normal curve with rejection regions of area .005 in each tail beyond the critical values zc = -2.575 and zc = 2.575; the non-rejection region lies between them.]
Testing the Difference in Population Proportions
n1 = 100, x1 = 24, p̂1 = 24/100 = .24
n2 = 95, x2 = 39, p̂2 = 39/95 = .41
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{24 + 39}{100 + 95} = .323$$
$$z = \frac{(.24 - .41) - (0)}{\sqrt{(.323)(.677)\left(\frac{1}{100} + \frac{1}{95}\right)}} = \frac{-.17}{.067} = -2.54$$
Since -2.575 ≤ z = -2.54 ≤ 2.575, do not reject H0.
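A Python sketch of the pooled two-proportion z test, using the counts from this example. The helper name `two_proportion_z` is hypothetical. Because it does not round p̂2 to .41, it gives z of about -2.55 rather than the slide's -2.54; the decision is the same.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z test of H0: p1 - p2 = 0 using the pooled proportion p-bar."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


z = two_proportion_z(24, 100, 39, 95)
z_critical = 2.575  # z.005 for alpha = .01, two-tailed
print(f"z = {z:.3f}")
print("Reject H0" if abs(z) > z_critical else "Do not reject H0")
```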

Sampling Distribution of Differences in Sample Proportions
For large samples, that is, when
1. n1·p̂1 > 5,
2. n1·q̂1 > 5,
3. n2·p̂2 > 5, and
4. n2·q̂2 > 5, where q̂ = 1 - p̂,
the difference in sample proportions is normally distributed with
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 \quad \text{and} \quad \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
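The four large-sample conditions reduce to simple counts: n·p̂ is the number of successes x, and n·q̂ is n - x. The Python sketch below checks them; the helper name `large_sample_ok` is hypothetical, the first call uses the counts from the confidence-interval example in this lecture (n1 = 400, x1 = 48; n2 = 480, x2 = 187), and the second call uses made-up counts purely for illustration.

```python
def large_sample_ok(x1, n1, x2, n2, threshold=5):
    """Check n*p_hat > threshold and n*q_hat > threshold for both samples.

    n*p_hat equals the success count x, and n*q_hat equals n - x.
    """
    counts = [x1, n1 - x1, x2, n2 - x2]
    return all(c > threshold for c in counts)


print(large_sample_ok(48, 400, 187, 480))  # counts checked: 48, 352, 187, 293
print(large_sample_ok(3, 40, 20, 50))      # hypothetical: only 3 successes in sample 1
```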
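Confidence Interval to Estimate p1 - p2
$$(\hat{p}_1 - \hat{p}_2) - z\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$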
Example Problem:
n1 = 400, x1 = 48, p̂1 = 48/400 = .12, q̂1 = 1 - p̂1 = .88
n2 = 480, x2 = 187, p̂2 = 187/480 = .39, q̂2 = 1 - p̂2 = .61
For a 98% level of confidence, z = 2.33.
$$(.12 - .39) - 2.33\sqrt{\frac{(.12)(.88)}{400} + \frac{(.39)(.61)}{480}} \le p_1 - p_2 \le (.12 - .39) + 2.33\sqrt{\frac{(.12)(.88)}{400} + \frac{(.39)(.61)}{480}}$$
$$-.27 - .064 \le p_1 - p_2 \le -.27 + .064$$
$$-.334 \le p_1 - p_2 \le -.206$$
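The same interval, computed without intermediate rounding, as a Python sketch. The helper name `two_proportion_interval` is hypothetical; unlike the pooled test statistic, the interval uses each sample's own p̂ and q̂. Unrounded, the upper bound comes out near -.205 instead of -.206.

```python
import math


def two_proportion_interval(x1, n1, x2, n2, z):
    """CI for p1 - p2 using the unpooled sample proportions."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    point = p1_hat - p2_hat
    return point - z * se, point + z * se


low, high = two_proportion_interval(48, 400, 187, 480, z=2.33)  # 98% confidence
print(f"{low:.3f} <= p1 - p2 <= {high:.3f}")
```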
F Test for Two Population Variances
$$F = \frac{s_1^2}{s_2^2}$$
df numerator = v1 = n1 - 1
df denominator = v2 = n2 - 1
[Figure: F distribution]
Sheet Metal Example
Suppose a machine produces metal sheets that are specified to be 22 millimeters thick. Because of the machine, the operator, the raw material, the manufacturing environment, and other factors, there is variability in the thickness. Two machines produce these sheets. Operators are concerned about the consistency of the two machines. To test consistency, they randomly sample 10 sheets produced by machine 1 and 12 sheets produced by machine 2. The thickness measurements of sheets from each machine are given in the table on the following page. Assume sheet thickness is normally distributed in the population.
How can we test to determine whether the variance from each sample comes from the same population variance (population variances are equal) or from different population variances (population variances are not equal)?
Sheet Metal Example: Hypothesis Test for Equality of Two Population Variances
H0: σ1² = σ2²
Ha: σ1² ≠ σ2²
α = .05, α/2 = .025
n1 = 10, df numerator = v1 = n1 - 1 = 9
n2 = 12, df denominator = v2 = n2 - 1 = 11
$$F = \frac{s_1^2}{s_2^2}$$
Upper critical value: F.025,9,11 = 3.59
Lower critical value: F.975,9,11 = 1/F.025,11,9 = 1/3.91 ≈ 0.26
If F < 0.26 or F > 3.59, reject H0.
If 0.26 ≤ F ≤ 3.59, do not reject H0.
Sheet Metal Example
Machine 1 (n1 = 10): 22.3 22.2 21.8 21.9 21.6 22.3 22.4 21.8 21.6 22.5
Machine 2 (n2 = 12): 21.8 22.0 22.2 22.0 22.1 22.0 22.1 21.7 21.9 21.9 21.9 22.1
s1² = 0.1138, s2² = 0.0202
$$F = \frac{s_1^2}{s_2^2} = \frac{0.1138}{0.0202} = 5.63$$
Since F = 5.63 > Fc = 3.59, reject H0.
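A Python sketch of the variance-ratio test on the thickness data. The two lists are the slide's measurements split by machine; that split is a reading of the table, chosen because it reproduces s1² = 0.1138 and s2² = 0.0202. The upper critical value 3.59 comes from the slide, while the lower one uses the reciprocal rule F.975,9,11 = 1/F.025,11,9 with an assumed table value of 3.91. Dividing the unrounded variances gives F of about 5.62, slightly below the slide's 5.63, which divides the rounded variances.

```python
import statistics

machine1 = [22.3, 22.2, 21.8, 21.9, 21.6, 22.3, 22.4, 21.8, 21.6, 22.5]
machine2 = [21.8, 22.0, 22.2, 22.0, 22.1, 22.0, 22.1, 21.7, 21.9, 21.9, 21.9, 22.1]

s1_sq = statistics.variance(machine1)  # sample variance, n - 1 denominator
s2_sq = statistics.variance(machine2)
f_stat = s1_sq / s2_sq

upper_critical = 3.59      # F.025,9,11 from the slide
lower_critical = 1 / 3.91  # 1 / F.025,11,9 (assumed table value)

print(f"s1^2 = {s1_sq:.4f}, s2^2 = {s2_sq:.4f}, F = {f_stat:.2f}")
if f_stat < lower_critical or f_stat > upper_critical:
    print("Reject H0: the population variances differ.")
else:
    print("Do not reject H0.")
```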
