
**ST 361: Introduction to Statistics**

Hypothesis Tests and Confidence Intervals for Two Means

Kimberly Weems

ksweems@ncsu.edu

Statistics

Outline: Hypothesis Tests for 2 Means

• HT for comparing means of 2 independent populations/groups

– Hypothesis Testing via CIs

• HT for comparing means of 2 matched/paired (dependent) populations/groups

– Hypothesis Testing via CIs

Recall: The Basic Paradigm

• Population (described by parameters)

• Sample (summarized by statistics)

• Inference: use the sample statistics to draw conclusions about the population parameters

Now, Compare Two Groups

• Group 1

• Group 2

Inference for differences

• Population 1 → Sample 1 → Statistics → Inference → Parameters

• Population 2 → Sample 2 → Statistics → Inference → Parameters

• Inference: the difference in statistics is used to learn about the difference in parameters

Hypothetical situation

• Population A

– Mean 300, standard deviation 100

• Population B

– Mean 100, standard deviation 30

• Sample from both populations

– n=30

• Calculate the mean of both samples

• Take the difference in the means

• Sample A mean = 323.8

• Sample B mean = 98.1

• Difference = 323.8 − 98.1 = 225.7

• Repeat this process 10,000 times

Means of sample A

• Normal distribution shape

• Centered at 300

• Spread from about 240 to 360

$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{30}} = 18.26$$

Means of sample B

• Normal distribution shape

• Centered at 100

• Spread from about 80 to 120

$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{30}} \approx 5.5$$

Differences

• Normal distribution shape

• Centered at 200

• Spread from about 135 to 265
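
The repeated-sampling experiment described above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the slides: it assumes both populations are normal (the slides give only their means and standard deviations), and the function name `simulate_mean_differences` and the seed are made up for this sketch.

```python
import random
import statistics

def simulate_mean_differences(reps=10_000, n=30, seed=361):
    """Return `reps` values of (mean of sample A) - (mean of sample B)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Assumption: both populations are normally distributed.
        sample_a = [rng.gauss(300, 100) for _ in range(n)]  # Population A
        sample_b = [rng.gauss(100, 30) for _ in range(n)]   # Population B
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    return diffs

diffs = simulate_mean_differences()
center = statistics.fmean(diffs)
spread = statistics.stdev(diffs)
print(f"center of the differences: {center:.1f}")
print(f"standard deviation of the differences: {spread:.2f}")
```

The center should land close to 300 − 100 = 200, and nearly all of the simulated differences should fall within about three standard deviations of it, which is roughly the 135 to 265 range noted above.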

Fact

• The difference of two independent normally distributed variables will be normal.

– Must know variables are independent

Some basic principles

• For independent random variables $y_1$ and $y_2$, the variance of the difference is the sum of the variances:

$$\mathrm{Var}(y_1 - y_2) = \mathrm{Var}(y_1) + \mathrm{Var}(y_2)$$

• Note: the difference still gives a sum!

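Recall

$$\sigma_{\bar{y}_1} = \frac{\sigma_1}{\sqrt{n_1}} \;\Rightarrow\; \sigma^2_{\bar{y}_1} = \frac{\sigma_1^2}{n_1}$$

$$\sigma_{\bar{y}_2} = \frac{\sigma_2}{\sqrt{n_2}} \;\Rightarrow\; \sigma^2_{\bar{y}_2} = \frac{\sigma_2^2}{n_2}$$

• The right-hand expressions are the variances of the sample means.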

Important formula

$$\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

• $\sigma_{\bar{y}_1 - \bar{y}_2}$: standard error of the difference in sample means

• $\sigma_1^2 / n_1$: variance of the sample one mean

• $\sigma_2^2 / n_2$: variance of the sample two mean
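
As a quick check of the formula, the following sketch applies it to the hypothetical populations A and B from earlier in these notes (standard deviations 100 and 30, with n = 30 in each sample). The helper name `se_difference` is invented for this illustration.

```python
import math

def se_difference(sd1, n1, sd2, n2):
    """Standard error of (ybar1 - ybar2) for two independent samples."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

se_a = 100 / math.sqrt(30)   # standard error of the sample A mean
se_b = 30 / math.sqrt(30)    # standard error of the sample B mean
se_diff = se_difference(100, 30, 30, 30)

print(round(se_a, 2), round(se_b, 2), round(se_diff, 2))
# prints: 18.26 5.48 19.06
```

Note that the standard error of the difference (19.06) is larger than either individual standard error, because the variances add even though the means are subtracted.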

Note

• In most cases we will not know $\sigma_1$ or $\sigma_2$; instead we will substitute $s_1$ and $s_2$ (the sample SDs). This makes little difference if the samples are large.

Test for difference in means: Two-sample t test for independent samples

• Assumptions

– Samples are random

– Both populations are normally distributed.

– The samples are independent.

• These assumptions are needed to know the distribution of the difference.

Hypotheses

Null hypothesis

$H_0: \mu_1 - \mu_2 = \mu_0$

Alternative hypothesis (one of the following)

$H_1: \mu_1 - \mu_2 > \mu_0$

$H_1: \mu_1 - \mu_2 < \mu_0$

$H_1: \mu_1 - \mu_2 \neq \mu_0$

where $\mu_0$ is some specific value (i.e., the null value), which is usually 0.

Test Statistic

$$t_0 = \frac{\text{statistic} - \text{null value}}{\text{standard error}}$$

$$t_0 = \frac{(\bar{y}_1 - \bar{y}_2) - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

• $\bar{y}_1$, $\bar{y}_2$: sample means

• $\mu_0$: from the null hypothesis

p-value

• t-distribution

– Found from t-table

• Degrees of freedom

– We will use $n_1 + n_2 - 2$

– approximately correct if the sample size is large

– approximately correct if $n_1 = n_2$ and $s_1 = s_2$

– Use software in other situations to find exact df.

• Found in direction of alternative hypothesis
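
The notes do not say which formula "software" uses for the degrees of freedom. A common choice in statistical packages is the Welch–Satterthwaite approximation, and the sketch below assumes that is what is meant. The function name `welch_df` is hypothetical, and the first call uses made-up inputs (equal sample sizes and equal SDs) purely to show the agreement with $n_1 + n_2 - 2$; the second call uses the summary statistics from the computer-usage example in these notes.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed method)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Illustrative inputs: equal n and equal SD give exactly n1 + n2 - 2.
print(welch_df(10, 15, 10, 15), 15 + 15 - 2)

# Large samples: the approximation is close to n1 + n2 - 2 = 2108,
# and with df this large the difference has no practical effect.
print(welch_df(97.06, 1009, 94.5, 1101))
```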

Conclusion

• If the p-value is less than α, reject $H_0$.

Example

• A study reported in the Journal of Adolescent Health examined gender differences in the amount of time adolescents spent using computers each day. The study randomly selected 2110 students from schools in Hong Kong.

• The resulting summary statistics for the total amount of time students use a computer (in minutes) each day are given below. Does this information indicate that gender makes a difference in computer usage?

          Male      Female
Mean      141.15    133.28
StDev     97.06     94.5
n         1009      1101

Example

• Assumptions

– Samples are random

– Populations are normally distributed.

– The samples are independent.

• $H_0: \mu_1 - \mu_2 = 0$ (males and females are the same in terms of computer usage)

• $H_1: \mu_1 - \mu_2 \neq 0$ (males and females differ in terms of computer usage)

Example

$$t_0 = \frac{(\bar{y}_1 - \bar{y}_2) - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(141.15 - 133.28) - 0}{\sqrt{\dfrac{97.06^2}{1009} + \dfrac{94.5^2}{1101}}} = \frac{7.87}{\sqrt{17.45}} = 1.88$$

Example

• P-value

– See the t table

– Degrees of freedom: df = $n_1 + n_2 - 2 = 2108$

– Use the last row on the table.

• 1.88 is between 1.645 and 1.96: 1.645 < 1.88 < 1.96

• 2(0.05) > p-value > 2(0.025)

– Two-sided test: double the probabilities

– 0.10 > p-value > 0.05

Example

• If p-value ≤ α, reject $H_0$.

• p-value > 0.05: do not reject $H_0$.

• Not enough evidence to conclude that there is a difference in computer usage between males and females.
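
The computer-usage test can be reproduced from the summary statistics alone. The sketch below is one way to do it; the function name `two_sample_t` is made up here. Because df = 2108 is so large, the p-value is taken from the standard normal distribution, which is the same approximation as reading the last row of the t table.

```python
import math
from statistics import NormalDist

def two_sample_t(mean1, s1, n1, mean2, s2, n2, null_value=0.0):
    """Two-sample t statistic with separate (unpooled) variances."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((mean1 - mean2) - null_value) / se

# Summary statistics from the example: males first, then females.
t0 = two_sample_t(141.15, 97.06, 1009, 133.28, 94.5, 1101)

# Two-sided test: double the upper-tail probability.
# Assumption: normal approximation to the t distribution (df = 2108).
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t0)))

print(f"t0 = {t0:.2f}")
print(f"two-sided p-value is about {p_two_sided:.3f}")
```

The computed p-value falls between 0.05 and 0.10, matching the bounds read from the table.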

Confidence Interval

• Statistic ± Margin of Error

$$(\bar{y}_1 - \bar{y}_2) \pm t \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

• df = $n_1 + n_2 - 2$

Example

• We would like to find a 95% confidence interval for the mean difference between male and female computer usage in this population.

          Male      Female
Mean      141.15    133.28
StDev     97.06     94.5
n         1009      1101

Example

$$(\bar{y}_1 - \bar{y}_2) \pm t \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

$$(141.15 - 133.28) \pm 1.96 \sqrt{\frac{97.06^2}{1009} + \frac{94.5^2}{1101}}$$

$$7.87 \pm 1.96 \sqrt{17.45} \;\Rightarrow\; 7.87 \pm 8.19$$

$$(-0.32,\ 16.06)$$

Example

• Notice that the interval (-0.32, 16.06) contains the null value 0. Therefore, it is plausible (with 95% confidence) that the true difference in mean computer usage is 0. Thus, we fail to reject $H_0$.

• We are 95% confident that the interval (-0.32, 16.06) contains the true difference in mean computer usage between males and females.
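
The interval for the computer-usage example can be checked with a short sketch. The function name `two_sample_ci` is hypothetical, and the critical value 1.96 is simply the table value used in the example rather than something computed here.

```python
import math

def two_sample_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 using separate (unpooled) variances."""
    diff = mean1 - mean2
    margin = t_crit * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin

low, high = two_sample_ci(141.15, 97.06, 1009, 133.28, 94.5, 1101, t_crit=1.96)
print(f"({low:.2f}, {high:.2f})")
# prints: (-0.32, 16.06)

# The null value 0 lies inside the interval, so we fail to reject H0.
contains_null = low <= 0 <= high
print(contains_null)
```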

General Rule: CI approach to a 2-tailed HT

• If the null value is contained in the 100(1-α)% CI, then we FAIL TO REJECT $H_0$ at level α.

• If the null value is NOT contained in the 100(1-α)% CI, then we REJECT $H_0$ at level α.

• Can only use this approach for a 2-tailed test

Special Case: Pooled t-test

• In this section we take up a special case of the two-sample t-test.

– Used when we can make a specific assumption

– Degrees of freedom will be exactly $n_1 + n_2 - 2$

• Assume the SDs are equal: $\sigma_1 = \sigma_2$

– Pool the information you have about them.

Pooled Variances

• Combine information about both variances into a single estimate. Use this estimate in the standard error formula.

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

Test for difference in means (pooled variance)

• Assumptions

– Samples are random

– Populations are normally distributed.

– Population variances are equal.

– Samples are independent.

Hypotheses

Null hypothesis

$H_0: \mu_1 - \mu_2 = \mu_0$

Alternative hypothesis (one of the following)

$H_1: \mu_1 - \mu_2 > \mu_0$

$H_1: \mu_1 - \mu_2 < \mu_0$

$H_1: \mu_1 - \mu_2 \neq \mu_0$

where $\mu_0$ is some specific (null) value.

Test Statistic

$$t_0 = \frac{(\bar{y}_1 - \bar{y}_2) - \mu_0}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

• $s_p^2$: pooled variance

p-value

• Found from t-table with $n_1 + n_2 - 2$ degrees of freedom

• Found in direction of alternative hypothesis

– For a two-sided (≠) alternative, find the one-sided case and double the result.

Conclusion

• If p-value ≤ α, reject $H_0$.

Note: Can also use rejection region

• For both the “separate” and “pooled” variance two-sample t-tests, we can also use a rejection region approach. Recall:

• Select a significance level α.

• Determine the rejection region: the set of values of the test statistic for which one rejects $H_0$.

• Compute the sample means and the test statistic. Reject $H_0$ if the test statistic lies in the rejection region.

Note: Can also use rejection region

• Alternative Hypothesis & Rejection Region

$H_1$                                Rejection Region
$H_1: \mu_1 - \mu_2 \neq \mu_0$      $|t_0| > t_{\alpha/2,\ n_1+n_2-2}$
$H_1: \mu_1 - \mu_2 > \mu_0$         $t_0 > t_{\alpha,\ n_1+n_2-2}$
$H_1: \mu_1 - \mu_2 < \mu_0$         $t_0 < -t_{\alpha,\ n_1+n_2-2}$

Example

• Does the color of paper make a difference in exam scores? A history professor created two versions of an exam. The two versions were printed on two colors of paper. He administered them to his class by randomly assigning them to his students.

• Twenty-one students took the exam version that was on pink paper. Eighteen students were assigned to the version on gold paper. The resulting scores are summarized below. Does this indicate that there is a significant difference between the two colors of the exam?

Color   n    Mean   St. Dev
Pink    21   72     8.1
Gold    18   64     9.2

Example

• Assumptions

– Samples are random

– Populations are normally distributed

– Populations have same variance

– The samples are independent.

• $H_0: \mu_1 - \mu_2 = 0$ (version of the exam does not make a difference)

• $H_1: \mu_1 - \mu_2 \neq 0$ (versions do make a difference)

Example

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(21 - 1)(8.1)^2 + (18 - 1)(9.2)^2}{21 + 18 - 2}$$

$$= \frac{(21 - 1)(65.61) + (18 - 1)(84.64)}{21 + 18 - 2} = \frac{1312.2 + 1438.88}{37} = \frac{2751.08}{37} = 74.35$$

Example

$$t_0 = \frac{(\bar{y}_1 - \bar{y}_2) - \mu_0}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = \frac{(72 - 64) - 0}{\sqrt{\dfrac{74.35}{21} + \dfrac{74.35}{18}}} = \frac{8}{\sqrt{7.67}} = 2.89$$

P-value

• Degrees of freedom

• $n_1 + n_2 - 2 = 21 + 18 - 2 = 37$

• Closest df on the table is 38 (can round up or down)

• 2(0.005) > p-value

• 0.010 > p-value

Conclusion

• P-value is less than 0.05 ⇒ Reject $H_0$

• There is evidence of a significant difference between the 2 versions of the exam.
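
The pooled calculation for the exam example can be reproduced as follows. This is a sketch using only the Python standard library: the function names are invented for the illustration, and because the standard library has no t distribution, the tail probability is approximated by numerically integrating the t density with Simpson's rule (statistical software would normally supply this directly).

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t(mean1, s1, n1, mean2, s2, n2, null_value=0.0):
    """Pooled two-sample t statistic."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return ((mean1 - mean2) - null_value) / se

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on the t density over [0, t]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3  # half the area lies above 0

# Pink paper first, then gold paper.
sp2 = pooled_variance(8.1, 21, 9.2, 18)
t0 = pooled_t(72, 8.1, 21, 64, 9.2, 18)
p_two_sided = 2 * t_upper_tail(abs(t0), df=21 + 18 - 2)
print(f"pooled variance = {sp2:.2f}, t0 = {t0:.2f}, p-value = {p_two_sided:.4f}")
```

The computed two-sided p-value is below 0.010, consistent with the bound read from the table.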

Confidence interval for difference

$$(\bar{y}_1 - \bar{y}_2) \pm t \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

Example

• Calculate a 95% confidence interval for the difference in mean score of exams for the two versions.

– 37 degrees of freedom

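Example

$$(\bar{y}_1 - \bar{y}_2) \pm t \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

$$8 \pm 2.024 \sqrt{\frac{74.35}{21} + \frac{74.35}{18}}$$

$$8 \pm 2.024 \sqrt{7.67} \;\Rightarrow\; 8 \pm 5.605$$

$$(2.4,\ 13.6)$$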

Example

• Notice that the null value “0” is not contained in the interval (2.4, 13.6), so we reject the null hypothesis.

• We are 95% confident that the interval (2.4, 13.6) contains the true mean difference between the exam versions. OR

• The observed interval (2.4, 13.6) brackets the true difference in mean exam scores, with 95% confidence.
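
A short sketch of the pooled interval for the exam example follows. The function name `pooled_ci` is hypothetical, and the critical value 2.024 is the table value used in the example (read from the df = 38 row), not something computed by the code.

```python
import math

def pooled_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 using the pooled variance estimate."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2 / n1 + sp2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

low, high = pooled_ci(72, 8.1, 21, 64, 9.2, 18, t_crit=2.024)
print(f"({low:.1f}, {high:.1f})")
# prints: (2.4, 13.6)

# The null value 0 lies outside the interval, so we reject H0.
print(low <= 0 <= high)
```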

When do we do this test?

• For any sample size (especially when n is small), and

• If we can make the assumption of equal variances

– Often OK if we have two randomly assigned groups

• When using software, the distinction is not as important.

– It was more important before computing was widely available

– You will probably see this test in the literature.

Paired Differences

• Compare two measures on the same subject

– Right and left hand

– Pre-test and post-test

– Before and after measure

• Record two measures on same subject

– Take the difference in those measures

– Change scores

Example

• We recorded the right and left hand strength of 9 randomly selected college age adults.

Example

Subject   Dominant   Off Dom   Difference
1         333        350       -17
2         380        374         6
3         164        189       -25
4         330        308        22
5         214        209         5
6         282        224        58
7         390        382         8
8         258        293       -35
9         221        219         2

Example

• Treat differences as a single sample

• Hypotheses:

– If there is no difference, the average should be 0

– If the dominant hand is stronger, the difference should be greater than 0

Notation

$\bar{y}_D$: sample average of differences

$s_D$: standard deviation of differences

$n$: number of differences

Example

$\bar{y}_D$ = ______

$s_D$ = 27.50

$n$ = ______

Test for paired differences

• Assumptions

– We have a random sample of the differences

– The population of differences is normally distributed.

Hypotheses

$H_0: \mu = \mu_0$

$H_1: \mu > \mu_0$

$H_1: \mu < \mu_0$

$H_1: \mu \neq \mu_0$

where $\mu$ is really $\mu_D$, the true mean of the differences

Test Statistic

$$t_0 = \frac{\bar{y}_D - \mu_0}{s_D / \sqrt{n}}$$

• $\mu_0$: from null hypothesis (usually zero)

• $\bar{y}_D$: mean of sample

• $s_D$: SD of sample

Test Statistic: Example

$$t_0 = \frac{\bar{y}_D - \mu_0}{s_D / \sqrt{n}} = \ ?$$
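
The paired quantities for the hand-strength data can be worked out with a short sketch like the one below, which treats the differences (dominant minus off-dominant) as a single sample and uses a null value of 0. The variable names are chosen for this illustration only.

```python
import math
import statistics

# Hand-strength measurements from the table in these notes.
dominant = [333, 380, 164, 330, 214, 282, 390, 258, 221]
off_dominant = [350, 374, 189, 308, 209, 224, 382, 293, 219]

# Treat the differences as a single sample.
diffs = [d - o for d, o in zip(dominant, off_dominant)]
n = len(diffs)
mean_d = statistics.mean(diffs)   # sample average of the differences
sd_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)

mu_0 = 0  # null value: no difference between hands
t0 = (mean_d - mu_0) / (sd_d / math.sqrt(n))

print(f"n = {n}, mean = {mean_d:.2f}, sd = {sd_d:.2f}, t0 = {t0:.2f}")
```

The standard deviation it reports agrees with the value $s_D$ = 27.50 given above; the test statistic is then compared with a t distribution on n − 1 degrees of freedom.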

p-value

• Found from t-table using n-1 degrees of freedom.

Conclusion

• If the p-value is less than α, reject $H_0$.

• Or, we can use a rejection region approach with n-1 df.

• Or, for a 2-tailed (i.e., 2-sided) test, we can use a CI approach:

$$\bar{y}_D \pm \text{m.o.e.} = \bar{y}_D \pm t_{\alpha/2,\ n-1} \, (s_D / \sqrt{n})$$

• Conclusion for the example:

Rejection Region Approach

• Alternative Hypothesis & Rejection Region

$H_1$                    Rejection Region
$H_1: \mu \neq \mu_0$    $|t_0| > t_{\alpha/2,\ n-1}$
$H_1: \mu > \mu_0$       $t_0 > t_{\alpha,\ n-1}$
$H_1: \mu < \mu_0$       $t_0 < -t_{\alpha,\ n-1}$

CI for the mean µ of a normal distribution, when σ is unknown (cont’d)

• When σ is unknown, the 100(1-α)% CI for µ for a particular sample ($x_1, \ldots, x_n$) is:

$$\bar{x} \pm \text{m.o.e.} = \bar{x} \pm t_{\alpha/2,\ n-1} \, \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, s is the sample standard deviation, n is the sample size, and $t_{\alpha/2,\ n-1}$ is the critical value of a t distribution with df = n-1 corresponding to a right tail probability of α/2.

Example: Twin Weights

• http://www.statcrunch.com/5.0/index.php?dataid=338704

• Weights for 19 newborn twins born to members of the Greater Columbia South Carolina Area Mothers of Twins Club from September 2000 to December 2001.
