Contents

I Introduction, the null and alternative hypotheses

I Hypothesis testing process

I Type I and Type II errors, power

I Test statistic, level of significance and rejection/acceptance regions

in upper-, lower- and two-tail tests

I Test of hypothesis: procedure

I p-value

I Two-tail tests and confidence intervals

I Examples with various parameters

I Power and sample size calculations

Chapter 2. Hypothesis testing in one population

Learning goals

At the end of this chapter you should be able to:

I Perform a test of hypothesis in a one-population setting

I Formulate the null and alternative hypotheses

I Understand Type I and Type II errors, define the significance level,

define the power

I Choose a suitable test statistic and identify the corresponding

rejection region in upper-, lower- and two-tail tests

I Use the p-value to perform a test

I Know the connection between a two-tail test and a confidence

interval

I Calculate the power of a test and identify a sample size needed to

achieve a desired power


References

I Newbold, P. Statistics for Business and Economics

I Chapter 9 (9.1-9.5)

I Ross, S. Introduction to Statistics

I Chapter 9

Test of hypothesis: introduction

A test of hypothesis is a procedure that:

I is based on a data sample

I and allows us to make a decision

I about the validity of some conjecture or hypothesis about the population $X$, typically the value of a population parameter $\theta$ ($\theta$ can be any of the parameters we covered so far: $\mu$, $p$, $\sigma^2$, etc.)

This hypothesis, called a null hypothesis (H0 ):

I Can be thought of as a hypothesis being supported (before the test

is carried out)

I Will be believed unless sufficient contrary sample evidence is

produced

I When sample information is collected, this hypothesis is put in

jeopardy, or tested

The null hypothesis: examples

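1. A manufacturer who produces boxes of cereal claims that, on average, their contents weigh at least 20 ounces. To check this claim, the contents of a random sample of boxes are weighed and inference is made.

Population: X = weight of a box of cereal (in oz)

Null hypothesis, $H_0: \mu \geq 20$ (here $\mu_0 = 20$)

Data: a simple random sample (SRS) of boxes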

2. A company receiving a large shipment of parts accepts their delivery only

if no more than 50% of the parts are defective. The decision is based on a

check of a random sample of these parts.

Population: X = 1 if a part is defective and 0 otherwise

$X \sim \text{Bernoulli}(p)$, $p$ = proportion of defective parts in the entire shipment

Null hypothesis, $H_0: p \leq 0.5$ (here $p_0 = 0.5$)

Data: a simple random sample (SRS) of parts

Null hypothesis, H0

I States the assumption to be tested

I We begin with the assumption that the null hypothesis is true (similar to

the notion of innocent until proven guilty)

I Refers to the status quo

I Always contains an $=$, $\leq$ or $\geq$ sign (closed set)

I May or may not be rejected

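I Simple hypothesis (specifies a single value):

$H_0: \mu = 5$ (so $\mu_0 = 5$), $H_0: p = 0.6$ (so $p_0 = 0.6$), $H_0: \sigma^2 = 9$ (so $\sigma_0^2 = 9$). In general: $H_0: \theta = \theta_0$

I Composite hypothesis (specifies a range of values):

$H_0: \mu \leq 5$ (so $\mu_0 = 5$), $H_0: p \geq 0.6$ (so $p_0 = 0.6$). In general: $H_0: \theta \leq \theta_0$ or $H_0: \theta \geq \theta_0$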

Alternative hypothesis, H1

If the null hypothesis is not true, then some alternative must be true, and in

carrying out a hypothesis test, the investigator formulates an alternative

hypothesis against which the null hypothesis is tested.

The alternative hypothesis H1 :

I Is the opposite of the null hypothesis

I Challenges the status quo

I Never contains an $=$, $\leq$ or $\geq$ sign

I May or may not be supported

I Is generally the hypothesis that the researcher is trying to support

I One-sided hypothesis:

(upper-tail) $H_1: \mu > 5$ (lower-tail) $H_1: p < 0.6$

In general: $H_1: \theta > \theta_0$ or $H_1: \theta < \theta_0$

I Two-sided hypothesis (two-tail):

$H_1: \sigma^2 \neq 9$. In general: $H_1: \theta \neq \theta_0$

The alternative hypothesis: examples

1. A manufacturer who produces boxes of cereal claims that, on average,

their contents weigh at least 20 ounces. To check this claim, the contents

of a random sample of boxes are weighed and inference is made.

Population: X = weight of a box of cereal (in oz)

Null hypothesis, $H_0: \mu \geq 20$ versus alternative hypothesis, $H_1: \mu < 20$

2. A company receiving a large shipment of parts accepts their delivery only

if no more than 50% of the parts are defective. The decision is based on a

check of a random sample of these parts.

Population: X = 1 if a part is defective and 0 otherwise

$X \sim \text{Bernoulli}(p)$, $p$ = proportion of defective parts in the entire shipment

Null hypothesis, $H_0: p \leq 0.5$ versus alternative hypothesis, $H_1: p > 0.5$

Hypothesis testing process

Population: X = height of a UC3M student (in m)

Claim: On average, students are shorter than 1.6 $\Rightarrow$ Hypotheses:

$H_0: \mu \leq 1.6$ versus $H_1: \mu > 1.6$

Sample: Suppose the sample mean height is 1.65 m, $\bar{x} = 1.65$

Is it likely to observe a sample mean $\bar{x} = 1.65$ if the population mean is $\mu = 1.6$? If not, we reject the null hypothesis in favour of the alternative.

Hypothesis testing process

I Once the null and alternative hypotheses have been specified and the sample information collected, a decision concerning the null hypothesis (reject or fail to reject $H_0$) must be made.

I The decision rule is based on the value of a distance between the sample data we have collected and those values that would have a high probability under the null hypothesis.

I This distance is calculated as the value of a so-called test statistic

(closely related to the pivotal quantities we talked about in Chapter

1). We will discuss specific cases later on.

I However, whatever decision is made, there is some chance of

reaching an erroneous conclusion about the population parameter,

because all that we have available is a sample and thus we cannot

know for sure if the null hypothesis is true or not.

I There are two possible states of nature and thus two errors can be

committed: Type I and Type II errors.

Type I and Type II errors, power

I Type I Error: to reject a true null hypothesis. A Type I error is considered a serious type of error. The probability of a Type I Error is equal to $\alpha$ and is called the significance level.

I Type II Error: to fail to reject a false null hypothesis. The probability of a Type II Error is $\beta$.

                        Actual situation
Decision                $H_0$ true                   $H_0$ false
Do not reject $H_0$     No error ($1-\alpha$)        Type II error ($\beta$)
Reject $H_0$            Type I error ($\alpha$)      No error ($1-\beta$ = power)

Type I and Type II errors, power

I Type I and Type II errors can not happen at the same time

I Type I error can only occur if H0 is true

I Type II error can only occur if H0 is false

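I If the Type I error probability ($\alpha$) increases, then the Type II error probability ($\beta$) decreases

I All else being equal:

I $\beta$ increases when the difference between the hypothesized parameter value and its true value decreases

I $\beta$ increases when $\alpha$ decreases

I $\beta$ increases when $\sigma$ increases

I $\beta$ increases when $n$ decreases

I The power of the test increases as the sample size increases

I For $\theta \in \Theta_1$ (parameter values under $H_1$): power($\theta$) $= 1 - \beta(\theta)$

I For $\theta \in \Theta_0$ (parameter values under $H_0$): power($\theta$) $\leq \alpha$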

Test statistic, level of significance and rejection region

Test statistic, T

I Allows us to decide if the sample data is likely or unlikely to

occur, assuming the null hypothesis is true.

I It is the pivotal quantity from Chapter 1 calculated under the null

hypothesis.

I The decision in the test of hypothesis is based on the observed value

of the test statistic, t.

I The idea is that, if the data provide evidence against the null hypothesis, the observed test statistic should be extreme, that is, very unusual. It should be typical otherwise.

I In distinguishing between extreme and typical we use:

I the sampling distribution of the test statistic

I the significance level $\alpha$ to define the so-called rejection (or critical) region and the acceptance region.

Test statistic, level of significance and rejection region

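Rejection region ($RR_\alpha$) and acceptance region ($AR_\alpha$) in size $\alpha$ tests, where $T_\alpha$ denotes the upper $\alpha$-quantile of the distribution of the test statistic under $H_0$ (the critical value):

Upper-tail test $H_1: \theta > \theta_0$

$RR_\alpha = \{t : t > T_\alpha\}$, $AR_\alpha = \{t : t \leq T_\alpha\}$

Lower-tail test $H_1: \theta < \theta_0$

$RR_\alpha = \{t : t < T_{1-\alpha}\}$, $AR_\alpha = \{t : t \geq T_{1-\alpha}\}$

Two-tail test $H_1: \theta \neq \theta_0$

$RR_\alpha = \{t : t < T_{1-\alpha/2} \text{ or } t > T_{\alpha/2}\}$, $AR_\alpha = \{t : T_{1-\alpha/2} \leq t \leq T_{\alpha/2}\}$

[Figures: for each test, a density with the AR and RR separated by the critical value(s); the RR has area $\alpha$ in one-tail tests and $\alpha/2$ in each tail in the two-tail test.]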

Test statistics

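Let $X_1, \ldots, X_n$ be a s.r.s. from a population $X$ with mean $\mu$ and variance $\sigma^2$, $\alpha$ a significance level, $z_\alpha$ the upper $\alpha$-quantile of $N(0,1)$, $\mu_0$ the population mean under $H_0$, etc.

Parameter / Assumptions / Test statistic / $RR_\alpha$ in two-tail test

Mean $\mu$ / Normal data, known variance
  Test statistic: $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$
  $RR_\alpha = \left\{ z : z < z_{1-\alpha/2} \text{ or } z > z_{\alpha/2} \right\}$, with $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

Mean $\mu$ / Non-normal data, large sample
  Test statistic: $\dfrac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}} \sim_{ap.} N(0,1)$
  $RR_\alpha = \left\{ z : z < z_{1-\alpha/2} \text{ or } z > z_{\alpha/2} \right\}$, with $z = \dfrac{\bar{x} - \mu_0}{\hat{\sigma}/\sqrt{n}}$

Proportion $p$ / Bernoulli data, large sample
  Test statistic: $\dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim_{ap.} N(0,1)$
  $RR_\alpha = \left\{ z : z < z_{1-\alpha/2} \text{ or } z > z_{\alpha/2} \right\}$, with $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

Mean $\mu$ / Normal data, unknown variance
  Test statistic: $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
  $RR_\alpha = \left\{ t : t < t_{n-1;1-\alpha/2} \text{ or } t > t_{n-1;\alpha/2} \right\}$, with $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Variance $\sigma^2$ / Normal data
  Test statistic: $\dfrac{(n-1)s^2}{\sigma_0^2} \sim \chi^2_{n-1}$
  $RR_\alpha = \left\{ \chi^2 : \chi^2 < \chi^2_{n-1;1-\alpha/2} \text{ or } \chi^2 > \chi^2_{n-1;\alpha/2} \right\}$, with $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$

St. dev. $\sigma$ / Normal data
  Test statistic: $\dfrac{(n-1)s^2}{\sigma_0^2} \sim \chi^2_{n-1}$
  $RR_\alpha = \left\{ \chi^2 : \chi^2 < \chi^2_{n-1;1-\alpha/2} \text{ or } \chi^2 > \chi^2_{n-1;\alpha/2} \right\}$, with $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$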

Test of hypothesis: procedure

1. State the null and alternative hypotheses.

2. Calculate the observed value of the test statistic (see the formula sheet).

3. For a given significance level $\alpha$ define the rejection region ($RR_\alpha$).

I Reject $H_0$, the null hypothesis, if the test statistic is in $RR_\alpha$ and fail to reject $H_0$ otherwise.

4. Write down the conclusions in a sentence.

Upper-tail test for the mean, variance known: example

Example: 9.1 (Newbold) When a process producing ball bearings is operating

correctly, the weights of the ball bearings have a normal distribution with mean

5 ounces and standard deviation 0.1 ounces. The process has been adjusted

and the plant manager suspects that this has raised the mean weight of the ball

bearings, while leaving the standard deviation unchanged. A random sample of

sixteen bearings is selected and their mean weight is found to be 5.038 ounces.

Is the manager right? Carry out a suitable test at a 5% level of significance.

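Population:

X = weight of a ball bearing (in oz)

$X \sim N(\mu, \sigma^2 = 0.1^2)$

SRS: n = 16

Sample: $\bar{x} = 5.038$

Objective: test

$H_0: \mu = 5$ (so $\mu_0 = 5$) against $H_1: \mu > 5$

(Upper-tail test)

Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$

Observed test statistic: with $\sigma = 0.1$, $\mu_0 = 5$, $n = 16$, $\bar{x} = 5.038$,

$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{5.038 - 5}{0.1/\sqrt{16}} = 1.52$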

Upper-tail test for the mean, variance known: example

Rejection (or critical) region:

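$RR_{0.05} = \{z : z > z_{0.05}\} = \{z : z > 1.645\}$

Since $z = 1.52 \notin RR_{0.05}$ we fail to reject $H_0$ at a 5% significance level.

[Figure: N(0,1) density with the acceptance region (AR) to the left of the critical value $z_{0.05} = 1.645$ and the rejection region (RR) to its right; the observed $z = 1.52$ falls in AR.]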

Conclusion: The sample data did not provide sufficient evidence to reject

the claim that the average weight of the bearings is 5oz.
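The steps of Example 9.1 can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `upper_tail_z_test` and its arguments are hypothetical and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Upper-tail z test for H0: mu = mu0 against H1: mu > mu0 (sigma known).

    Returns the observed statistic, the critical value z_alpha and the decision.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))       # observed test statistic
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha-quantile of N(0,1)
    return z, z_alpha, z > z_alpha             # reject H0 iff z lies in RR


# Numbers taken from Example 9.1
z, z_crit, reject = upper_tail_z_test(xbar=5.038, mu0=5.0, sigma=0.1, n=16, alpha=0.05)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

With the data of the example the statistic stays below the critical value, so the sketch reaches the same decision as above.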

Definition of p-value

I It is the probability of obtaining a test statistic at least as extreme ($\leq$ or $\geq$) as the observed one (given $H_0$ is true)

I Also called the observed level of significance

I It is the smallest value of $\alpha$ for which $H_0$ can be rejected

I Can be used in step 3) of the testing procedure with the following rule:

I If p-value $< \alpha$, reject $H_0$

I If p-value $\geq \alpha$, fail to reject $H_0$

I Roughly:

I small p-value - evidence against H0

I large p-value - evidence in favour of H0

p-value

p-value when t is the observed value of the test statistic T :

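Upper-tail test: p-value $= P(T \geq t)$ (area to the right of the observed test statistic)

Lower-tail test: p-value $= P(T \leq t)$ (area to the left of the observed test statistic)

Two-tail test: p-value = sum of the left and right tail areas beyond $-|t|$ and $|t|$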

p-value: example

Example: 9.1 (cont.)

Population:

X = weight of a ball bearing (in oz)

$X \sim N(\mu, \sigma^2 = 0.1^2)$

SRS: n = 16

Sample: $\bar{x} = 5.038$

Objective: test

$H_0: \mu = 5$ (so $\mu_0 = 5$) against $H_1: \mu > 5$

(Upper-tail test)

Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$

Observed test statistic: $z = 1.52$

p-value $= P(Z \geq z) = P(Z \geq 1.52) = 0.0643$ where $Z \sim N(0,1)$

Since it holds that p-value $= 0.0643 \geq \alpha = 0.05$, we fail to reject $H_0$ (but would reject at any $\alpha$ greater than 0.0643, e.g., $\alpha = 0.1$).

[Figure: N(0,1) density; the p-value is the area to the right of the observed $z = 1.52$.]
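As a rough sketch, the p-value for a test statistic that is N(0,1) under the null hypothesis can be computed with Python's standard library. The helper `normal_p_value` and its `tail` argument are hypothetical names chosen for this illustration; the two-tail branch relies on the symmetry of the normal density.

```python
from statistics import NormalDist


def normal_p_value(z, tail):
    """p-value for an observed statistic z when Z ~ N(0,1) under H0.

    tail: "upper" -> P(Z >= z), "lower" -> P(Z <= z),
          "two"   -> left + right tail areas beyond -|z| and |z|.
    """
    cdf = NormalDist().cdf
    if tail == "upper":
        return 1 - cdf(z)
    if tail == "lower":
        return cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


# Example 9.1 (cont.): upper-tail test with observed z = 1.52
p_value = normal_p_value(1.52, "upper")
print(f"p-value = {p_value:.4f}")
print("reject at alpha = 0.05:", p_value < 0.05)
print("reject at alpha = 0.10:", p_value < 0.10)
```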

The p-value and the probability of the null hypothesis

I The p-value:

I is not the probability of $H_0$ nor the Type I error $\alpha$;

I but it can be used as a test statistic to be compared with $\alpha$ (i.e. reject $H_0$ if p-value $< \alpha$).

I We are interested in answering: How probable is the null given the

data?

I Remember that we defined the p-value as the probability of the data

(or values even more extreme) given the null.

I We cannot answer exactly.

I But under fairly general conditions and assuming that, if we had no observations, $\Pr(H_0) = \Pr(H_1) = 1/2$, then for p-values, $p$, such that $p < 0.36$:

$\Pr(H_0 \mid \text{Observed Data}) \geq \dfrac{-e\,p \ln(p)}{1 - e\,p \ln(p)}$.

The p-value and the probability of the null hypothesis

This table helps to calibrate a desired p-value as a function of the

probability of the null hypothesis:

p-value Pr(H0 |Observed Data)

0.1 0.39

0.05 0.29

0.01 0.11

0.001 0.02

0.00860 0.1

0.00341 0.05

0.00049 0.01

0.00004 0.001

I For a p-value equal to 0.05 the null has a probability of at least 29%

of being true

I While if we want the probability of the null being true to be at most

5%, the p-value should be no larger than 0.0034.
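The calibration can be checked numerically. The sketch below simply evaluates the lower bound $-e\,p\ln(p) / (1 - e\,p\ln(p))$ stated earlier; the function name `null_probability_lower_bound` is made up for this illustration, and the restriction $p < 0.36$ is the one quoted in the slides.

```python
from math import e, log


def null_probability_lower_bound(p):
    """Lower bound on Pr(H0 | data) for a p-value p, assuming Pr(H0) = Pr(H1) = 1/2.

    Evaluates -e*p*ln(p) / (1 - e*p*ln(p)), valid for 0 < p < 0.36.
    """
    if not 0 < p < 0.36:
        raise ValueError("the bound is stated only for 0 < p < 0.36")
    b = -e * p * log(p)  # positive, because ln(p) < 0
    return b / (1 + b)


for p in (0.1, 0.05, 0.01, 0.001, 0.00341):
    print(f"p-value = {p:<8} lower bound on Pr(H0 | data) = {null_probability_lower_bound(p):.3f}")
```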

Confidence intervals and two-tail tests: duality

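A two-tail test of hypothesis at a significance level $\alpha$ can be carried out using a (two-tail) $100(1-\alpha)\%$ confidence interval in the following way:

1. State the null and two-sided alternative

$H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$

2. Find a $100(1-\alpha)\%$ confidence interval for $\theta$.

3. If $\theta_0$ doesn't belong to this interval, reject the null.

If $\theta_0$ belongs to this interval, fail to reject the null.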

4. Write down the conclusions in a sentence.

Two-tail test for the mean, variance known: example

Example: 9.2 (Newbold) A drill is used to make holes in sheet metal.

When the drill is functioning properly, the diameters of these holes have a

normal distribution with mean 2 in and a standard deviation of 0.06 in.

To check that the drill is functioning properly, the diameters of a random

sample of nine holes are measured. Their mean diameter was 1.95 in.

Perform a two-tailed test at a 5% significance level using a CI-approach.

Population:

X = diameter of a hole (in inches)

$X \sim N(\mu, \sigma^2 = 0.06^2)$

SRS: n = 9

Sample: $\bar{x} = 1.95$

Objective: test

$H_0: \mu = 2$ (so $\mu_0 = 2$) against $H_1: \mu \neq 2$

(Two-tail test)

$100(1-\alpha)\% = 95\%$ confidence interval for $\mu$:

$CI_{0.95}(\mu) = \left(\bar{x} \mp 1.96 \dfrac{\sigma}{\sqrt{n}}\right) = \left(1.95 \mp 1.96 \dfrac{0.06}{\sqrt{9}}\right) = (1.9108, 1.9892)$

Since $\mu_0 = 2 \notin CI_{0.95}(\mu)$ we reject $H_0$ at a 5% significance level.
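The confidence-interval approach of Example 9.2 can be sketched as follows. The function `two_tail_test_via_ci` is a hypothetical helper written for this illustration; it assumes normal data with known standard deviation, as in the example.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_test_via_ci(xbar, mu0, sigma, n, alpha):
    """Two-tail test of H0: mu = mu0 through a 100(1-alpha)% confidence interval.

    Returns the interval and True if H0 is rejected (mu0 outside the interval).
    """
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z_half * sigma / sqrt(n)
    lower, upper = xbar - half_width, xbar + half_width
    return (lower, upper), not (lower <= mu0 <= upper)


# Numbers taken from Example 9.2
ci, reject = two_tail_test_via_ci(xbar=1.95, mu0=2.0, sigma=0.06, n=9, alpha=0.05)
print(f"CI = ({ci[0]:.4f}, {ci[1]:.4f}), reject H0: {reject}")
```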

Two-tail test for the proportion: example

Example: 9.6 (Newbold) In a random sample of 199 audit partners in

U.S. accounting firms, 104 partners indicated some measure of

agreement with the statement: Cash flow from operations is a valid

measure of profitability. Test at the 10% level against a two-sided

alternative the null hypothesis that one-half of the members of this

population would agree with the preceding statement.

Population:

X = 1 if a member agrees with the statement and 0 otherwise

$X \sim \text{Bernoulli}(p)$

SRS: n = 199 (large n)

Sample: $\hat{p} = \dfrac{104}{199} = 0.523$

Objective: test

$H_0: p = 0.5$ (so $p_0 = 0.5$) against $H_1: p \neq 0.5$

(Two-tail test)

Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim_{approx.} N(0,1)$

Observed test statistic: with $p_0 = 0.5$, $n = 199$, $\hat{p} = 0.523$,

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.523 - 0.5}{\sqrt{0.5(1-0.5)/199}} = 0.65$

Two-tail test for the proportion: example

Example: 9.6 (cont.)

Rejection (or critical) region:

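$RR_{0.10} = \{z : z > z_{0.05}\} \cup \{z : z < -z_{0.05}\} = \{z : z > 1.645\} \cup \{z : z < -1.645\}$

Since $z = 0.65 \notin RR_{0.10}$ we fail to reject $H_0$ at a 10% significance level.

[Figure: N(0,1) density with rejection regions (RR) beyond the critical values $-z_{\alpha/2} = -1.645$ and $z_{\alpha/2} = 1.645$ and the acceptance region (AR) between them; the observed $z = 0.65$ falls in AR.]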

Conclusion: The sample data does not contain sufficiently strong

evidence against the hypothesis that one-half of all audit partners agree

that cash flow from operations is a valid measure of profitability.
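Example 9.6 can be reproduced with a short sketch. The function name `two_tail_proportion_test` is hypothetical. Note that the code uses the unrounded $\hat{p} = 104/199$, so the statistic comes out near 0.64 rather than the 0.65 obtained above with $\hat{p}$ rounded to 0.523; the decision is unaffected.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_proportion_test(successes, n, p0, alpha):
    """Large-sample two-tail z test for H0: p = p0 against H1: p != p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return p_hat, z, z_crit, abs(z) > z_crit


# Numbers taken from Example 9.6
p_hat, z, z_crit, reject = two_tail_proportion_test(successes=104, n=199, p0=0.5, alpha=0.10)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```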

Lower-tail test for the mean, variance unknown: example

Example: 9.4 (Newbold, modified) A retail chain knows that, on average, sales

in its stores are 20% higher in December than in November. For a random

sample of six stores the percentages of sales increases were found to be: 19.2,

18.4, 19.8, 20.2, 20.4, 19.0. Assuming a normal population, test at a 10%

significance level the null hypothesis (use a p-value approach) that the true

mean percentage sales increase is at least 20, against a one-sided alternative.

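Population:

X = store's increase in sales from Nov to Dec (in %)

$X \sim N(\mu, \sigma^2)$, $\sigma^2$ unknown

SRS: n = 6 (small n)

Sample: $\bar{x} = \dfrac{117}{6} = 19.5$, $s^2 = \dfrac{2284.44 - 6(19.5)^2}{6 - 1} = 0.588$

Objective: test

$H_0: \mu \geq 20$ (so $\mu_0 = 20$) against $H_1: \mu < 20$

(Lower-tail test)

Test statistic: $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$

Observed test statistic: with $\mu_0 = 20$, $n = 6$, $\bar{x} = 19.5$, $s = \sqrt{0.588} = 0.767$,

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{19.5 - 20}{0.767/\sqrt{6}} = -1.597$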

Lower-tail test for the mean, variance unknown: example

Example: 9.4 (cont.)

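p-value $= P(T \leq -1.597) \in (0.05, 0.1)$ because

$-t_{5;0.05} = -2.015 < -1.597 < -1.476 = -t_{5;0.10}$

Hence, given that p-value $< \alpha = 0.1$ we reject the null hypothesis at this level.

[Figure: $t_{n-1}$ density; the p-value is the area to the left of the observed $t = -1.597$, which lies between $-2.015$ and $-1.476$.]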

Conclusion: The sample data gave enough evidence to reject the claim

that the average increase in sales was at least 20%.

p-value interpretation: if the null hypothesis were true, the probability of

obtaining such sample data would be at most 10%, which is quite

unlikely, so we reject the null hypothesis.
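The observed statistic in Example 9.4 can be recomputed directly from the six data values. The sketch below uses only the standard library, which has no Student t distribution, so the two critical values $t_{5;0.05} = 2.015$ and $t_{5;0.10} = 1.476$ are copied from the example rather than computed; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Percentage sales increases for the six sampled stores (Example 9.4)
increases = [19.2, 18.4, 19.8, 20.2, 20.4, 19.0]
mu0 = 20.0

n = len(increases)
xbar = mean(increases)
s = stdev(increases)              # sample standard deviation, divisor n - 1
t = (xbar - mu0) / (s / sqrt(n))  # observed test statistic

# Tabulated upper quantiles of t_5, taken from the example
t_5_005, t_5_010 = 2.015, 1.476

p_value_between_005_and_010 = -t_5_005 < t < -t_5_010
reject_at_10_percent = t < -t_5_010  # lower-tail rejection region at alpha = 0.1

print(f"xbar = {xbar:.1f}, s^2 = {s**2:.3f}, t = {t:.3f}")
print("p-value in (0.05, 0.1):", p_value_between_005_and_010)
print("reject H0 at alpha = 0.1:", reject_at_10_percent)
```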

Lower-tail test for the mean, variance unknown: example

Example: 9.4 (cont.) in Excel: Go to menu: Data, submenu: Data

Analysis, choose function: two-sample t-test with unequal variances.

Column A (data), Column B (n repetitions of $\mu_0 = 20$), in yellow (observed t stat, p-value and $t_{n-1;\alpha}$).

Upper-tail test for the variance: example

Example: 9.5 (Newbold) In order to meet the standards in consignments of a

chemical product, it is important that the variance of their percentage impurity

levels does not exceed 4. A random sample of twenty consignments had a

sample quasi-variance of 5.62 for impurity level percentages.

a) Perform a suitable test of hypothesis ($\alpha = 0.1$).

b) Find the power of the test. What is the power at $\sigma_1^2 = 7$?

c) What sample size would guarantee a power of 0.9 at $\sigma_1^2 = 7$?

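Population:

X = impurity level of a consignment of a chemical product (in %)

$X \sim N(\mu, \sigma^2)$

SRS: n = 20

Sample: $s^2 = 5.62$

Objective: test

$H_0: \sigma^2 \leq 4$ (so $\sigma_0^2 = 4$) against $H_1: \sigma^2 > 4$

(Upper-tail test)

Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} \sim \chi^2_{n-1}$

Observed test statistic: with $\sigma_0^2 = 4$, $n = 20$, $s^2 = 5.62$,

$\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(20-1)\,5.62}{4} = 26.695$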

Upper-tail test for the variance: example

Example: 9.5 a) (cont.)

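p-value $= P(\chi^2 \geq 26.695) \in (0.1, 0.25)$ because

$\chi^2_{19;0.25} = 22.7 < 26.695 < 27.2 = \chi^2_{19;0.1}$

Hence, given that p-value exceeds $\alpha = 0.1$, we cannot reject the null hypothesis at this level.

[Figure: $\chi^2_{n-1}$ density; the p-value is the area to the right of the observed $\chi^2 = 26.695$, which lies between 22.7 and 27.2.]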

Conclusion: The sample data did not provide enough evidence to reject

the claim that the variance of the percentage impurity levels in

consignments of this chemical is at most 4.
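Part a) of Example 9.5 amounts to one arithmetic step and a comparison with tabulated quantiles. The following sketch mirrors that reasoning; the function name `chi_square_statistic` is hypothetical, and the quantiles 22.7 and 27.2 are copied from the example rather than computed.

```python
def chi_square_statistic(n, s2, sigma0_sq):
    """Observed statistic (n - 1) s^2 / sigma0^2 for a test on the variance."""
    return (n - 1) * s2 / sigma0_sq


# Numbers taken from Example 9.5
chi2_obs = chi_square_statistic(n=20, s2=5.62, sigma0_sq=4.0)

# Tabulated upper quantiles of chi-square with 19 degrees of freedom (from the example)
chi2_19_025, chi2_19_010 = 22.7, 27.2

p_value_between_010_and_025 = chi2_19_025 < chi2_obs < chi2_19_010
reject_at_10_percent = chi2_obs > chi2_19_010  # upper-tail rejection region

print(f"chi-square statistic = {chi2_obs:.3f}")
print("p-value in (0.1, 0.25):", p_value_between_010_and_025)
print("reject H0 at alpha = 0.1:", reject_at_10_percent)
```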

Upper-tail test for the variance: power

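Example: 9.5 b) Recall that: power = P(reject $H_0$ | $H_1$ is true)

When do we reject $H_0$?

$RR_{0.1} = \left\{ \dfrac{(n-1)s^2}{\sigma_0^2} > \chi^2_{n-1;0.1} \right\} = \left\{ (n-1)s^2 > \chi^2_{n-1;0.1}\,\sigma_0^2 \right\}$, where $\chi^2_{n-1;0.1}\,\sigma_0^2 = 27.2 \cdot 4 = 108.8$

Hence the power is:

$\text{power}(\sigma_1^2) = P\left(\text{reject } H_0 \mid \sigma^2 = \sigma_1^2\right)$

$= P\left((n-1)s^2 > 108.8 \mid \sigma^2 = \sigma_1^2\right)$

$= P\left(\dfrac{(n-1)s^2}{\sigma_1^2} > \dfrac{108.8}{\sigma_1^2}\right)$

$= P\left(\chi^2_{n-1} > \dfrac{108.8}{\sigma_1^2}\right)$

$= 1 - F_{\chi^2}\left(\dfrac{108.8}{\sigma_1^2}\right)$

($F_{\chi^2}$ is the cdf of $\chi^2_{n-1}$)

Hence, $\text{power}(7) = P\left(\chi^2_{n-1} > \dfrac{108.8}{7}\right) = 0.6874$.

[Figure: power($\sigma^2$) $= 1 - \beta(\sigma^2)$ plotted against $\sigma^2$ (from 0 to 10), with $\sigma_0^2 = 4$ marked on the horizontal axis; the vertical axis runs from 0 to 1.]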
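The power formula $1 - F_{\chi^2}(108.8/\sigma_1^2)$ can be evaluated without statistical tables. The sketch below is one possible way to do it with the standard library only: `chi2_cdf` is a hand-written helper (not a library function) that sums the usual power series for the regularized lower incomplete gamma function, and `variance_test_power` is a hypothetical name whose defaults are the numbers of Example 9.5.

```python
import math


def chi2_cdf(x, df):
    """CDF of a chi-square distribution with df degrees of freedom.

    Uses the series gamma(a, y) = e^{-y} y^a * sum_k y^k / (a (a+1) ... (a+k)),
    with a = df / 2 and y = x / 2, divided by Gamma(a).
    """
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:  # terms are positive and eventually decrease
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def variance_test_power(sigma1_sq, n=20, sigma0_sq=4.0, chi2_crit=27.2):
    """Power of the upper-tail variance test at the alternative value sigma1_sq."""
    threshold = chi2_crit * sigma0_sq / sigma1_sq  # 108.8 / sigma1_sq in the example
    return 1.0 - chi2_cdf(threshold, n - 1)


power_at_7 = variance_test_power(7.0)
print(f"power(7) = {power_at_7:.4f}")
print(f"power(4) = {variance_test_power(4.0):.4f}  (close to alpha = 0.1 at the boundary of H0)")
```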

Upper-tail test for the variance: sample size calculations

Example: 9.5 c)

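From our previous calculations, we know that

$\text{power}(\sigma_1^2) = P\left(\dfrac{(n-1)s^2}{\sigma_1^2} > \chi^2_{n-1;0.1}\,\dfrac{\sigma_0^2}{\sigma_1^2}\right)$, with $\dfrac{(n-1)s^2}{\sigma_1^2} \sim \chi^2_{n-1}$

Our objective is to find the smallest $n$ such that

$\text{power}(7) = P\left(\dfrac{(n-1)s^2}{\sigma_1^2} > \chi^2_{n-1;0.1}\,\dfrac{4}{7}\right) \geq 0.9$, where $\dfrac{4}{7} = 0.571$

The last equation implies that we are dealing with a $\chi^2_{n-1}$ distribution, whose upper 0.9-quantile satisfies $\chi^2_{n-1;0.9} \geq 0.571\,\chi^2_{n-1;0.1}$.

From the chi-square table: $\chi^2_{43;0.9} / \chi^2_{43;0.1} = 0.573 > 0.571 \Rightarrow n - 1 = 43$

Hence, with a sample of $n = 44$ consignments the test detects the alternative value $\sigma_1^2 = 7$ with at least 90% chance.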
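The table search can be imitated in code. The sketch below does not use exact chi-square quantiles: it swaps in the Wilson-Hilferty normal approximation, $\chi^2_{k;p} \approx k\left(1 - \tfrac{2}{9k} + z_p\sqrt{\tfrac{2}{9k}}\right)^3$, so its output should be read as an approximate check of the table-based answer. Both function names are hypothetical.

```python
from statistics import NormalDist


def chi2_upper_quantile_wh(df, upper_prob):
    """Approximate x such that P(chi2_df > x) = upper_prob (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - upper_prob)
    c = 2.0 / (9.0 * df)
    return df * (1 - c + z * c ** 0.5) ** 3


def smallest_df_for_power(sigma0_sq, sigma1_sq, alpha, target_power, max_df=1000):
    """Smallest df = n - 1 with chi2_{df; target_power} >= (sigma0^2 / sigma1^2) * chi2_{df; alpha}."""
    ratio_needed = sigma0_sq / sigma1_sq
    for df in range(1, max_df + 1):
        ratio = chi2_upper_quantile_wh(df, target_power) / chi2_upper_quantile_wh(df, alpha)
        if ratio >= ratio_needed:
            return df
    raise ValueError("no sample size up to max_df + 1 reaches the target power")


# Example 9.5 c): sigma0^2 = 4, sigma1^2 = 7, alpha = 0.1, desired power 0.9
df = smallest_df_for_power(sigma0_sq=4.0, sigma1_sq=7.0, alpha=0.1, target_power=0.9)
n = df + 1
print(f"n - 1 = {df}, n = {n}")
```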

Another power example: lower-tail test for the mean, normal population, known $\sigma^2$

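I $H_0: \mu \geq \mu_0$ versus $H_1: \mu < \mu_0$ at $\alpha = 0.05$

I Say that $\mu_0 = 5$, $n = 16$, $\sigma = 0.1$

I We reject $H_0$ if $\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < -z_\alpha = -1.645$, that is when $\bar{x} \leq 4.96$, hence

$\text{power}(\mu_1) = P\left(Z < \dfrac{4.96 - \mu_1}{0.1/\sqrt{16}}\right)$

[Figure, left panel: power($\mu$) $= 1 - \beta(\mu)$ plotted against $\mu$ from 4.85 to 5.05, with $\mu_0 = 5$ marked. Right panel: power curves for $n = 16$, $n = 9$ and $n = 4$ over the same range. The vertical axes run from 0 to 1.]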
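The power curves described above can be tabulated with a short sketch. The function `lower_tail_power` is a hypothetical helper whose defaults are the values of this example; it keeps the unrounded critical value of $\bar{x}$ (about 4.959) instead of 4.96.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_power(mu1, mu0=5.0, sigma=0.1, n=16, alpha=0.05):
    """Power at mu1 of the lower-tail z test of H0: mu >= mu0 against H1: mu < mu0."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    xbar_crit = mu0 - std_normal.inv_cdf(1 - alpha) * se  # reject H0 when xbar falls below this
    return std_normal.cdf((xbar_crit - mu1) / se)


for n in (4, 9, 16):
    powers = [round(lower_tail_power(mu1, n=n), 3) for mu1 in (4.90, 4.95, 5.00)]
    print(f"n = {n:2d}: power at mu1 = 4.90, 4.95, 5.00 -> {powers}")
```

At $\mu_1 = \mu_0$ the function returns $\alpha$, and the values grow as $\mu_1$ moves below $\mu_0$ or as $n$ increases.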

Another power example: lower-tail test for the mean, normal population, known $\sigma^2$

Note that the power $= 1 - P(\text{Type II error})$ function has the following features (everything else being equal):

I The farther the true mean $\mu_1$ from the hypothesized $\mu_0$, the greater the power

I The smaller the $\alpha$, the smaller the power, that is, reducing the probability of Type I error will increase the probability of Type II error

I The larger the population variance, the lower the power (we are less likely to detect small departures from $\mu_0$, when there is greater variability in the population)

I The larger the sample size, the greater the power of the test (the

more info from the population, the greater the chance of detecting

any departures from the null hypothesis).

