
STATISTICAL TESTS OF HYPOTHESES

A hypothesis is a proposal intended to explain certain facts or observations. It is a concept that is
not yet verified but which, if true, would explain certain facts or phenomena. It is therefore a statement
expressing an opinion based on incomplete evidence, and hence a claim which should be verified
through a test.

Situations arise in which, rather than estimating the value of a population parameter, we must
decide whether a statement concerning the parameter is true or false. In doing so, the testing of a
hypothesis takes place.

The two types of errors.

When testing a hypothesis, we may commit one of two types of error.

Let H be the hypothesis.

                Accept H             Reject H
H is true       Correct decision     Type I error (α)
H is false      Type II error (β)    Correct decision

Type I error: Occurs when a correct hypothesis is rejected.
Type II error: Occurs when an incorrect hypothesis is accepted.

The two types of hypotheses:

The null hypothesis ($H_0$)

This is the hypothesis to be tested. It may be looked at as any hypothesis set up primarily to see
whether it can be rejected.

The alternative hypothesis ($H_1$)

This is the hypothesis we accept when the null hypothesis must be rejected.

The level of significance:

The testing of a hypothesis is based on the probability of Type I error also known as the level of
significance and denoted by α. It is the probability of rejecting the null hypothesis when it is true.
For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference
exists when there is no actual difference.
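
The meaning of a 5% level of significance can be illustrated with a small simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis really is true and counts how often a test would still reject it. The population values, the sample size of 25, the number of trials and the two-tailed critical value 2.064 (the table value $t_{0.025}(24)$) are all assumptions chosen for this illustration.

```python
import random
import statistics


def type_one_error_rate(trials=5000, n=25, mu0=10.0, sigma=2.0,
                        t_crit=2.064, seed=1):
    """Estimate how often a true H0: mu = mu0 is rejected.

    t_crit = 2.064 is the table value t_0.025(24), i.e. a two-tailed
    test at alpha = 0.05 with n = 25 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
        t = (mean - mu0) / (s / n ** 0.5)
        if abs(t) > t_crit:
            rejections += 1  # a Type I error: H0 is true but rejected
    return rejections / trials


rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should come out close to 0.05, which is what the level of significance promises.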

Procedure:
1. Formulate the null hypothesis and the appropriate alternative hypothesis.

2. Specify the probability of Type I error (α), also known as the level of significance.

3. Set up the criterion for testing the null hypothesis against the alternative hypothesis.

4. Calculate the test statistic.

5. Decide whether to accept or reject the null hypothesis.

Hypotheses concerning one mean

Null hypothesis: This is formulated as $H_0: \mu = \mu_0$, where $\mu_0$ is a given numerical value.

Alternative hypotheses:

Alternative hypotheses of the types $H_1: \mu < \mu_0$ and $H_1: \mu > \mu_0$ are called one-sided or
one-tailed alternatives. This is because the rejection region for $H_0$ in each case is found on one side
of the curve.

[Figure: rejection regions for $H_0$. For $H_1: \mu < \mu_0$, the area $\alpha$ lies to the left of $-t_{\alpha}(n-1)$; for $H_1: \mu > \mu_0$, the area $\alpha$ lies to the right of $t_{\alpha}(n-1)$.]

An alternative hypothesis of the type $H_1: \mu \ne \mu_0$ is called a two-sided or two-tailed alternative
because the rejection regions for $H_0$ are to be found on both sides of the curve.

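[Figure: rejection regions for $H_0$ when $H_1: \mu \ne \mu_0$. An area of $\alpha/2$ lies to the left of $-t_{\alpha/2}(n-1)$ and an area of $\alpha/2$ lies to the right of $t_{\alpha/2}(n-1)$.]

Summary: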

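Null hypothesis: $H_0: \mu = \mu_0$

Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

Alternative hypothesis      Reject $H_0$ if:
$H_1: \mu < \mu_0$          $t < -t_{\alpha}(n-1)$
$H_1: \mu > \mu_0$          $t > t_{\alpha}(n-1)$
$H_1: \mu \ne \mu_0$        $t < -t_{\alpha/2}(n-1)$ or $t > t_{\alpha/2}(n-1)$

Where

$\bar{x} = \dfrac{\sum fx}{\sum f}$ and $s = \sqrt{\dfrac{1}{n-1}\left[\sum fx^2 - \dfrac{(\sum fx)^2}{\sum f}\right]}$, with $n = \sum f$.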

NB: When formulating the alternative hypothesis, the question will be very clear on the form it
should take, $H_1: \mu < \mu_0$ or $H_1: \mu > \mu_0$. When this is not clear, formulate it as $H_1: \mu \ne \mu_0$.
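
The three rejection rules in the summary can be sketched as a small Python function. The function name, the labels used for the alternatives and the sample values are assumptions made for this illustration; the critical value is still read from the t table.

```python
def reject_h0(t, t_crit, alternative):
    """Apply the rejection rules from the summary.

    t_crit is the positive table value: t_alpha(n-1) for a one-tailed
    test, t_{alpha/2}(n-1) for a two-tailed test.
    alternative is "less", "greater" or "two-sided" (labels chosen
    for this sketch, not taken from the notes).
    """
    if alternative == "less":        # H1: mu < mu0
        return t < -t_crit
    if alternative == "greater":     # H1: mu > mu0
        return t > t_crit
    if alternative == "two-sided":   # H1: mu != mu0
        return t < -t_crit or t > t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


# Illustrative values only.
print(reject_h0(-3.1, 2.492, "less"))       # True: -3.1 < -2.492
print(reject_h0(1.2, 2.145, "two-sided"))   # False: -2.145 < 1.2 < 2.145
```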

Examples:

1. The lengths of rods made by an artisan were measured and recorded as follows:

Length (cm)    14.2   14.3   14.4   14.6   16.0
No. of rods       6      5      3      7      4

Test at 1% level of significance whether the mean length is less than 14.8cm.

Solution:

For this test to be done, we must assume that there was a claim that the mean length is
14.8cm.
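1. $H_0: \mu = 14.8$
   $H_1: \mu < 14.8$

2. $\alpha = 0.01$

3. Criterion: Reject $H_0$ if $t < -t_{\alpha}(n-1)$, i.e. $t < -t_{0.01}(24) = -2.492$.

[Figure: t curve with the rejection region of area $\alpha = 0.01$ to the left of $-t_{0.01}(24) = -2.492$]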

4. Calculations:

x        f      fx       fx²
14.2     6      85.2     1209.84
14.3     5      71.5     1022.45
14.4     3      43.2      622.08
14.6     7     102.2     1492.12
16.0     4      64.0     1024.00
Total   25     366.1     5370.49

$\bar{x} = \dfrac{366.1}{25} = 14.644$

$s = \sqrt{\dfrac{1}{24}\left[5370.49 - \dfrac{(366.1)^2}{25}\right]} = 0.6232$

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{14.644 - 14.8}{0.6232 / \sqrt{25}} = -1.252$

5. Decision:

Since $-1.252 > -2.492$, $H_0$ is accepted. We can conclude that the mean is not less than
14.8cm at 1% level of significance.
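
The arithmetic in this example can be checked with a short Python sketch that applies the formulas for $\bar{x}$, $s$ and $t$ to the frequency table. The variable names are choices made for this sketch, and the critical value 2.492 is typed in from the t table rather than computed. Carrying more digits than a hand calculation gives $t \approx -1.25$, which leads to the same decision.

```python
import math

lengths = [14.2, 14.3, 14.4, 14.6, 16.0]  # x, length in cm
rods = [6, 5, 3, 7, 4]                    # f, number of rods
mu0 = 14.8                                # mean stated in H0

n = sum(rods)
sum_fx = sum(f * x for x, f in zip(lengths, rods))
sum_fx2 = sum(f * x * x for x, f in zip(lengths, rods))

mean = sum_fx / n
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))
t = (mean - mu0) / (s / math.sqrt(n))

t_crit = 2.492            # t_0.01(24), read from the t table
reject = t < -t_crit      # one-tailed test, H1: mu < mu0

print(f"n={n}, mean={mean:.3f}, s={s:.4f}, t={t:.3f}, reject H0: {reject}")
```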
2. The contents of a random sample of 15 similar containers of a chemical used in the
construction industry are as follows:

Content (litres)      9.6    9.8   10.0   10.2   10.4
No. of containers       2      4      5      3      1

The manufacturer claims that the mean content of the containers is 10.0 litres. Do the data
support the claim at 5% level of significance?

Solution:

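1. $H_0: \mu = 10$
   $H_1: \mu \ne 10$

2. $\alpha = 0.05$

3. Criterion: Reject $H_0$ if $t < -t_{\alpha/2}(n-1)$ or $t > t_{\alpha/2}(n-1)$, i.e. $t < -2.145$ or $t > 2.145$.

[Figure: t curve with rejection regions of area $\alpha/2 = 0.025$ in each tail, beyond $\pm t_{0.025}(14) = \pm 2.145$]

4. Calculations:
$\bar{x} = 9.96$, $s = 0.229$, $t = -0.677$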

5. Decision:
Since $-2.145 < -0.677 < 2.145$, $H_0$ is accepted. We can conclude that the data support the claim
at 5% level of significance.
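
The values quoted in step 4 are stated without working. One way to obtain them, applying the formulas for $\bar{x}$, $s$ and $t$ to the frequency table of this example, is sketched below; the intermediate sums are computed here and do not appear in the original notes.

```latex
\begin{align*}
n &= \sum f = 2 + 4 + 5 + 3 + 1 = 15 \\
\sum fx &= 9.6(2) + 9.8(4) + 10.0(5) + 10.2(3) + 10.4(1) = 149.4 \\
\sum fx^2 &= 184.32 + 384.16 + 500.00 + 312.12 + 108.16 = 1488.76 \\
\bar{x} &= \frac{149.4}{15} = 9.96 \\
s &= \sqrt{\frac{1}{14}\left[1488.76 - \frac{(149.4)^2}{15}\right]}
   = \sqrt{\frac{0.736}{14}} \approx 0.229 \\
t &= \frac{9.96 - 10}{0.229 / \sqrt{15}} \approx -0.677
\end{align*}
```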

Exercises:

1. The times between seven calls for an ambulance and the patient’s arrival at the hospital are 27,
15, 20, 32, 18, 26 and 28 minutes. The ambulance service claims that it takes on average 20 minutes
between the call for an ambulance and the patient’s arrival at the hospital. Test at 5% level of
significance whether the claim is valid.
2. A random sample of a company’s files shows that 8 orders for a certain piece of machinery
were filled in 10, 12, 19, 14, 15, 18, 11 and 13 days. At 1% level of significance, can it be
concluded that the mean time taken to fill such orders exceeds 10.5 days?

Hypotheses concerning the difference between two means


We will consider two populations with a common variance $\sigma^2$ and test whether the difference
between the means of the two populations equals zero, i.e. whether the two means are equal.

The null hypothesis is $H_0: \mu_1 = \mu_2$

The alternative hypotheses are:

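$H_1: \mu_1 < \mu_2$, and reject $H_0$ if $t < -t_{\alpha}(n_1 + n_2 - 2)$
$H_1: \mu_1 > \mu_2$, and reject $H_0$ if $t > t_{\alpha}(n_1 + n_2 - 2)$
$H_1: \mu_1 \ne \mu_2$, and reject $H_0$ if $t < -t_{\alpha/2}(n_1 + n_2 - 2)$ or $t > t_{\alpha/2}(n_1 + n_2 - 2)$

Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}}$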

Example:

The following random samples are measurements of the heat producing capacity in millions of
calories per ton of specimens of coal from two mines.

Mine 1: 8260 8130 8350 8070 8340

Mine 2: 7950 7890 7900 8140 7920 7840

If the variance of the heat producing capacity from the two mines has been found to be 13,000,
test at 1% level of significance whether there is any difference in the heat producing capacity of
the two mines.

Solution:

Let μ1 and μ2 be the population means from mine 1 and mine 2 respectively.

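1. $H_0: \mu_1 = \mu_2$
   $H_1: \mu_1 \ne \mu_2$

2. $\alpha = 0.01$

3. Criterion: Reject $H_0$ if $t < -3.250$ or $t > 3.250$, where $t_{0.005}(9) = 3.250$.

[Figure: t curve with rejection regions of area $\alpha/2 = 0.005$ in each tail, beyond $\pm t_{0.005}(9) = \pm 3.250$]

4. Calculations:
$\bar{x}_1 = 8230$, $\bar{x}_2 = 7940$,

$t = \dfrac{8230 - 7940}{\sqrt{\dfrac{13000}{5} + \dfrac{13000}{6}}} = 4.20$

5. Decision:
Since $4.20 > 3.250$, $H_0$ is rejected. We can then conclude that there is a difference in the
heat producing capacity of the two mines.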
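
The calculation for the two mines can be reproduced with the following Python sketch. It follows the test statistic given in these notes, using the stated common variance of 13,000. The variable names are choices made for this sketch, and the critical value 3.250 is typed in from the t table rather than computed.

```python
import math

mine_1 = [8260, 8130, 8350, 8070, 8340]
mine_2 = [7950, 7890, 7900, 8140, 7920, 7840]
variance = 13000          # common variance given in the example

n1, n2 = len(mine_1), len(mine_2)
mean_1 = sum(mine_1) / n1
mean_2 = sum(mine_2) / n2

# Test statistic for the difference between two means.
t = (mean_1 - mean_2) / math.sqrt(variance / n1 + variance / n2)

df = n1 + n2 - 2          # degrees of freedom used to read the table
t_crit = 3.250            # t_0.005(9), read from the t table
reject = t < -t_crit or t > t_crit   # two-tailed test

print(f"means: {mean_1}, {mean_2}; df={df}; t={t:.2f}; reject H0: {reject}")
```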

2. There are two methods for instructing trainees as part of an industrial training programme. The
scores obtained by each of 10 trainees instructed by each of these two methods are given below.

Trainee 1 2 3 4 5 6 7 8 9 10
Method A 71 65 75 69 73 66 68 71 74 68
Method B 72 84 77 78 69 70 77 73 65 75

If the standard deviation of the scores from the two methods has been found to be 20, test at 5%
level of significance whether method B is more effective than method A.

Solution:

Let μ1 and μ2 be the population means for methods A and B respectively.

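1. $H_0: \mu_1 = \mu_2$
   $H_1: \mu_1 < \mu_2$

2. $\alpha = 0.05$

3. Criterion: Reject $H_0$ if $t < -t_{0.05}(18)$, i.e. $t < -1.734$.

[Figure: t curve with the rejection region of area $\alpha = 0.05$ to the left of $-t_{0.05}(18) = -1.734$]

4. Calculations:
$\bar{x}_1 = 70$, $\bar{x}_2 = 74$,

$t = \dfrac{70 - 74}{\sqrt{\dfrac{20^2}{10} + \dfrac{20^2}{10}}} = -0.447$

5. Decision:
Since $-0.447 > -1.734$, $H_0$ is accepted. We can conclude that method B is not more effective
than method A at 5% level of significance.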

Exercise:

A potential buyer of electric light bulbs bought 100 bulbs of each of two famous brands, A and B.
Upon testing them, he found that brand A had a mean life of 1500 hours while brand B had a
mean life of 1530 hours. From the past experience, the variance from the two brands has been
found to be 55 hours. Test at 5% level of significance whether the two brands differ significantly
in quality.

Two-way contingency tables analyses.

Contingency tables arise when the classes in which frequencies are grouped are not class
intervals but correspond to some attribute or some descriptive quality. We’ll consider
contingency tables in which the data are tallied into a two-way classification having r rows and c
columns. They can therefore be referred to as r×c tables.

Example: If cars are rated as excellent, superior, average or poor with regard to performance and
appearance, we get the following 4×4 contingency table.

                          APPEARANCE
                     E     S     A     P
               E     a
PERFORMANCE    S                 b
               A           c
               P                       d

Each car would fall into one of the 16 cells: a would be the number of cars that are excellent in
both performance and appearance, b would be the number of cars that are superior in
performance and average in appearance, c would be the number of cars that are average in
performance and superior in appearance while d would be the number of cars that are poor in
both performance and appearance.

This table contains the observed frequencies $O_{ij}$, $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$.

These tables are analyzed using the chi-square ($\chi^2$) distribution, where we seek to test the null
hypothesis that the two attributes are independent:

H 0 : Performance and appearance are independent

H 1 : Performance and appearance are dependent

Procedure:

First, the expected frequencies $E_{ij}$ are calculated by multiplying the totals of the respective rows
and columns and then dividing by the grand total.

Secondly, we substitute in the formula

$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$

and reject the null hypothesis if the calculated value of $\chi^2$ exceeds $\chi^2_{\alpha}((r-1)(c-1))$, which is
the value, with $(r-1)(c-1)$ degrees of freedom, read from the chi-square distribution table to the
right of which the area under the curve is $\alpha$.

Examples:

1. With $r = 3$, $c = 4$, $\alpha = 0.05$: $\chi^2_{\alpha}((r-1)(c-1)) = \chi^2_{0.05}(6) = 12.592$

NB: We’ve 12 cells and with 6 degrees of freedom we’re free to calculate $E_{ij}$ for 6 cells and
obtain the rest by subtraction from the totals. The totals of the $O_{ij}$ and $E_{ij}$ tables must be the same.

[Figure: chi-square curve with the area $\alpha = 0.05$ to the right of $\chi^2_{0.05}(6) = 12.592$]
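
The rule for the expected frequencies, and the remark that the totals of the observed and expected tables must agree, can be sketched in Python as follows. The 3×4 table of counts is hypothetical, invented only for this illustration, and the function name is likewise an assumption.

```python
import math


def expected_frequencies(observed):
    """E_ij = (row i total x column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 3 x 4 table of observed frequencies (not from the notes).
observed = [
    [12, 18, 20, 10],
    [8, 22, 15, 15],
    [10, 20, 25, 25],
]

expected = expected_frequencies(observed)
rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)   # degrees of freedom, here (3-1)(4-1) = 6

# Row and column totals of the expected table match the observed ones.
rows_match = all(math.isclose(sum(o), sum(e))
                 for o, e in zip(observed, expected))
cols_match = all(math.isclose(sum(o), sum(e))
                 for o, e in zip(zip(*observed), zip(*expected)))

print(expected)
print(f"df={df}, row totals match: {rows_match}, column totals match: {cols_match}")
```

Because the totals match, once 6 of the 12 expected frequencies are known the rest follow by subtraction, which is what the 6 degrees of freedom express.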

2. A company director wishes to determine whether there is a relationship between an
employee’s performance in the company’s training programme and the employee’s ultimate
success in the job. A sample of 400 cases from the files is taken and the following table obtained.

                              Performance in the training programme
Success in the job
(Employer’s rating)       Below Average     Average     Above Average
Poor                           23              60             29
Average                        28              79             60
Very Good                       9              49             63

Test at 1% level of significance whether performance in the training programme and success in
the job are independent.

Solution:

1. $H_0$: Performance in the training programme and success in the job are independent
   $H_1$: Performance in the training programme and success in the job are dependent

2. $\alpha = 0.01$

3. Criterion: Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}((r-1)(c-1))$, i.e. $\chi^2 > \chi^2_{0.01}(4)$, i.e. $\chi^2 > 13.277$.

[Figure: chi-square curve with the area $\alpha = 0.01$ to the right of $\chi^2_{0.01}(4) = 13.277$]

4. Calculations:
Since we’ve 4 degrees of freedom, we’ll calculate $E_{ij}$ for 4 cells and obtain the rest by
subtraction from the totals of the respective rows and columns.

Observed frequencies $O_{ij}$

                                                       Total
$O_{11} = 23$     $O_{12} = 60$     $O_{13} = 29$       112
$O_{21} = 28$     $O_{22} = 79$     $O_{23} = 60$       167
$O_{31} = 9$      $O_{32} = 49$     $O_{33} = 63$       121
Total  60         188               152                 400

Expected frequencies $E_{ij}$

$E_{11} = \dfrac{112 \times 60}{400} = 16.8$,  $E_{12} = \dfrac{112 \times 188}{400} = 52.6$,  $E_{13} = 112 - (16.8 + 52.6) = 42.6$

$E_{21} = \dfrac{167 \times 60}{400} = 25.0$,  $E_{22} = \dfrac{167 \times 188}{400} = 78.5$,  $E_{23} = 167 - (25.0 + 78.5) = 63.5$

$E_{31} = 60 - (16.8 + 25.0) = 18.2$,  $E_{32} = 188 - (52.6 + 78.5) = 56.9$,  $E_{33} = 152 - (42.6 + 63.5) = 45.9$

$O_{ij}$     $E_{ij}$     $(O_{ij} - E_{ij})^2 / E_{ij}$
23           16.8         2.2881
60           52.6         1.0411
29           42.6         4.3418
28           25.0         0.3600
79           78.5         0.0032
60           63.5         0.1929
 9           18.2         4.6505
49           56.9         1.0968
63           45.9         6.3706

$\chi^2 = 20.345$

5. Decision:
Since $20.345 > 13.277$, $H_0$ is rejected. We can therefore conclude that performance in the
training programme and success in the job are dependent.
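
The whole calculation for this example can be checked with the Python sketch below. The variable names are choices made for this sketch, and the critical value 13.277 is typed in from the chi-square table rather than computed. Because the sketch keeps the expected frequencies unrounded, its statistic can differ slightly from a hand calculation that rounds each $E_{ij}$ to one decimal place; the decision is the same.

```python
observed = [
    [23, 60, 29],   # Poor
    [28, 79, 60],   # Average
    [9, 49, 63],    # Very Good
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency: row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi_crit = 13.277               # chi-square value for alpha = 0.01, 4 df (from the table)
reject = chi_square > chi_crit

print(f"chi-square={chi_square:.3f}, df={df}, reject H0: {reject}")
```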

Exercise:

1. A company carries two lines of products and has 3 salespersons. The table below shows a
record of one month of the number of units of the two lines sold by each salesperson.

Salesperson

1 2 3

Line 1 20 8 15

Line 2 17 16 5

Test at 10% level of significance the claim that each salesperson’s ability depends on the line
he/she is selling.

2. 200 tyres each of brands A, B, C and D were tested for the distance in km that they would last
and the following table obtained.

Brand

A B C D

Distance

Failed to last 32154 km 26 23 15 32

Lasted from 32154 to 48232 km 118 93 116 121

Lasted more than 48232 km 56 84 69 47

Test at 5% level of significance whether there is any difference in the quality of the four brands
with regard to their durability.
