
Hypothesis Testing

Hypothesis testing, or significance testing, is a method for checking whether an apparent result from a sample could plausibly be due to randomness alone. It serves to check how strong the evidence is.

Statistical hypothesis testing is a means of assessing whether apparent results in a sample conclusively indicate that something is really happening.
Hypothesis and Hypothesis Testing
HYPOTHESIS A statement about the value of a population parameter developed for the purpose of testing.

HYPOTHESIS TESTING A procedure based on sample evidence and probability theory to determine whether
the hypothesis is a reasonable statement.

TEST STATISTIC A value, determined from sample information, used to determine whether to reject the null
hypothesis.

CRITICAL VALUE The dividing point between the region where the null hypothesis is rejected and the region
where it is not rejected.
Important Things to Remember about H0 and H1
• H0: null hypothesis and H1: alternate hypothesis
• H0 and H1 are mutually exclusive and collectively exhaustive
• H0 is always presumed to be true
• H1 is the research hypothesis
• A random sample (n) is used to "reject H0"
• If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence to reject H0. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
• Equality is always part of H0 (e.g. "=", "≥", "≤")
• "≠", "<" and ">" are always part of H1
• In actual practice, the status quo is set up as H0
• In problem solving, look for key words and convert them into symbols. Some key words include: "improved", "better than", "as effective as", "different from", "has changed", etc.
There are 4 major components of a test of hypothesis.
1. Null hypothesis

2. Alternative hypothesis (research hypothesis)

3. Test statistic

4. Rejection region
Null hypothesis
H0

Example: If we wanted to test whether the mean weight loss of people who have participated in a new weight-loss program is 3 kg, we would test

  H0: μ = 3
Alternative hypothesis
H1

Example:
1. If a tyre company wanted to know whether the average life of its new radial tyre exceeds its advertised value of 50,000 km, the company would specify the alternative hypothesis as
  H1: μ > 50,000

2. If the company wanted to know whether the average life of the tyre is less than 50,000 km, it would test
  H1: μ < 50,000

3. If the company wanted to determine whether the average life of the tyre differs from the advertised value, it would test
  H1: μ ≠ 50,000
Signs in the Tails of a Test
Test Statistic

The purpose of the test is to determine whether or not it is appropriate to reject the null hypothesis. The test statistic is therefore the sample statistic upon which we base our decision to either reject or not reject the null hypothesis.
Rejection Region

The key question answered by the rejection region is: when is the value of the test statistic sufficiently different from the hypothesized value of the parameter for us to reject the null hypothesis?

The process we use in answering this question depends on the probability of our making a mistake when testing the hypothesis. Since the conclusion we draw is based on sample data, the chance of making one of two possible errors will always exist.
                      H0 is true                H0 is false
  Reject H0           Type I error,             Correct decision
                      P(Type I) = α
  Do not reject H0    Correct decision          Type II error,
                                                P(Type II) = β
The Power of a Statistical Test
The power of a statistical test, given as 1 – β = P(reject H0 when H0 is false), measures the ability of the test to perform as required. This quantity 1 – β is called the power of the test: the greater the power, the better the decision rule.
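As a rough numerical illustration (not part of the slides), the power of a right-tailed z-test can be computed from the rejection cutoff and an assumed true mean. The sketch below reuses the billing-system figures that appear later in this deck (μ0 = 170, σ = 65, n = 400, α = .05) and assumes a true mean of 178; scipy is assumed to be available.

```python
# Sketch: power of a right-tailed z-test for H0: mu = mu0 vs H1: mu > mu0.
# mu0, sigma, n, alpha come from the later billing-system example; true_mu is an assumption.
import math
from scipy.stats import norm

mu0, sigma, n, alpha = 170, 65, 400, 0.05
true_mu = 178                                # assumed true mean under H1

se = sigma / math.sqrt(n)
x_crit = mu0 + norm.ppf(1 - alpha) * se      # reject H0 when the sample mean exceeds this
beta = norm.cdf((x_crit - true_mu) / se)     # P(do not reject H0 | mean is true_mu) = Type II error
power = 1 - beta                             # P(reject H0 | H0 is false)
print(f"critical mean = {x_crit:.2f}, beta = {beta:.4f}, power = {power:.4f}")
```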
There are two types of tailed tests:
1. One-tailed tests - the rejection region is in only
one tail of the distribution
2. Two-tailed tests - the rejection region is in both
tails of the distribution
Two-tailed Test
[Figure: rejection regions in both tails, acceptance region in the centre.]

One-tailed Test
[Figure: rejection region in one tail, acceptance region covering the rest of the distribution.]
Steps in hypothesis testing

- Define Null hypothesis


- Define Alternative hypothesis
- Calculate Test statistic
- Determine Rejection region
- Compare Value of the test statistic with
Critical Value
- Conclusion
Testing the Population Mean When the
Population Standard Deviation is Known

• Example
– A new billing system for a department store will be cost-effective only if the mean monthly account is more than $170.
– A sample of 400 accounts has a mean of $178.
– If accounts are approximately normally distributed with σ = $65, can we conclude that the new system will be cost-effective?

Testing the Population Mean (σ is Known)

• Example – Solution
– The population of interest is the credit accounts at the store.
– We want to know whether the mean account for all customers is greater than $170.
  H1: μ > 170
– The null hypothesis generally specifies a single value of the parameter:
  H0: μ = 170
Approaches to Testing
The Rejection Region Method

The rejection region is a range of values such that if the test statistic falls into that range, the null hypothesis is rejected in favor of the alternative hypothesis.
The Rejection Region Method –
for a Right - Tail Test
Example – solution continued

• Recall: H0: μ = 170
          H1: μ > 170
• It therefore seems reasonable to reject the null hypothesis and believe that μ > 170 if the sample mean is sufficiently large.

[Figure: sampling distribution of the sample mean, with the region "Reject H0 here" lying to the right of the critical value of the sample mean.]
The Rejection Region Method
for a Right - Tail Test
Example – solution continued

• Define a critical value x̄_L for x̄ that is just large enough to reject the null hypothesis.
• Reject the null hypothesis if
  x̄ > x̄_L
The standardized test statistic
– Instead of using the statistic x̄, we can use the standardized value z:
  z = (x̄ – μ) / (σ/√n)
– Then, for a one-tailed test, the rejection region becomes
  z > z_α

Critical Value for the Rejection Region
• Set the probability of committing a Type I error at α (also called the significance level).
The standardized test statistic

• Example – continued
– We redo this example using the standardized test statistic.
  Recall: H0: μ = 170
          H1: μ > 170
– Test statistic:
  z = (x̄ – μ) / (σ/√n) = (178 – 170) / (65/√400) = 2.46
– Rejection region: z > z_.05 = 1.645
The standardized test statistic

• Example – continued

  Reject the null hypothesis if z > 1.645.

Conclusion
Since z = 2.46 > 1.645, we reject the null hypothesis in favor of the alternative hypothesis.
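For instance, the billing-system test above can be checked with a few lines of code; the figures (μ0 = 170, x̄ = 178, σ = 65, n = 400, α = .05) are taken from the slides, and scipy is assumed to be available.

```python
# Sketch: right-tailed z-test for the billing-system example (H0: mu = 170, H1: mu > 170).
import math
from scipy.stats import norm

mu0, xbar, sigma, n, alpha = 170, 178, 65, 400, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))   # standardized test statistic, about 2.46
z_crit = norm.ppf(1 - alpha)                # critical value z_alpha = 1.645
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {z > z_crit}")
```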
p-value Method
– The p-value provides information about the
amount of statistical evidence that supports
the alternative hypothesis.

– The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.

– Let us demonstrate the concept on the example above.
P-value Method

The probability of observing a test statistic at least as extreme as 178, given that μ = 170, is

  P(x̄ ≥ 178 when μ = 170)
  = P(z ≥ (178 – 170) / (65/√400))
  = P(z ≥ 2.4615) = .0069

[Figure: sampling distribution centred at μ = 170, with the p-value shown as the area to the right of x̄ = 178.]
Interpreting the p-value

We can conclude that the smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.

  H0: μ = 170
  H1: μ > 170
  x̄ = 178
Interpreting the p-value
• Describing the p-value
– If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
– If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
– If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
– If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
The p-value and the Rejection
Region Methods
– The p-value can be used when making
decisions based on rejection region methods
as follows:
• Define the hypotheses to test, and the required
significance level 
• Perform the sampling procedure, calculate the test
statistic and the p-value associated with it.
• Compare the p-value to α. Reject the null hypothesis only if p-value < α; otherwise, do not reject the null hypothesis.

[Figure: for α = 0.05, the p-value is the area beyond x̄ = 178; the rejection region starts at the critical sample mean x̄_L = 175.34, with μ = 170 at the centre of the distribution.]
Steps in Hypothesis Testing using SPSS
• State the null and alternative hypotheses
• Define the level of significance (α)
• Calculate the actual significance: the p-value
• Make a decision: reject the null hypothesis if p ≤ α for a 2-tailed test, or if p* ≤ α for a 1-tailed test (p* is p/2 when p is obtained from a 2-tailed test)
• Draw a conclusion
Inference About a Population Mean When the
Population Standard Deviation Is Unknown or
When the Sample Size is Small
In practice, the population standard deviation will be unknown.

Recall that when σ is known we use the following statistic to estimate and test a population mean:

  z = (x̄ – μ) / (σ/√n)

When σ is unknown, or when the sample size n is small, we use its point estimator s, and the z-statistic is then replaced by the t-statistic.

The t-Statistic

  t = (x̄ – μ) / (s/√n)

The t distribution is mound-shaped and symmetrical around zero. The degrees of freedom (a function of the sample size) determine how spread out the distribution is compared to the normal distribution.

[Figure: two t distributions centred at 0, with d.f. = v1 and d.f. = v2 where v1 < v2; the curve with fewer degrees of freedom is more spread out.]
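One way to see how the degrees of freedom control the spread relative to the normal distribution is to compare upper-tail critical values. A sketch, assuming scipy; the df values chosen (apart from 49, which matches the example below) are arbitrary:

```python
# Sketch: the t critical value shrinks toward the normal critical value as df grows.
from scipy.stats import norm, t

alpha = 0.05
for df in (5, 20, 49, 1000):                             # illustrative degrees of freedom
    print(f"df = {df:4d}: t critical value = {t.ppf(1 - alpha, df):.4f}")
print(f"normal    : z critical value = {norm.ppf(1 - alpha):.4f}")   # 1.645
```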
Testing μ when σ is unknown
• Example
– In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied.
– It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring.
– Can we conclude that this belief is correct, based on productivity observations of 50 trainees (see file PROD.sav)?
Testing μ when σ is unknown
• Example – Solution
– The problem objective is to describe the population of the number of packages processed in one hour.
– H0: μ = 450
  H1: μ > 450
– The t statistic:
  t = (x̄ – μ) / (s/√n),  d.f. = n – 1 = 49
Testing μ when σ is unknown
• Solution continued (solving by hand)

– The rejection region is t > t_α,n–1
  t_α,n–1 = t_.05,49 ≈ t_.05,50 = 1.676

– From the data we have Σxᵢ = 23,019 and Σxᵢ² = 10,671,357, thus
  x̄ = Σxᵢ / n = 23,019 / 50 = 460.38, and
  s² = [Σxᵢ² – (Σxᵢ)²/n] / (n – 1) = 1507.55
  s = √1507.55 = 38.83
Testing μ when σ is unknown

• The test statistic is
  t = (x̄ – μ) / (s/√n) = (460.38 – 450) / (38.83/√50) = 1.89

[Figure: t distribution with the rejection region to the right of the critical value 1.676; the test statistic 1.89 falls inside it.]

• Since 1.89 > 1.676, we reject the null hypothesis in favor of the alternative.
• There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages, at the .05 significance level.
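The same test can be run directly on the raw observations. The sketch below assumes the PROD.sav data can be read with pandas (which requires the pyreadstat package), that the variable name matches the "Packages" label in the SPSS output, and SciPy 1.6+ for the alternative argument.

```python
# Sketch: one-sample, right-tailed t-test of H0: mu = 450 vs H1: mu > 450 on the PROD.sav data.
import pandas as pd
from scipy.stats import ttest_1samp

packages = pd.read_spss("PROD.sav")["Packages"]      # variable name assumed from the SPSS output
t_stat, p_value = ttest_1samp(packages, popmean=450, alternative="greater")
print(f"t = {t_stat:.3f}, one-tailed p-value = {p_value:.4f}")   # expect t ~ 1.89, p ~ .032
```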
Solution using SPSS (use file PROD.sav)
One-Sample Statistics

             N     Mean     Std. Deviation   Std. Error Mean
  Packages   50    460.38   38.827           5.491

One-Sample Test  (Test Value = 450)

             t       df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference
                                                              Lower    Upper
  Packages   1.890   49   .065              10.380            -.65     21.41
Inference About a Population Proportion

• Statistic and sampling distribution
– The statistic used when making inference about p is
  p̂ = x / n
  where x = the number of successes and n = the sample size.
– Under certain conditions [np > 5 and n(1 – p) > 5], p̂ is approximately normally distributed, with μ = p and σ² = p(1 – p)/n.

Testing and Estimating the Proportion

• Test statistic for p:
  z = (p̂ – p) / √(p(1 – p)/n)
  where np ≥ 5 and n(1 – p) ≥ 5
Testing the Proportion
• Example 12.6
– A pharmaceutical company claimed that its medicine was 80% effective in relieving allergy for a period of 15 hours. In a sample of 200 persons who were given the medicine, 150 persons had relief. Do you think that the company's claim is justified? Use the 0.05 level of significance.
Testing the Proportion
• Solution
– The problem objective is to test the effectiveness of the medicine.
– The data are nominal.
– The parameter to be tested is p.
– Success is defined as "having relief".
– The hypotheses are:
  H0: p = .8
  H1: p < .8

• The rejection region is z < –z_α = –z_.05 = –1.645.
• The sample proportion is p̂ = 150/200 = .75
• The value of the test statistic is
  z = (p̂ – p) / √(p(1 – p)/n) = (.75 – .8) / √(.8(1 – .8)/200) = –1.77
Since the calculated z is less than the critical value, we reject the null hypothesis and conclude that the company's claim that its medicine is 80% effective is not justified.
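The calculation mirrors the formula above; a minimal sketch using the figures from the example, with scipy assumed to be installed:

```python
# Sketch: left-tailed z-test for a proportion, H0: p = .8 vs H1: p < .8.
import math
from scipy.stats import norm

p0, x, n, alpha = 0.80, 150, 200, 0.05
p_hat = x / n                                        # sample proportion = .75
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)      # uses p0 in the standard error, as in the slides
z_crit = -norm.ppf(1 - alpha)                        # -1.645
p_value = norm.cdf(z)                                # lower-tail probability
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
```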
T-Tests: When the Sample Size is Small (<30) or When the Population Standard Deviation Is Unknown
• Variable: normally distributed
• Types of t-tests:
  – One-sample t-test
  – Paired or dependent samples t-test
  – Independent samples t-test (equal and unequal variance)

One-sample t-test

  H0: μ = μ0
  H1: μ ≠ μ0
  H1: μ > μ0
  H1: μ < μ0
Paired sample t-test

  H0: μ_D = 0
  H1: μ_D ≠ 0
  H1: μ_D > 0
  H1: μ_D < 0

Matched pairs

The mean of the population differences is μ_D, that is, μ1 – μ2 = μ_D.

Test statistic:
  t = (x̄_D – μ_D) / (s_D/√n_D)
Degrees of freedom = n_D – 1
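In code, a paired test only needs the two matched columns. The before and after lists below are small hypothetical samples used purely for illustration; scipy is assumed.

```python
# Sketch: paired (matched-pairs) t-test of H0: mu_D = 0 vs H1: mu_D != 0.
from scipy.stats import ttest_rel

before = [72, 68, 80, 75, 77, 70, 69, 74]    # hypothetical matched observations
after  = [75, 70, 83, 74, 80, 73, 71, 78]
t_stat, p_value = ttest_rel(before, after)   # equivalent to a one-sample t-test on the differences
print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.4f}, df = {len(before) - 1}")
```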
Independent samples t-test

  H0: μ1 = μ2
  H1: μ1 ≠ μ2
  H1: μ1 > μ2
  H1: μ1 < μ2

The sampling process

  Population 1: parameters μ1 and σ1²; statistics x̄1 and s1²; sample size n1
  Population 2: parameters μ2 and σ2²; statistics x̄2 and s2²; sample size n2
If the two population standard deviations are unknown, then we can estimate the standard error of the difference between the two means:

  σ̂_(x̄1 – x̄2) = √(σ̂1²/n1 + σ̂2²/n2)

Test statistic:
  z = (x̄1 – x̄2) / √(σ̂1²/n1 + σ̂2²/n2)
If the population variances are unknown, the sample sizes are small, and the population variances are equal, then we use a weighted average called a "pooled estimate" of σ²:

  σ²_(x̄1 – x̄2) = s_p² (1/n1 + 1/n2)

Where:
  s_p² = [(n1 – 1)s1² + (n2 – 1)s2²] / (n1 + n2 – 2)

Test statistic:
  t = (x̄1 – x̄2) / √(s_p² (1/n1 + 1/n2))
Degrees of freedom = n1 + n2 – 2
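The equal-variance (pooled) version corresponds to scipy's default independent-samples test. The two samples below are hypothetical, used only to make the sketch runnable.

```python
# Sketch: independent-samples t-test with a pooled variance estimate (H0: mu1 = mu2 vs H1: mu1 != mu2).
from scipy.stats import ttest_ind

sample1 = [23, 25, 28, 22, 27, 26, 24]       # hypothetical data
sample2 = [20, 21, 24, 19, 23, 22]
t_stat, p_value = ttest_ind(sample1, sample2, equal_var=True)   # pooled estimate s_p^2
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, df = {len(sample1) + len(sample2) - 2}")
# For unequal variances, pass equal_var=False (Welch's t-test).
```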
One-Way Analysis of Variance (ANOVA)

ANOVA is a technique used to test a hypothesis concerning the means of three or more populations.

Comparing Means of Three or More Populations
The F distribution is used for testing whether two or more sample means came from the same or equal populations.
Assumptions:
– The sampled populations follow the normal distribution.
– The populations have equal standard deviations.
– The samples are randomly selected and are independent.

The null hypothesis is that the population means are the same. The alternative hypothesis is that at least one of the means is different.

  H0: µ1 = µ2 = … = µk
  H1: The means are not all equal
  Reject H0 if F > F_α,k–1,n–k

The test statistic used to test the hypothesis is the F statistic.

Assumptions:

1. The random variable is normally distributed.

2. The population variances are equal.

  H0: μ1 = μ2 = μ3 = …
  H1: Not all means are the same
ANOVA – Example (File Airlines.sav)
EXAMPLE
Recently a group of four major carriers
joined in hiring Brunner Marketing
Research, Inc., to survey recent
passengers regarding their level of
satisfaction with a recent flight. The
survey included questions on ticketing,
boarding, in-flight service, baggage
handling, pilot communication, and so
forth.

Twenty-five questions offered a range of possible answers: excellent, good, fair, or poor. A response of excellent was given a score of 4, good a 3, fair a 2, and poor a 1. These responses were then totaled, so the total score was an indication of the satisfaction with the flight. Brunner Marketing Research, Inc., randomly selected and surveyed passengers from the four airlines.

Is there a difference in the mean satisfaction level among the four airlines? Use the .01 significance level.

Step 1: State the null and alternate hypotheses.
  H0: µE = µA = µT = µO
  H1: The means are not all equal
  Reject H0 if F > F_α,k–1,n–k

Step 2: State the level of significance.
  The .01 significance level is stated in the problem.
ANOVA – Example
Step 3: Find the appropriate test statistic. Use the F statistic
Calculations: It is convenient to summarize the calculations of F statistic in an ANOVA Table.
ANOVA – Example
Compute the value of F and make a decision.

We find the deviation of each observation from the grand mean, square the deviations, and sum these results for all 22 observations.
  SS total = (94 – 75.64)² + (90 – 75.64)² + … + (65 – 75.64)² = 1485.10

To compute SSE, find the deviation between each observation and its treatment mean. Each of these values is squared and then summed for all 22 observations.
  SSE = {(94 – 87.25)² + (90 – 87.25)² + … + (80 – 87.25)²} + {(75 – 78.20)² + (68 – 78.20)² + … + (88 – 78.20)²} + {(70 – 72.86)² + (73 – 72.86)² + … + (65 – 72.86)²} + {(68 – 69)² + (70 – 69)² + … + (65 – 69)²} = 594.41

Finally, determine SST = SS total – SSE.
  SST = 1485.10 – 594.41 = 890.69
ANOVA – Example
Step 3: Find the appropriate test statistic. Use the F statistic
Calculations: It is convenient to summarize the calculations of F statistic in an
ANOVA Table.

Step 4: State the decision rule.
  Reject H0 if F > F_α,k–1,n–k
            F > F_.01,4–1,22–4
            F > F_.01,3,18
            F > 5.09

Step 5: Make a decision.
  The computed value of F is 8.99, which is greater than the critical value of 5.09, so the null hypothesis is rejected.

Conclusion: The mean scores are not the same for the four airlines; at this point we can only conclude there is a difference in the treatment means. We cannot determine which treatment groups differ or how many treatment groups differ.
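A quick way to reproduce the F statistic is scipy's one-way ANOVA. The sketch below assumes the Airlines.sav data can be read with pandas (requires pyreadstat) and that the variable names match the "Satisfaction" and "Carrier" labels in the SPSS output that follows.

```python
# Sketch: one-way ANOVA on the airline-satisfaction data.
import pandas as pd
from scipy.stats import f_oneway

df = pd.read_spss("Airlines.sav")                               # column names assumed from the SPSS output
groups = [g["Satisfaction"].to_numpy() for _, g in df.groupby("Carrier")]
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}")             # expect F ~ 8.99, p ~ .001
```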
ANOVA Example – SPSS Output
Test of Homogeneity of Variances
Satisfaction
  Levene Statistic   df1   df2   Sig.
  .962               3     18    .432

ANOVA
Satisfaction
                    Sum of Squares   df   Mean Square   F       Sig.
  Between Groups    890.684          3    296.895       8.991   .001
  Within Groups     594.407          18   33.023
  Total             1485.091         21
ANOVA Example – SPSS Output
Multiple Comparisons
Satisfaction
Tukey HSD
  (I) Carrier   (J) Carrier   Mean Diff. (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
  Eastern       TWA            9.050             3.855        .124    -1.85          19.95
                Allegheny      14.393*           3.602        .004     4.21          24.57
                Ozark          18.250*           3.709        .001     7.77          28.73
  TWA           Eastern       -9.050             3.855        .124   -19.95           1.85
                Allegheny      5.343             3.365        .410    -4.17          14.85
                Ozark          9.200             3.480        .071     -.63          19.03
  Allegheny     Eastern       -14.393*           3.602        .004   -24.57          -4.21
                TWA           -5.343             3.365        .410   -14.85           4.17
                Ozark          3.857             3.197        .631    -5.18          12.89
  Ozark         Eastern       -18.250*           3.709        .001   -28.73          -7.77
                TWA           -9.200             3.480        .071   -19.03            .63
                Allegheny     -3.857             3.197        .631   -12.89           5.18
  *. The mean difference is significant at the 0.05 level.
ANOVA Example – SPSS Output
Homogeneous Subsets
Satisfaction
Tukey HSD a,b
  Carrier     N    Subset for alpha = 0.05
                   1        2
  Ozark       6    69.00
  Allegheny   7    72.86
  TWA         5    78.20    78.20
  Eastern     4             87.25
  Sig.             .078     .085
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 5.266.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
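The same pairwise comparisons can be reproduced with statsmodels; a sketch, assuming the same Airlines.sav columns as in the ANOVA code above.

```python
# Sketch: Tukey HSD multiple comparisons following the one-way ANOVA.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_spss("Airlines.sav")                    # column names assumed from the SPSS output
tukey = pairwise_tukeyhsd(endog=df["Satisfaction"], groups=df["Carrier"], alpha=0.05)
print(tukey.summary())   # mean differences, adjusted p-values and 95% confidence intervals
```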
One-Way Analysis of Variance

• Example 2
– An apple juice manufacturer is planning to develop a new product: a liquid concentrate.
– The marketing manager has to decide how to market the new product.
– Three strategies are considered:
  • Emphasize convenience of using the product.
  • Emphasize the quality of the product.
  • Emphasize the product's low price.

One-Way Analysis of Variance
• Example 2 – continued
– An experiment was conducted as follows:
  • In three cities an advertisement campaign was launched.
  • In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
  • The weekly sales were recorded for twenty weeks following the beginning of the campaigns.
One-Way Analysis of Variance

[Data: weekly sales under the three strategies (Convenience, Quality, Price) for the twenty weeks following the start of each campaign; see file JUICE.xls.]
Defining the Hypotheses

• Solution

  H0: μ1 = μ2 = μ3
  H1: At least two means differ

To build the statistic needed to test the hypotheses, use the following notation:
ANOVA Table
JUICE.sav

Since 0.047 < 0.05, there is sufficient evidence to reject H0 in favor of H1 and argue that at least one of the mean sales figures is different from the others.
Chi-squared Test of a Contingency Table

• Test of independence: a test of the association between two nominal variables arranged in a contingency table.

  Null hypothesis: the two variables are independent.
  Alternative hypothesis: the two variables are dependent.
The Chi-square Distribution
At the outset, we should know that the chi-
square distribution has only one parameter
called the ‘degrees of freedom’ (df ) as is the
case with the t-distribution. The shape of a
particular chi-square distribution depends on
the number of degrees of freedom.
Properties of Chi-square Distribution

1. Chi-square is non-negative in value; it is


either zero or positively valued.

2. It is not symmetrical; it is skewed to the


right.

3. There are many chi-square distributions.


As with the t-distribution, there is a
different chi-square distribution for each
degree-of-freedom value.
The chi-squared statistic measures the difference between the actual counts and the expected counts (assuming validity of the null hypothesis). It is the sum of
  (Observed count – Expected count)² / Expected count
over all cells:

  χ² = Σᵢ₌₁ᵏ (Oᵢ – Eᵢ)² / Eᵢ
Contingency table χ² test – Example

– In an effort to better predict the demand for courses offered by a certain MBA program, it was hypothesized that students' academic background affects their choice of MBA major and, thus, their course selection.
– A random sample of last year's MBA students was selected. The data are given in the file Chi-Sq_MBA.sav. The following contingency table summarizes the relevant data; the file Chi_Sq_MBA_Table.sav gives the data as per the contingency table.
Contingency table χ² test – Example

  Degree   Accounting   Finance   Marketing   Total
  BA       31           13        16          60
  BEng     8            16        7           31
  BBA      12           10        17          39
  Other    10           5         7           22
  Total    61           44        47          152

The observed values

Contingency table χ² test – Example

• Solution
– The hypotheses are:
  H0: The two variables are independent
  H1: The two variables are dependent

– The test statistic:
  χ² = Σᵢ₌₁ᵏ (Oᵢ – Eᵢ)² / Eᵢ
  where k is the number of cells in the contingency table.

– The rejection region:
  χ² > χ²_α,(r–1)(c–1)
Estimating the expected frequencies

  Undergraduate    MBA Major
  Degree           Accounting   Finance   Marketing   Total   Probability
  BA                                                  60      60/152
  BEng                                                31      31/152
  BBA                                                 39      39/152
  Other                                               22      22/152
  Total            61           44        47          152
  Probability      61/152       44/152    47/152

Under the null hypothesis the two variables are independent:
  P(Accounting and BA) = P(Accounting) × P(BA) = [61/152][60/152]

The number of students expected to fall in the cell "Accounting – BA" is
  e(Acct, BA) = n · p(Acct, BA) = 152(61/152)(60/152) = [61 × 60]/152 = 24.08
The number of students expected to fall in the cell "Finance – BBA" is
  e(Finance, BBA) = n · p(Finance, BBA) = 152(44/152)(39/152) = [44 × 39]/152 = 11.29
The expected frequencies for a contingency table

• The expected frequency of the cell in row i and column j of the contingency table is calculated by

  E_ij = (Row i total)(Column j total) / Sample size

and the test statistic is

  χ² = Σᵢ₌₁ᵏ (Oᵢ – Eᵢ)² / Eᵢ
Calculation of the χ² statistic
• Solution – continued

  Observed (expected) frequencies:

  Undergraduate    MBA Major
  Degree           Accounting    Finance      Marketing    Total
  BA               31 (24.08)    13 (17.37)   16 (18.55)   60
  BEng             8 (12.44)     16 (8.97)    7 (9.58)     31
  BBA              12 (15.65)    10 (11.29)   17 (12.06)   39
  Other            10 (8.83)     5 (6.37)     7 (6.80)     22
  Total            61            44           47           152

  χ² = Σᵢ (fᵢ – eᵢ)² / eᵢ
     = (31 – 24.08)²/24.08 + … + (5 – 6.37)²/6.37 + … + (7 – 6.80)²/6.80
     = 14.70
Contingency table χ² test – Example

• Solution – continued
– The critical value in our example is:
  χ²_α,(r–1)(c–1) = χ²_.05,(4–1)(3–1) = 12.5916

• Conclusion:
  Since χ² = 14.70 > 12.5916, there is sufficient evidence to infer, at the 5% significance level, that students' undergraduate degree and MBA students' course selection are dependent.
SPSS Output
Chi-Square Tests

                                   Value     df   Asymp. Sig. (2-sided)
  Pearson Chi-Square               14.702a   6    .023
  Likelihood Ratio                 13.781    6    .032
  Linear-by-Linear Association     2.003     1    .157
  N of Valid Cases                 152

a. 0 cells (.0%) have an expected count less than 5. The minimum expected count is 6.37.
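The observed counts above are enough to reproduce the SPSS result with scipy; a minimal sketch:

```python
# Sketch: chi-squared test of independence on the observed contingency table.
from scipy.stats import chi2_contingency

observed = [[31, 13, 16],    # BA
            [ 8, 16,  7],    # BEng
            [12, 10, 17],    # BBA
            [10,  5,  7]]    # Other
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.3f}, df = {dof}, p-value = {p_value:.4f}")   # ~14.70, 6, .023
```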
Yates’ Correction for Continuity
The chi-square distribution is a continuous distribution. Whenever the degrees of freedom equal 1 (as in a 2×2 table), a correction for continuity can be made.
Required conditions – the rule of five

• The test statistic used to perform the test is only approximately chi-squared distributed.
• For the approximation to apply, the expected cell frequency has to be at least 5 for all cells (np ≥ 5).
• If the expected frequency in a cell is less than 5, combine it with other cells.
