
BCom BRM Unit-III Hypothesis testing

CONCEPT AND WORKSHEET OF HYPOTHESIS TESTING

ABSTRACT:
Research design and formulation of hypothesis are two important parts of a research process. In
the research process, the tasks of designing a research approach, research instruments, a
sampling plan and information-gathering methods play a very crucial role.

Reference: All the material is prepared with reference to Business Statistics by P. C. Tulsian
and B. Jhunjhunwala.

Sampling – Analysis of Variance – F-Test


Meaning of Analysis of Variance or “F-Test”
The analysis of variance or ‘F-Test’ is a technique developed by R.A. Fisher to
test the significance of the difference among more than two sample means
and to make inferences about whether such samples are drawn from
populations having the same mean.
F-test is based on the ratio rather than the difference between variances. F-test
is obtained by taking the ratio of unbiased estimates of population variances as
follows:

F = \hat{\sigma}_1^2 / \hat{\sigma}_2^2

Where, n_1 = Sample size of 1st population
\hat{\sigma}_1^2 = Unbiased Estimated Variance of 1st population
n_2 = Sample Size of 2nd population
\hat{\sigma}_2^2 = Unbiased Estimated Variance of 2nd population

Or

F = [\sum (X_1 - \bar{X}_1)^2 / (n_1 - 1)] \div [\sum (X_2 - \bar{X}_2)^2 / (n_2 - 1)]

[Since \hat{\sigma}^2 = \sum (X - \bar{X})^2 / (n - 1)]
Tutorial Note: To keep the ratio larger than 1, the larger variance is placed in the
numerator. If the computed value of F is greater than the table value of F, we
reject H0 and conclude that the two populations do not have the same
variance. If the computed value of F is less than the table value of F, we accept
H0 and conclude that the two populations have the same variance.
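
The ratio-and-decision rule above can be sketched in Python; the following is a minimal illustration (the sample data and all variable names are invented for the example, and scipy is assumed to be available):

    # Two-variance F-test: larger unbiased variance estimate over the smaller,
    # compared against the upper 5% point of the F distribution.
    import numpy as np
    from scipy import stats

    sample1 = np.array([12., 15., 11., 14., 16., 13., 15.])
    sample2 = np.array([10., 18., 9., 20., 8., 17., 11., 19.])

    var1 = np.var(sample1, ddof=1)   # unbiased estimate of sigma_1^2
    var2 = np.var(sample2, ddof=1)   # unbiased estimate of sigma_2^2

    # Keep the ratio larger than 1 by placing the larger variance on top.
    if var1 >= var2:
        F, df_num, df_den = var1 / var2, len(sample1) - 1, len(sample2) - 1
    else:
        F, df_num, df_den = var2 / var1, len(sample2) - 1, len(sample1) - 1

    F_table = stats.f.ppf(0.95, df_num, df_den)   # table value at the 5% level
    print("computed F =", round(F, 3), " table F =", round(F_table, 3))
    print("reject H0" if F > F_table else "accept H0")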

Assumptions of Analysis of Variance or “F-test”


The analysis of variance or F-Test is based on the following assumptions:
1. Each sample is drawn randomly from a normal population, and the
sample statistics tend to reflect the characteristics of the population.
2. The populations from which the samples are drawn have the same means and
variances, i.e. \mu_1 = \mu_2 = \cdots = \mu_k and \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2.

Uses of F-Test
The F-test is used –
1. To test the hypothesis of equality of two variances.
2. To test the hypothesis of equality of several sample means.

Properties of F-Test
1. Range – The values of F range from 0 to \infty. The value of F can never be
negative, since both terms of the F-ratio are squared values.

2. Shape – The shape of the F distribution curve depends upon the number of
degrees of freedom of the numerator and of the denominator. In
general, the F curve is skewed to the right.

3. Critical Value – For the same probability, the critical value of F for the lower
tail is the reciprocal of the critical value of F for the upper tail with \nu_1 and \nu_2
(the degrees of freedom) interchanged (illustrated numerically below).
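
Property 3 can be checked numerically. The sketch below (a small illustration; the 5% level and the degrees of freedom 4 and 10 are arbitrary choices) shows that the lower-tail critical value of F(v1, v2) equals the reciprocal of the upper-tail critical value of F(v2, v1):

    # Reciprocal relation between lower- and upper-tail critical values of F.
    from scipy import stats

    v1, v2, alpha = 4, 10, 0.05
    lower = stats.f.ppf(alpha, v1, v2)           # lower 5% point of F(4, 10)
    upper = stats.f.ppf(1 - alpha, v2, v1)       # upper 5% point of F(10, 4)
    print(round(lower, 4), round(1 / upper, 4))  # the two values agree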

Analysis of Variance
Analysis of variance compares two variances: (i) the variance between samples and (ii) the
variance within samples. Its purpose is to find out the influence of the different forces
working on them.
It is used in agricultural experiments and in the natural and physical sciences.

Classification Model
There may be one way classification model or two way classification model.
One-way Classification Model
A one-way classification model is designed to study the effect of one factor in an
experiment. For example, the influence of applying one or more types of
fertilizer may be studied on several pieces of land. It is designed to test the
null hypothesis that the arithmetic means of the populations from which the k
samples are randomly drawn are equal to one another.

Practical Steps involved in one factor analysis of variance


Step-1: We set H0: \mu_1 = \mu_2 = \cdots = \mu_c and H1: at least two of the population means are unequal.
Step-2: Calculate the mean of each sample, i.e. \bar{X}_1, \bar{X}_2, \ldots, \bar{X}_c, and the grand
average \bar{\bar{X}} as follows:
\bar{\bar{X}} = (\bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_c) / c
Note: To simplify calculations one may add, subtract, multiply or divide
the given data by any figure. It will not affect the ultimate solution.
Step-3: Calculate the difference between the mean of each sample and the
grand average.

Step-4: Square these differences and obtain their total for each sample, i.e.
\sum (\bar{X}_1 - \bar{\bar{X}})^2, \sum (\bar{X}_2 - \bar{\bar{X}})^2, \ldots, the sum for a sample being taken over all
items in that sample, so that it equals n_j (\bar{X}_j - \bar{\bar{X}})^2.

Step-5: Calculate the sum of squares between the samples (SSB) as follows:
SSB = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + n_c (\bar{X}_c - \bar{\bar{X}})^2
Step-6: Calculate the difference between the various items in a sample
and the mean value of the respective sample.
Step-7: Square these differences and obtain their total for each sample, i.e.
\sum (X_1 - \bar{X}_1)^2, \sum (X_2 - \bar{X}_2)^2, \ldots, \sum (X_c - \bar{X}_c)^2
Step-8: Calculate the sum of squares within the samples (SSW) as follows:
SSW = \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 + \cdots + \sum (X_c - \bar{X}_c)^2
Step-9: Prepare ANOVA table as follows:
Source of variation | Sum of squares | Degrees of freedom | Mean squares        | Computed value of F | Table value of F
Between samples     | SSB            | c – 1              | MSB = SSB / (c – 1) | F = MSB / MSW       |
Within samples      | SSW            | n – c              | MSW = SSW / (n – c) |                     |
Total               |                | n – 1              |                     |                     |

Step-10: Compare the computed value of F with the table value of F for the
given degrees of freedom at a given level of significance (generally we
take the 5% level of significance) and interpret the result as follows:
(a) If the computed value of F is greater than the table value of F, the difference
in the variances is significant; it could not have arisen due to fluctuations of
random sampling, and hence we reject H0.
(b) If the computed value of F is less than the table value of F, the difference in
the variances is not significant; it could have arisen due to fluctuations of
random sampling, and hence we accept H0.
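
The ten steps above can be translated directly into a short computation. The sketch below (hypothetical data with c = 3 samples; the variable names are mine, not the worksheet's, and scipy is assumed to be available) builds SSB, SSW and the F ratio exactly as in Steps 2–10:

    # One-way ANOVA by the direct (deviation) method of Steps 1-10.
    import numpy as np
    from scipy import stats

    samples = [np.array([12., 15., 11., 14.]),
               np.array([16., 18., 17., 15.]),
               np.array([13., 12., 14., 13.])]
    c = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = np.concatenate(samples).mean()           # Step-2

    # Step-5: squared deviations of sample means from the grand mean,
    # counted once for every item in the sample.
    SSB = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
    # Step-8: squared deviations of items from their own sample mean.
    SSW = sum(((s - s.mean()) ** 2).sum() for s in samples)

    MSB, MSW = SSB / (c - 1), SSW / (n - c)                # Step-9
    F = MSB / MSW
    F_table = stats.f.ppf(0.95, c - 1, n - c)              # 5% level of significance
    print("SSB =", SSB, " SSW =", SSW, " F =", round(F, 2),
          " table F =", round(F_table, 2))
    print("reject H0" if F > F_table else "accept H0")     # Step-10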
Illustration-1
The following table gives the yields on 15 sample fields under three varieties of
seeds; (viz. A, B, C)
YIELDS
A B C
5 3 10
6 5 13
8 2 7
1 10 13
5 0 17

Test at 5% level of significance.


Illustration-2
The following table gives the yields on 15 sample fields under three varieties of
seeds (viz. A, B, C);
YIELDS
A B C
95 93 100
96 98 103
98 92 97
91 100 103
95 90 107

Test at 5% level of significance.

Illustration-3
The following table gives the yield on 15 fields under three varieties of seeds
(viz. A, B, C);

YIELDS
A B C
9500 9300 10000
9600 9800 10300
9800 9200 9700
9100 10000 10300
9500 9000 10700

Test at 5% level of significance.
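
The data of Illustration-3 are exactly 100 times the data of Illustration-2, so, as the note under Step-2 points out, the F ratio is unchanged (both SSB and SSW are multiplied by the square of the constant, which cancels in the ratio). This can be checked with scipy's built-in one-way ANOVA; the snippet below is only a cross-check and not part of the original worksheet:

    # Scaling every observation by a constant leaves the F ratio unchanged.
    import numpy as np
    from scipy import stats

    A = np.array([95, 96, 98, 91, 95])       # variety A (Illustration-2)
    B = np.array([93, 98, 92, 100, 90])      # variety B
    C = np.array([100, 103, 97, 103, 107])   # variety C

    F2, p2 = stats.f_oneway(A, B, C)                      # Illustration-2
    F3, p3 = stats.f_oneway(A * 100, B * 100, C * 100)    # Illustration-3
    print(round(F2, 3), round(F3, 3))                     # identical F values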


Practical steps involved in the preparation of ANOVA Table (i.e. Analysis of
Variance table) for one factor analysis of variance.
Step-1: We set H0: \mu_1 = \mu_2 = \cdots = \mu_c and H1: at least two of the population means are unequal.
Step-2: Calculate the total of the observations of each sample, i.e. \sum X_1, \sum X_2, \ldots, \sum X_c;
then square the observations and obtain their total for each sample, i.e.
\sum X_1^2, \sum X_2^2, \ldots, \sum X_c^2.

Note: To simplify the calculations, one may add, subtract, multiply or
divide the given data by any figure. It will not affect the ultimate
solution.
Step-3: Calculate the Correction Factor (CF) as follows:
CF = T^2 / n, where T = grand total of all the observations and n = total number of observations.

Step-4: Calculate the total sum of squares (SST) as follows:
SST = Sum of squares of all the observations – Correction Factor
    = (\sum X_1^2 + \sum X_2^2 + \cdots + \sum X_c^2) – CF

Step-5: Calculate the sum of squares between samples (SSB) as follows:
SSB = [ (\sum X_1)^2 / n_1 + (\sum X_2)^2 / n_2 + \cdots + (\sum X_c)^2 / n_c ] – CF
Step-6: Calculate sum of squares within samples (SSW) as follows:
SSW = SST – SSB

Step-7: Prepare the ANOVA table as follows:


ANOVA Table
Source of variation | Sum of squares | Degrees of freedom | Mean squares        | Variance Ratio
Between samples     | SSB            | c – 1              | MSB = SSB / (c – 1) | F = MSB / MSW
Within samples      | SSW            | n – c              | MSW = SSW / (n – c) |
Total               | SST            | n – 1              |                     |

Step-8: Compare the computed value of F with the table value of F for the
given degrees of freedom at a given level of significance (generally we
take the 5% level of significance) and interpret the result as follows:
(a) If the computed value of F is greater than the table value of F, the difference
in the variances is significant; it could not have arisen due to fluctuations of
random sampling, and hence we reject H0.
(b) If the computed value of F is less than the table value of F, the difference in
the variances is not significant; it could have arisen due to fluctuations of
random sampling, and hence we accept H0.
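
The correction-factor route of Steps 1–8 can also be sketched in a few lines (hypothetical data; the names CF, SST, SSB and SSW follow the steps above, everything else is illustrative). It reproduces the same SSB and SSW as the direct method:

    # One-way ANOVA by the short-cut (correction factor) method.
    import numpy as np

    samples = [np.array([4., 6., 5., 7.]),
               np.array([8., 9., 7., 8.]),
               np.array([5., 7., 6., 6.])]
    n = sum(len(s) for s in samples)
    T = sum(s.sum() for s in samples)                        # grand total of observations

    CF  = T ** 2 / n                                         # Step-3: correction factor
    SST = sum((s ** 2).sum() for s in samples) - CF          # Step-4: total sum of squares
    SSB = sum(s.sum() ** 2 / len(s) for s in samples) - CF   # Step-5: between samples
    SSW = SST - SSB                                          # Step-6: within samples
    print("CF =", CF, " SST =", SST, " SSB =", round(SSB, 3), " SSW =", round(SSW, 3))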

Two Way Classification Model


Two way classification model is designed to study the effects of two factors
simultaneously in the same experiment.
Practical steps involved in the preparation of ANOVA table (i.e. Analysis of
variance table) for two factor analysis of variance
Step-1: We set H0: the population means are equal, and H1: at least two of the population means are
unequal.
Step-2: Calculate the sum of the observations of each row and each column and
their grand total (T).
Step-3: Calculate the correction factor (CF) as follows:
CF = T^2 / N
Note: N = r × c, where r = no. of rows and c = no. of columns.

Step-4: Calculate the sum of squares between columns (SSC) as follows:

SSC = [ (\sum C_1)^2 / r + (\sum C_2)^2 / r + \cdots + (\sum C_c)^2 / r ] – CF,
where \sum C_j is the total of the j-th column.

Step-5: Calculate the sum of squares between rows (SSR) as follows:

SSR = [ (\sum R_1)^2 / c + (\sum R_2)^2 / c + \cdots + (\sum R_r)^2 / c ] – CF,
where \sum R_i is the total of the i-th row.

Step-6: Calculate the total sum of squares (SST) as follows:

SST = Sum of squares of all the observations – Correction Factor
    = \sum X^2 – CF
Step-7: Calculate sum of squares for the Residual / Error (SSE) as follows
SSE = SST – (SSC + SSR)
Step-8: Prepare the ANOVA table as follows:
Source of variation | Sum of squares | Degrees of freedom | Mean squares                 | Variance Ratio
Between columns     | SSC            | c – 1              | MSC = SSC / (c – 1)          | F_C = MSC / MSE *
Between rows        | SSR            | r – 1              | MSR = SSR / (r – 1)          | F_R = MSR / MSE **
Residual / Error    | SSE            | (c – 1)(r – 1)     | MSE = SSE / [(c – 1)(r – 1)] |
Total               | SST            | rc – 1             |                              |
* Ratio of the greater to the smaller of MSC and MSE.
** Ratio of the greater to the smaller of MSR and MSE.

Step-9: Compare the computed value of F with the table value of F for the
given degrees of freedom at a given level of significance (generally we
take the 5% level of significance) and interpret the result as follows:
(a) If the computed value of F is greater than the table value of F, the difference
in the variances is significant; it could not have arisen due to fluctuations of
random sampling, and hence we reject H0.
(b) If the computed value of F is less than the table value of F, the difference in
the variances is not significant; it could have arisen due to fluctuations of
random sampling, and hence we accept H0.
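
A sketch of the two-factor computation (one observation per cell) is given below; the r × c data table and all variable names are invented for the example, and MSC and MSR are assumed to exceed MSE so that they sit in the numerator as the worksheet prescribes:

    # Two-way ANOVA without replication: columns = one factor, rows = the other.
    import numpy as np
    from scipy import stats

    X = np.array([[10., 12., 14.],        # hypothetical r x c table
                  [ 9., 11., 12.],
                  [11., 13., 16.],
                  [ 8., 10., 11.]])
    r, c = X.shape
    N = r * c
    CF = X.sum() ** 2 / N                                  # correction factor

    SSC = (X.sum(axis=0) ** 2 / r).sum() - CF              # between columns
    SSR = (X.sum(axis=1) ** 2 / c).sum() - CF              # between rows
    SST = (X ** 2).sum() - CF                              # total
    SSE = SST - SSC - SSR                                  # residual / error

    MSC = SSC / (c - 1)
    MSR = SSR / (r - 1)
    MSE = SSE / ((c - 1) * (r - 1))
    F_col, F_row = MSC / MSE, MSR / MSE
    print("F columns =", round(F_col, 2),
          " table =", round(stats.f.ppf(0.95, c - 1, (c - 1) * (r - 1)), 2))
    print("F rows    =", round(F_row, 2),
          " table =", round(stats.f.ppf(0.95, r - 1, (c - 1) * (r - 1)), 2))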

Illustration-5
The following table gives per hectare yield for three varieties of wheat each
grown on five plots:
Per hectare yield (in tons)
Plot of Land Variety of wheat
A B C
1 5 3 10
2 6 5 13
3 8 2 7
4 1 10 13
5 5 0 17
Test at 5% level of significance.

List of Formulae
1. Value of F:
F = \hat{\sigma}_1^2 / \hat{\sigma}_2^2

Where, n_1 = Sample size of 1st population
\hat{\sigma}_1^2 = Unbiased Estimated Variance of 1st population
n_2 = Sample Size of 2nd population
\hat{\sigma}_2^2 = Unbiased Estimated Variance of 2nd population
Or
F = [\sum (X_1 - \bar{X}_1)^2 / (n_1 - 1)] \div [\sum (X_2 - \bar{X}_2)^2 / (n_2 - 1)]
[Since \hat{\sigma}^2 = \sum (X - \bar{X})^2 / (n - 1)]

2. ANOVA table for one factor analysis of variance

Source of variation | Sum of squares | Degrees of freedom | Mean squares        | Variance Ratio
Between samples     | SSB            | c – 1              | MSB = SSB / (c – 1) | F = MSB / MSW
Within samples      | SSW            | n – c              | MSW = SSW / (n – c) |
Total               | SST            | n – 1              |                     |

3. ANOVA table for two factor analysis of variance

Source of variation | Sum of squares | Degrees of freedom | Mean squares                 | Variance Ratio
Between columns     | SSC            | c – 1              | MSC = SSC / (c – 1)          | F_C = MSC / MSE *
Between rows        | SSR            | r – 1              | MSR = SSR / (r – 1)          | F_R = MSR / MSE **
Residual / Error    | SSE            | (c – 1)(r – 1)     | MSE = SSE / [(c – 1)(r – 1)] |
Total               | SST            | rc – 1             |                              |
* Ratio of the greater to the smaller of MSC and MSE.
** Ratio of the greater to the smaller of MSR and MSE.
NON-PARAMETRIC TESTS
Meaning of Non-Parametric Test
Non-parametric tests, or distribution-free tests, do not rely on the assumption that the
data are drawn from a given probability distribution. In this sense they are the opposite of
parametric statistics.

The hypotheses of non-parametric tests are concerned with something other than
the value of a population parameter. Hence a non-parametric test does not
depend upon whether the observed population fits any parametric
distribution.
Non-parametric tests make only very few assumptions and as such they
have wide acceptability.
Non-parametric tests are very simple to use. In certain cases, even when the use
of a parametric test is justified, a non-parametric test may be easier to use.
Advantages of Non-Parametric Test
1) It is a distribution free test.
2) It is more robust.
3) Non-parametric test can be used for very small sample size.
4) Non-parametric test can be used for attributes.
5) Non-parametric test can be used for making judgment about individuals.
6) They are very easy to calculate.
7) They can be used with limited information.

Disadvantages of Non-Parametric Tests

1) The results cannot be generalized, since non-parametric tests are not as efficient as
parametric tests.
2) They cannot be used for more complex problems.
3) They ignore a certain amount of information.

Types of Non-Parametric Tests


a) Sign test for paired data
b) Spearman’s rank correlation test
c) Mann-Whitney U test
d) Kruskal-Wallis test
e) Wald-Wolfowitz runs test
f) Anderson-Darling test
g) Cliff’s delta
h) Cochran’s Q
i) Cohen’s kappa
j) Efron-Petrosian test
k) Kolmogorov-Smirnov test
l) Median test
m) Pitman’s permutation test

Sign Test for Paired Data


In the paired sign test there are two samples.
Differences are taken between the paired observations of the two samples, and only the
signs of the differences, not their magnitudes, are considered for analysis. Signs may be
positive or negative; if a difference is zero, it is ignored. The test is then conducted with the
null hypothesis that both samples are taken from the same population. This hypothesis
would be true if the number of positive signs is equal to the number of
negative signs. The alternative hypothesis is that the samples were not taken from the
same population. If the null hypothesis is accepted then the alternative hypothesis is
rejected, and vice versa.
This method may be demonstrated through the following illustrations.
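
Under the null hypothesis each non-zero difference is equally likely to be positive or negative, so the number of positive signs follows a binomial distribution with p = 0.5 and the test reduces to a binomial test. A minimal sketch follows (the paired data are made up; scipy 1.7+ is assumed for stats.binomtest):

    # Paired sign test: keep only the signs of the non-zero differences and
    # apply a two-sided binomial test with p = 0.5.
    import numpy as np
    from scipy import stats

    before = np.array([12, 15, 9, 14, 11, 16, 13, 10])
    after  = np.array([14, 13, 11, 14, 13, 18, 12, 11])

    diff = after - before
    diff = diff[diff != 0]                     # zero differences are ignored
    n_plus = int((diff > 0).sum())             # number of positive signs
    result = stats.binomtest(n_plus, n=len(diff), p=0.5, alternative='two-sided')
    print("positive signs:", n_plus, "of", len(diff),
          " p-value:", round(result.pvalue, 3))
    # Accept H0 (same population) at the 5% level if the p-value exceeds 0.05.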
Illustration-1
In a beauty contest there are two judges who have to rate 12 contestants. The
ratings have a score from 1 to 5. The scores given by the judges are as follows:
Contestant Judge I Judge II
1 2 3
2 1 2
3 4 2
4 4 3
5 3 4
6 3 2
7 4 2
8 2 1
9 4 3
10 1 1
11 3 3
12 3 3

Test whether both the judges have rated the contestants in the same manner or whether
they differ, at the 0.05 level of significance.
Illustration-2
Use the sign test to see if there is a difference between the number of days until
collection of an account receivable, before and after a new collection policy.
Take
Before 30 28 34 35 40 42 33 38
After 32 29 33 32 37 43 40 41
Before 34 45 28 27 25 41 36
After 37 44 27 33 30 38 36

Illustration-3
The following data represent the rate of defective work of workers A to L before
and after the takeover of a company.
A B C D E F G H I J K L
Before 8 7 6 9 7 10 8 6 5 8 10 8
After 6 5 8 6 9 8 10 7 5 6 9 8

On the basis of a paired sign test (using the 0.10 level of significance), state whether
the takeover has made any change.

Spearman’s Rank Correlation Test


If a rank is assigned to each of the items contained in the two variables under
study, then a rank correlation coefficient can be calculated.
The rank correlation coefficient is a measure of the correlation that exists between
two sets of ranks: a measure of the degree of association between two variables
that could not be measured otherwise.
Coefficient of rank correlation:
R = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}
where D = difference between the ranks of a pair of items and n = number of pairs.
A rank correlation coefficient of 1 shows perfect correlation between the two
variables. Similarly, a coefficient of -1 shows perfect inverse correlation.
When this test is performed, we proceed on the logic that no association
exists between the two variables. Hence the null hypothesis is that there exists no
association between the two variables:
Null hypothesis H0: \rho_s = 0 (population rank correlation coefficient is zero; there is no correlation)
Alternative hypothesis H1: \rho_s \neq 0 (there is correlation)
When n < 30 (i.e. for a small number of paired observations), the normal distribution is
not appropriate; in this case it is also not appropriate to use a t-distribution test.
Hence, when n < 30 in a rank correlation test, the table of critical values of Spearman's rank
correlation coefficient is used to determine the acceptance or rejection of the hypothesis.

Notes:
1) The table values for Spearman's rank correlation are given for both tails
combined. Hence at the 0.10 significance level, the table value covers 0.05 of the area
in the right tail and 0.05 in the left tail.
2) When there are extreme values in the original data, rank correlation
can produce more useful results than the ordinary correlation method.
When n > 30, r_s is approximately normally distributed with the following standard
error:
Standard error of the coefficient of rank correlation: \sigma_r = \frac{1}{\sqrt{n - 1}}

After calculating the standard error, the result can be standardized using the
formula Z = \frac{r_s}{\sigma_r} = r_s \sqrt{n - 1}.
The computed value of Z can then be compared with the critical value of Z to
test the hypothesis.
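
A short sketch of the whole procedure follows (the two sets of ranks are invented; scipy.stats.spearmanr is used only as a cross-check of the hand formula):

    # Spearman's rank correlation: R = 1 - 6*sum(D^2) / (n*(n^2 - 1)).
    import numpy as np
    from scipy import stats

    rank_x = np.array([1, 2, 3, 4, 5, 6, 7, 8])   # ranks under the first variable
    rank_y = np.array([2, 1, 4, 3, 6, 5, 8, 7])   # ranks under the second variable
    n = len(rank_x)

    D = rank_x - rank_y                            # rank differences
    R = 1 - 6 * (D ** 2).sum() / (n * (n ** 2 - 1))
    print("rank correlation R =", round(R, 3))
    print("scipy cross-check  =", round(stats.spearmanr(rank_x, rank_y)[0], 3))

    # Large-sample (n > 30) normal approximation: Z = R * sqrt(n - 1).
    Z = R * np.sqrt(n - 1)
    print("Z =", round(Z, 3), "(here n < 30, so the Spearman table would be used instead)")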

Illustration-4
Determine whether students who score well in Mathematics also score well in
Physics.
Student Rank in Mathematics Rank in Physics
A 3 6
B 2 2
C 1 3
D 5 4
E 4 1
F 6 5

Illustration-5
The manager (training) of a marketing company wanted to assess the
performance of ten salesmen. For this he compared their ranks in the training
programme with their ranks in the field. The results of his evaluation are given below.
What can be said about the relationship / association between training
and performance? (Use the 5% level of significance.)

Illustration-6
Given below are the marks obtained by 11 students in a test. Find out rank
correlation and test at 5% level of significance, whether there is any correlation
between the scores in two subjects.
Illustration-7
A teacher in a school believes that students who finish exams more quickly than
others have better exam scores. The following set of data shows the score and
order of finish for 12 students on an exam. Using the rank correlation method, do
these data indicate that the first students to complete an exam have higher
grades?
Mann-Whitney U Test
This test is used to determine whether two populations have the same
mean. It is the non-parametric counterpart of the parametric tests for this
hypothesis discussed earlier. However, the Mann-Whitney U test (or simply the
U test) is restricted to two populations only.

The following quantities are calculated for the U test:

U statistic: U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1

where n_1 and n_2 are the two sample sizes and R_1 is the sum of the ranks of the first
sample in the combined ranking of both samples (equivalently, U may be computed from
the second sample as U = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2).

Mean of the sampling distribution of U: \mu_U = \frac{n_1 n_2}{2}

Standard error of the U statistic: \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}

If both n_1 and n_2 are larger than 10, then the U statistic can be approximated by the
normal distribution and standardized as follows:

Z = \frac{U - \mu_U}{\sigma_U}

If the absolute value of the computed Z is less than the critical value of Z for the chosen
level of significance, the null hypothesis is accepted (i.e. both populations have the same mean);
otherwise the null hypothesis is rejected.
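
A sketch of the computation with made-up test scores follows (both samples are larger than 10, so the normal approximation applies; scipy is used only for ranking the pooled data):

    # Mann-Whitney U test via the rank-sum formulas given above.
    import numpy as np
    from scipy import stats

    x = np.array([52., 61., 58., 66., 70., 55., 63., 59., 68., 62., 57.])
    y = np.array([48., 54., 51., 60., 45., 53., 49., 56., 47., 50., 46., 44.])
    n1, n2 = len(x), len(y)

    ranks = stats.rankdata(np.concatenate([x, y]))   # ranks of the pooled data
    R1 = ranks[:n1].sum()                            # rank sum of the first sample

    U = n1 * n2 + n1 * (n1 + 1) / 2 - R1             # U statistic
    mu_U = n1 * n2 / 2                               # mean of U under H0
    sigma_U = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard error of U
    Z = (U - mu_U) / sigma_U
    print("U =", U, " Z =", round(Z, 3))
    # At the 5% level (two-tailed), accept H0 (equal means) if |Z| < 1.96.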

Illustration 8
A large hospital hires most of its doctors from two major universities. Over the
last year, the hospital has been conducting tests for the newly recruited doctors to
determine which school educates them better. Based on the following scores, help the
human resource department of the hospital decide whether the universities
differ in quality.
Test Scores
University A: 99 83 89 64 98 85 61 79 91 87 88
University B: 96 90 97 94 86 95 68 78 93 56 76 84
SUMMARY:
1. Research design is a comprehensive plan of the sequence of operations that a
researcher intends to carry out to achieve the research objectives. It involves selecting the
most appropriate methods and techniques to solve the problem under investigation.
2. A hypothesis is a tentative generalization, the validity of which remains to be tested. In its
most elementary stage the hypothesis may be any hunch, guess or imaginative idea which
becomes the basis of further investigation. It addresses questions such as: how powerful is my
study (test)? how many observations do I need for what I want to get from the study? Answers
to these questions enable researchers to use research resources efficiently.
3. Tests of hypothesis can be carried out on one or two samples. One sample tests are used to
test if the population parameter is different from a specified value. Two sample tests are
used to detect the difference between the parameters of two populations.
