
Items Description of Module

Subject Name Management


Paper Name Research Methodology
Module Title Non-parametric tests
Module ID Module 16
Pre-Requisites Understanding the nature of non-parametric tests
Objectives To study the one sample tests, two sample tests and k-sample tests
Keywords Non-parametric tests, Chi-square
Role Name Affiliation
Prof. Ipshita Bansal Department of Management
Principal Investigator Studies, BPSMV, Khanpur
Kalan, Sonipat

Co-Principal Investigator

Prof. S.P.Singh Department of Management


Paper Coordinator Studies, GKV, Haridwar

Prof. S.P.Singh Department of Management


Content Writer (CW) Studies, GKV, Haridwar

Content Reviewer (CR)


Language Editor (LE)

QUADRANT –I

1. Module 16: Non-parametric tests


2. Learning Outcome
3. Introduction
4. Advantages and Disadvantages of non-parametric tests
5. One-sample tests
6. Two sample tests
7. K-sample tests
8. Summary

1. Module 16: Non-parametric tests

2. Learning Outcome

After studying this module, you shall be able to


• Know the concept of non-parametric tests
• Comprehend the advantages and disadvantages of non-parametric tests
• Understand the one-sample tests
• Become aware of the two-sample tests
• Know the k-sample tests

3. Introduction
There are various situations where the populations under study are not normally distributed. The
data collected from these populations may be extremely skewed. In such situations, the option is to
use a non-parametric test. Non-parametric statistical tests are hemmed in by fewer and less
stringent assumptions than parametric tests. In particular, they are free of assumptions about the
characteristics of the distribution of the population from which the research samples are drawn.
They are also called distribution-free tests. A non-parametric statistical test does not specify
conditions about the parameters of the population from which the sample was drawn.

4. Advantages and Disadvantages of Non-parametric tests


There are several advantages of non-parametric tests:
• They can be applied in many situations because they do not carry the rigid requirements of
their parametric counterparts, such as the sample having been drawn from a population
following a normal distribution. A researcher can encounter an application where a numeric
observation is difficult to obtain but a rank value is not. For example, it is easier to obtain
rank data on consumers' preferences for various brands of toothpaste than to assign a
numerical value to each brand. By using ranks, it is possible to loosen the assumptions
about the populations.
• Non-parametric tests can usually be applied to nominal and ordinal data that lack exact or
comparable numerical values. For example, respondents may be asked a question on their
religion. This is nominal-scale data and can only be analyzed by non-parametric methods.
• Non-parametric tests involve very simple computations compared to the corresponding
parametric tests.

However, the methods are not without their drawbacks:
• A lot of information is wasted because exact numerical data are reduced to a qualitative
form. For example, in a non-parametric test like the sign test, a plus sign denotes an
increase or gain whereas a minus sign indicates a decrease or loss; the magnitude of the
gain or loss receives no consideration.
• When the basic assumptions of parametric tests are valid, parametric tests are more
powerful than non-parametric methods. Using a non-parametric test in such cases therefore
carries a greater risk of accepting a false null hypothesis and thus committing a Type II
error.
• The null hypothesis in a non-parametric test is loosely defined in comparison to parametric
tests. Therefore, when the null hypothesis is rejected, a non-parametric test yields a less
precise conclusion than a parametric test. For example, corresponding to the null
hypothesis that the means of the two populations are equal in a parametric test, the null
hypothesis in a non-parametric test is that the two populations have the same probability
distribution. In such a situation, rejecting the null hypothesis under the parametric test
would imply that the means of the two populations are different, whereas under a
non-parametric test it means that the two population distributions are different, but the
specific form of the difference between them is not clearly defined.

5. Non-parametric tests
The non-parametric tests covered in this module are the chi-square test, the run test, the sign
test, the Mann-Whitney U test, the Wilcoxon matched-pair signed rank test and the Kruskal-Wallis
test:
• Chi-square test
• Run test for randomness
• One-sample sign test
• Two-sample sign test
• Mann-Whitney U test
• Wilcoxon signed rank test for paired samples
• Kruskal-Wallis test
Figure 1: Non-parametric tests

5.1 One sample sign test


The one-sample sign test is applicable where a sample is drawn from a population having a
continuous symmetrical distribution that is known to be non-normal, such that the probability
of a sample value being less than the mean value, as well as the probability of a sample value
being more than the mean value (p), is ½. It is difficult to verify the symmetry of the
distribution when the sample size is small. Hence, based on the hypothesized population mean
(or population median), each sample observation is classified as either a plus (+) or a
minus (−) sign:
Modified sample value, Yi = +, if Xi > µ
                          = −, if Xi < µ
Figure 2: Types of Sign Test (Adapted from slideshare.net)
where Xi is the ith sample value, µ is the mean/median and Yi is the ith modified sample
observation (+ or −).
This means that if a sample value is more than µ, it is replaced by a + sign; if a sample
value is less than µ, it is replaced by a − sign.
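As an illustrative sketch (not part of the original module), the classification into + and − signs and the small-sample decision rule can be written in Python. The data values and the function name are hypothetical; under H0 the number of + signs follows a binomial distribution with p = ½:

```python
from math import comb

def sign_test(sample, mu0):
    """One-sample sign test against a hypothesized median mu0.
    Returns (n_plus, n_minus, two_sided_p). Values equal to mu0 are dropped."""
    kept = [x for x in sample if x != mu0]
    n = len(kept)
    n_plus = sum(1 for x in kept if x > mu0)   # + signs
    n_minus = n - n_plus                       # - signs
    # Under H0, n_plus ~ Binomial(n, 1/2); double the smaller tail
    k = min(n_plus, n_minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_plus, n_minus, min(1.0, 2 * tail)

plus, minus, p = sign_test([12, 15, 9, 14, 16, 11, 13, 18], mu0=10)
print(plus, minus, round(p, 4))  # → 7 1 0.0703
```

With 7 plus signs out of 8 non-zero differences, the two-sided p-value is about 0.07, so at α = 0.05 the null hypothesis of a median of 10 would not be rejected.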

5.2 Chi-square Test


Data in the form of frequencies, percentages or proportions are needed for the application of a
chi-square test. Discrete data are used in the majority of applications of chi-square (X2).
Continuous data reduced to categories and tabulated can also be dealt with by chi-square. The
important properties of the chi-square distribution are:
• The chi-square distribution is non-symmetric.
• Chi-square values are equal to or more than zero.
• The degrees of freedom determine the shape of the chi-square distribution.

There are various applications of a chi-square test. Some of them are:
• A chi-square test is used for independence of variables and for goodness of fit, as
well as for the equality of more than two population proportions.
The data in chi-square are often in terms of frequencies. Survey data may be on a nominal or a
higher scale of measurement. If they are on a higher scale of measurement, they can always be
converted into categories. Real-life conditions in business permit the collection of
frequencies, e.g., gender, marital status, job classification, age and income. Therefore,
chi-square is a much sought-after tool for analysis. The researcher has to decide what
statistical test is implied by the chi-square statistic in a particular situation. The common
principles of all the chi-square tests are:
• State both the null and the alternative hypothesis regarding a population.
• Specify a level of significance.
• Compute the expected frequencies of the occurrence of certain events under
the assumption that the null hypothesis is true.
• Note the observed frequencies of the data falling in the different cells.
• Compute the chi-square value given by the formula.
• Compare the sample value of the statistic with the critical value at the given
level of significance and make the decision.
The test also investigates how well an assumed distribution fits the data. Many a time the
researcher assumes that the sample is drawn from a normal or some other distribution, and how
well that distribution fits the given data may be of interest.
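The chi-square value referred to in the principles above is the standard statistic Σ(O − E)²/E summed over all cells. The following is a minimal Python sketch; the die-fairness data and the function name are made up for illustration:

```python
def chi_square_stat(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 60 die rolls, expected 10 per face under H0 (fair die)
obs = [8, 12, 9, 11, 10, 10]
exp = [10] * 6
x2 = chi_square_stat(obs, exp)
print(round(x2, 2))  # → 1.0, compared with the critical value at 5 d.f.
```

Since 1.0 is far below the α = 0.05 critical value of 11.07 for 5 degrees of freedom, the null hypothesis of a fair die would be accepted for these illustrative counts.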

5.3 Kolmogorov-Smirnov test

Figure 3 Kolmogorov-Smirnov Test (Adapted from youtube.com)

The Kolmogorov-Smirnov (K-S) test is similar to the chi-square test in that it examines the
goodness of fit of a specific set of data to an assumed distribution. This test is more powerful
for small samples, whereas the chi-square test is suited for large samples.
In the K-S test, the values of the observed cumulative distribution of the random variable in a
given situation are compared with the corresponding values of the theoretical cumulative
distribution of the random variable, and their absolute differences are calculated. The
maximum of these absolute differences is treated as the calculated statistic of the K-S test,
namely Dcal = max | OFi − EFi |
where OFi is the observed cumulative probability for the ith value of the random variable X,
and EFi is the expected cumulative probability for the ith value of the random variable X from
the theoretical distribution.
For a given significance level (α) and sample size (n), the critical value of D is obtained
from the K-S table. The Kolmogorov-Smirnov test is a one-tailed test. Hence, if the calculated
value of D is more than the table value of D for the given significance level, reject the null
hypothesis; otherwise, accept the null hypothesis.
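The computation of Dcal can be sketched in Python as below. This is an illustrative sketch: the sample values are hypothetical, the assumed theoretical distribution is uniform on [0, 1], and the empirical CDF is checked on both sides of each jump:

```python
def ks_statistic(sample, cdf):
    """K-S statistic: maximum absolute gap between the empirical CDF
    and the theoretical CDF, evaluated just before and after each jump."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)  # theoretical cumulative probability EF_i
        # Empirical CDF equals i/n just before x and (i+1)/n at x
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

# Hypothetical sample tested against the uniform distribution on [0, 1]
d = ks_statistic([0.1, 0.3, 0.5, 0.7, 0.9], cdf=lambda x: x)
print(round(d, 4))  # → 0.1
```

The calculated D of 0.1 would then be compared with the K-S table value for n = 5 at the chosen α.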

5.4 Run test for randomness


A run is defined as a sequence consisting of repeated occurrences of a particular symbol, where
the immediately preceding and succeeding symbols are different from the symbol in the run, or
there is no other symbol on either side. The objective is to test the randomness of the
occurrence of runs at a given significance level, α.
Let n1 be the frequency of occurrence of a particular symbol in the whole stream of symbols,
n2 be the frequency of occurrence of the other symbol in the whole stream of symbols, and r be
the number of runs.
If n1 as well as n2 is less than or equal to 20, the sample is treated as a small sample. If
either n1 or n2, or both, is/are larger than 20, the sample is treated as a large sample.
The combination of hypotheses for the run test of randomness is:
H0: The occurrence of the runs in the given stream of symbols is random.
H1: The occurrence of the runs in the given stream of symbols is not random.
In the case of small samples, the observed number of runs = r.
From the table of critical values of r for the run test, the smaller critical value, and the
larger critical value if it exists, for a given combination of n1 and n2 at a significance
level of 0.05, can be obtained. If the observed r value is less than or equal to the smaller
critical value, or more than or equal to the larger critical value, for the given combination
of n1 and n2 at the given significance level (α), then reject the null hypothesis; otherwise,
accept the null hypothesis.
If n1 or n2 or both is/are more than 20, the sample is treated as a large sample. In such a
situation, one can approximate the sampling distribution of r by a normal distribution with
the following mean and variance:

Mean of r, µr = 2n1n2/(n1 + n2) + 1

Variance of r, σr2 = [2n1n2(2n1n2 − n1 − n2)] / [(n1 + n2)2(n1 + n2 − 1)]

The formula for the standard normal statistic to test the significance of r is:
Z = (r − µr)/σr
It is a two-tailed test. Hence, the table values −Zα/2 and Zα/2 define the left and right
critical values of r. If the calculated Z value lies between −Zα/2 and Zα/2, accept the null
hypothesis; otherwise, reject the null hypothesis.
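The counting of runs and the large-sample Z statistic can be sketched in Python. This is an illustrative sketch with a hypothetical two-symbol stream; the function name is ours:

```python
from math import sqrt

def runs_test_z(seq):
    """Run test for randomness on a two-symbol sequence (large-sample form).
    Returns (r, z), where r is the observed number of runs."""
    symbols = sorted(set(seq))
    n1 = seq.count(symbols[0])
    n2 = len(seq) - n1
    # A new run starts at every change of symbol
    r = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return r, (r - mu) / sqrt(var)

r, z = runs_test_z("HHTTHTHHTTTH")  # hypothetical coin-toss stream
print(r, round(z, 3))  # → 7 0.0
```

Here n1 = n2 = 6 and r = 7 equals the expected number of runs exactly, so Z = 0 and the hypothesis of randomness would be accepted.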

5.5 Two samples sign test


The sign tests for two samples are applied when the two samples have the same size. When two
independent samples of different sizes are taken from two populations, sign tests cannot be
applied. In this situation, the two-sample median test can be used.
The purpose of the median test is to examine whether the two independent samples come from
populations with the same median.
Procedure for the median test:
• Pool the observations of the two samples.
• Find the median of the combined observations.
• Form a frequency table for the pooled observations, counting in each sample the
observations above and not above the pooled median.
• Let n1 be the size of the first sample, n2 be the size of the second sample, N be
the size of the pooled observations (n1 + n2), and µ be the median of the pooled
observations.
If the pooled sample is small, compute the probability; if the sample is large, compute the
chi-square statistic.
Let a and b be the numbers of observations above the pooled median in the first and second
samples, and c and d be the corresponding numbers not above it. For comparison with the given
significance level, compute the probability using the hypergeometric formula
P = [C(a+b, a) C(c+d, c)] / C(N, a+c)
If the calculated value of P is more than the given significance level α, accept the null
hypothesis; otherwise, reject the null hypothesis.
For a large sample, compute the chi-square statistic (with continuity correction) using the
formula
X2 = N(|ad − bc| − N/2)2 / [(a+b)(c+d)(a+c)(b+d)]
If the calculated chi-square statistic is less than the table chi-square value at 1 degree of
freedom and the given significance level α, accept the null hypothesis; otherwise, reject the
null hypothesis.
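The large-sample branch of the median test can be sketched in Python. This is an illustrative sketch, assuming the standard continuity-corrected 2×2 chi-square statistic for counts above and not above the pooled median; the data are hypothetical:

```python
from statistics import median

def median_test_chi2(sample1, sample2):
    """Two-sample median test: continuity-corrected chi-square statistic
    on counts above / not above the pooled median (1 degree of freedom)."""
    m = median(sample1 + sample2)
    a = sum(1 for x in sample1 if x > m)   # sample 1, above pooled median
    b = sum(1 for x in sample2 if x > m)   # sample 2, above pooled median
    c = len(sample1) - a                   # sample 1, not above
    d = len(sample2) - b                   # sample 2, not above
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

x2 = median_test_chi2([10, 12, 14, 16, 18, 20], [5, 7, 9, 11, 13, 15])
print(round(x2, 3))  # → 0.333, compared with 3.841 (chi-square, 1 d.f., α = 0.05)
```

Because 0.333 is below the table value of 3.841, the null hypothesis of a common median would be accepted for these illustrative samples.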

5.6 Mann-Whitney U test


The Mann-Whitney U test is an alternative to the two-sample t-test. The test is based on the
ranks of the observations of the two samples put together. It is more powerful than the sign
tests. An alternative name for this test is the rank-sum test.
Let there be two samples from two different populations. Let n1 be the size of the first
sample, n2 be the size of the second sample, and N = n1 + n2.
The observations of the two samples are pooled together and their positional ranks are
assigned from lowest (1) to highest (N). If the same value occurs for more than one
observation, then the average of the positions of those observations is assigned as the rank
for each of the tied observations.
Let R1 be the sum of the ranks of the observations of the first sample and R2 be the sum of
the ranks of the observations of the second sample. The objective of this test is to examine
whether the two samples are drawn from different populations having the same distribution.
If the size of each sample is more than 8, one can use a normal approximation with the
following statistic:
U = n1n2 + n1(n1 + 1)/2 − R1

The statistic U is approximately normal with mean µU and variance σU2, where:

µU = n1n2/2
σU2 = n1n2(n1 + n2 + 1)/12

ZU = (U − µU)/σU

The null hypothesis of this test is:


H0 : Two samples are drawn from different populations having the same distribution.
A suitable alternate hypothesis (H1) can be assumed as per the reality.
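The ranking, the U statistic and its normal approximation can be sketched in Python. This is an illustrative sketch with hypothetical data; tied observations receive the average of their positional ranks, as described above:

```python
from math import sqrt

def mann_whitney_u(sample1, sample2):
    """Mann-Whitney U test with normal approximation. Returns (u, z)."""
    n1, n2 = len(sample1), len(sample2)
    pooled = sorted(sample1 + sample2)
    # Average positional rank for each distinct value (handles ties)
    ranks = {}
    for v in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        ranks[v] = sum(positions) / len(positions)
    r1 = sum(ranks[x] for x in sample1)  # rank sum of the first sample
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mu) / sigma

u, z = mann_whitney_u([3, 4, 2, 6, 2, 5], [9, 7, 5, 10, 6, 8])
print(u, round(z, 3))  # → 34.0 2.562
```

A |Z| of 2.562 exceeds 1.96, so at α = 0.05 the hypothesis of identical distributions would be rejected for these illustrative samples (the samples here are smaller than the n > 8 guideline, so this is purely a demonstration of the arithmetic).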

5.7 Kruskal-Wallis Test


The Kruskal-Wallis test is, in fact, a non-parametric counterpart to the one-way ANOVA. The
test is an extension of the Mann-Whitney U test. Both methods require that the scale of
measurement of a sample value be at least ordinal.
The hypotheses to be tested in the Kruskal-Wallis test are:
H0: The k populations have identical probability distributions.
H1: At least two of the populations differ in location.
The procedure for the test is listed below:
• Obtain random samples of sizes n1, …, nk from each of the k populations.
Therefore, the total sample size is n = n1 + n2 + … + nk.
• Pool all the samples and rank them, with the lowest score receiving a rank of
1. Ties are treated in the usual fashion by assigning an average rank to the
tied positions.
• Let Ri = the total of the ranks from the ith sample.
The Kruskal-Wallis test uses the chi-square distribution to test the null hypothesis. The test
statistic is given by:

H = [12/(n(n + 1))] Σ (Ri2/ni) − 3(n + 1)

which follows a chi-square distribution with k − 1 degrees of freedom.

The null hypothesis is rejected if the computed H is greater than the critical value of
chi-square at the specified level of significance.
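The procedure above can be sketched in Python. This is an illustrative sketch (no tie correction, which matches the formula as stated); the three samples are hypothetical:

```python
def kruskal_wallis_h(*samples):
    """Kruskal-Wallis H statistic (without tie correction).
    Compare the result with chi-square at k - 1 degrees of freedom."""
    pooled = sorted(x for s in samples for x in s)
    n = len(pooled)
    # Average positional rank for each distinct value (handles ties)
    ranks = {}
    for v in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        ranks[v] = sum(positions) / len(positions)
    h = 12 / (n * (n + 1)) * sum(
        sum(ranks[x] for x in s) ** 2 / len(s) for s in samples
    ) - 3 * (n + 1)
    return h

h = kruskal_wallis_h([27, 2, 4, 18, 7, 9],
                     [20, 8, 14, 36, 21, 22],
                     [34, 31, 3, 23, 30, 6])
print(round(h, 3))  # → 2.854, compared with 5.991 (chi-square, 2 d.f., α = 0.05)
```

Since 2.854 is below the table value of 5.991, the null hypothesis of identical population distributions would be accepted for these illustrative samples.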

5.8 Wilcoxon Signed Rank Test for Paired Samples


There are instances when the sample data consist of paired observations. Examples of paired
samples include a study where husband and wife are matched, where subjects are studied before
and after an experiment, or where observations are taken on a variable for brother and sister.
The normality assumption is the basis for applying the t-distribution. In certain cases the
normality premise is not met, and in those cases a non-parametric test is used. The two-sample
sign test is one example, but it ignores the magnitudes of the differences. This limitation is
taken care of by the Wilcoxon matched-pair signed rank test. The test gives more weight to a
matched pair with a greater difference, and therefore incorporates and makes use of more
information than the sign test. It is, therefore, a more powerful test than the sign test.
The test follows this procedure:
1. Let di denote the difference in the scores for the ith matched pair. Retain the signs, but
discard any pair for which di = 0.
2. Ignoring the signs of the differences, rank all the di's from lowest to highest. If
differences have the same numerical value, assign to them the mean of the ranks involved
in the tie.
3. To each rank, prefix the sign of the difference.
4. Compute the sums of the absolute values of the negative and the positive ranks, denoted
T− and T+, respectively.
5. Let T be the smaller of the two sums found in step 4.
When the number of pairs of observations (n) for which the difference is not zero is greater
than 15, the T statistic follows an approximately normal distribution under the null
hypothesis that the population differences are centred at 0. The mean µT and standard
deviation σT of T are given by:
µT = n(n + 1)/4 and σT = √[n(n + 1)(2n + 1)/24]

The test statistic is given by:

Z = (T − µT)/σT

For a given level of significance α, the absolute value of the sample Z should be greater than
the absolute value of Zα/2 to reject the null hypothesis in a two-tailed test. For a one-sided
upper-tail test, the null hypothesis is rejected if the sample Z is greater than Zα, and for a
one-sided lower-tail test, the null hypothesis is rejected if the sample Z is less than −Zα.
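The five steps above can be sketched in Python. This is an illustrative sketch with hypothetical before/after scores; zero differences are dropped and tied absolute differences share the mean of their ranks:

```python
from math import sqrt

def wilcoxon_signed_rank(before, after):
    """Wilcoxon matched-pair signed rank test with normal approximation.
    Returns (t, z), where t is the smaller of the signed rank sums."""
    # Step 1: differences, discarding pairs with d = 0
    d = [b - a for a, b in zip(before, after) if b != a]
    n = len(d)
    # Step 2: rank |d|, averaging ranks for ties
    abs_sorted = sorted(abs(x) for x in d)
    ranks = {}
    for v in set(abs_sorted):
        positions = [i + 1 for i, x in enumerate(abs_sorted) if x == v]
        ranks[v] = sum(positions) / len(positions)
    # Steps 3-5: signed rank sums, then take the smaller
    t_plus = sum(ranks[abs(x)] for x in d if x > 0)
    t_minus = sum(ranks[abs(x)] for x in d if x < 0)
    t = min(t_plus, t_minus)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, (t - mu) / sigma

t, z = wilcoxon_signed_rank(
    [85, 70, 40, 65, 80, 75, 55, 20, 35, 60],   # hypothetical "before" scores
    [75, 50, 50, 40, 20, 65, 40, 25, 20, 50])   # hypothetical "after" scores
print(t, round(z, 3))  # → 4.5 -2.344
```

(The sample here has only n = 10 non-zero pairs, below the n > 15 guideline, so the normal approximation is shown purely to demonstrate the arithmetic.)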

Summary
There are various situations where the populations under study are not normally distributed,
and the data collected from these populations may be extremely skewed. In such situations, the
option is to use a non-parametric test. Non-parametric statistical tests are hemmed in by
fewer and less stringent assumptions than parametric tests. Non-parametric tests have several
advantages, but the methods are not without their drawbacks. The non-parametric tests
discussed include the chi-square test, the run test, the sign test, the Mann-Whitney U test,
the Wilcoxon matched-pair signed rank test and the Kruskal-Wallis test.
The one-sample sign test is applicable where a sample is drawn from a population having a
continuous symmetrical distribution that is known to be non-normal.
The Kolmogorov-Smirnov (K-S) test is similar to the chi-square test in that it examines the
goodness of fit of a specific set of data to an assumed distribution.
A run is defined as a sequence consisting of repeated occurrences of a particular symbol,
where the immediately preceding and succeeding symbols are different from the symbol in the
run, or there is no other symbol on either side.
The sign tests for two samples are applied when the two samples have the same size.
The Mann-Whitney U test is an alternative to the two-sample t-test. The test is based on the
ranks of the observations of the two samples put together, and it is more powerful than the
sign tests. An alternative name for this test is the rank-sum test.
The Kruskal-Wallis test is a non-parametric counterpart to the one-way ANOVA and an extension
of the Mann-Whitney U test. Both methods require that the scale of measurement of a sample
value be at least ordinal.
