Professional Documents
Culture Documents
PROJECT
on Non-Parametric Test
Presented by:
BASTALINGUM Vianee (1210305)
JAGOO Girish (1217975)
NITHOO Lovena (1215230)
PAREANEN Krisen (1213234)
MARDIAPOULLE Marie Annielle Elodie Veldy (1212926)
Table of Contents
List of Figures & Tables ........................................................................................................................... 4
Acknowledgement .................................................................................................................................. 6
Abstract ................................................................................................................................................... 7
Non-Parametric Test ............................................................................................................................... 8
1.1 Definition .......................................................................................................................... 8
1.2 Types of Data .................................................................................................................... 8
1.3 Examples of Non-Parametric statistical tests ................................................................ 10
1.4 Types of hypotheses ....................................................................................................... 10
1.5 General steps to carry out a Non-Parametric Test ........................................................ 11
Normality test ....................................................................................................................................... 12
2.1 Shapiro-Wilk Test ........................................................................................................... 12
2.1.1 Formula to carry out Shapiro-Wilk Test.................................................................................... 13
2.1.2 Interpretation of Shapiro-Wilk Test Values .............................................................................. 13
2.1.3 Shapiro-Wilk Test in Excel ......................................................................................................... 14
Wilcoxon Signed rank test .................................................................................................................... 16
3.1 Introduction.................................................................................................................... 16
3.2 Steps to perform a Wilcoxon signed rank test ............................................................... 16
3.2.1 The general way........................................................................................................................ 16
3.2.2 For Paired data ......................................................................................................................... 16
3.2.3 For Single set of data ................................................................................................................ 17
3.3 Example of a Wilcoxon sign test: ................................................................................... 17
Sign test................................................................................................................................................. 21
4.1 Introduction.................................................................................................................... 21
4.2 Assumptions ................................................................................................................... 21
4.3 Characteristics of the sign test ....................................................................................... 21
4.3.1 Large Sample data .................................................................................................................... 22
4.3.2 Small Sample Data .................................................................................................................... 23
4.4 Step-wise implementation of Sign Test (Large Sample data) in Excel ........................... 24
4.5 Step-wise implementation of Sign Test (Small Sample data) in Excel ........................... 26
The Mann-Whitney U-test .................................................................................................................... 28
5.1 Introduction.................................................................................................................... 28
2
5.2 Assumptions ................................................................................................................... 28
5.3 Steps to perform the Mann-Whitney test ..................................................................... 29
5.3.1 Dealing with Ties:...................................................................................................................... 31
5.4 Example of a Mann-Whitney U-test: ............................................................................. 31
5.4.1 Steps to perform a Mann-Whitney U-test in Microsoft Excel 2010: ........................................ 31
The Friedman Test ................................................................................................................................ 38
6.1 Introduction: .................................................................................................................. 38
6.2 Step-wise implementation of Friedman Test on Excel .................................................. 39
The Kruskal-Wallis H-test ...................................................................................................................... 42
7.1 Introduction.................................................................................................................... 42
7.2 Description of samples ................................................................................................... 42
7.3 Description of the Kruskal-Wallis H-test ........................................................................ 43
7.4 Calculations and steps for Kruskal-Wallis H-test............................................................ 44
7.5 Implementing Kruskal-Wallis H-test in excel 2010 ........................................................ 45
Summary and Condition ....................................................................................................................... 53
Division of Work .................................................................................................................................... 54
Bibliography .......................................................................................................................................... 56
3
List of Figures & Tables
Non-Parametric Test .................................................................................................................. 8
Table 1.1 Types of Data .......................................................................................................... 8
Table 1.2 Example of Non-Parametric Test.......................................................................... 10
Normality test........................................................................................................................... 12
Figure 2.1(a) Data Input for Testing ..................................................................................... 14
Figure 2.1(b) Testing Data .................................................................................................... 15
Wilcoxon Signed rank test ....................................................................................................... 16
Figure 2.1(a) Data Input for Testing ..................................................................................... 17
Figure 3.1(b) Significance Level ............................................................................................ 17
Figure 3.1(c) Implementation of Data in Excel Sheet .......................................................... 18
Figure 3.1(d) Calculating Difference & Rank ........................................................................ 18
Figure 3.1(e) Rank of Positive & Negative differences Calculated....................................... 19
Figure 3.1(f) Z-Score Calculated ........................................................................................... 19
Figure 3.1(g) Interpretation & Conclusion ........................................................................... 20
Sign test .................................................................................................................................... 21
Figure 4.1 Large Sample Data............................................................................................... 22
Figure 4.2 Small Sample Data ............................................................................................... 23
Figure 4.3(a) Null & Alternate Hypothesis ........................................................................... 24
Figure 4.3(b) Sets of data Calculated & Ranked ................................................................... 24
Figure 4.3(c) Calculating Sum of Ranks ................................................................................ 25
Figure 4.3(d) Final Calculations & Conclusion ...................................................................... 25
Figure 4.2(a) Null & Alternative Hypothesis......................................................................... 26
Figure 4.2(b) Sets of data Calculated & Ranked ................................................................... 26
Figure 4.2(c) Calculating Sum of Ranks ................................................................................ 27
Figure 4.2(d) Displaying Criterions ....................................................................................... 27
Figure 4.2(e) Conclusion ....................................................................................................... 27
The Mann-Whitney U-test ....................................................................................................... 28
Figure 5.1(a) Null Hypothesis & Alternative Hypothesis ...................................................... 31
Figure 5.1(b) Significance ..................................................................................................... 31
Figure 5.1(c) Sample Data .................................................................................................... 32
4
Figure 5.1(d) Ranking Sample Data Using FX Function ........................................................ 34
Figure 5.1(e) Click & Drag to Rank ....................................................................................... 35
Figure 5.1(f) Calculations...................................................................................................... 36
Figure 5.1(g) Result & Conclusion ........................................................................................ 37
The Friedman Test ................................................................................................................... 38
Figure 6.1(a) Null & Alternative Hypothesis......................................................................... 39
Figure 6.1(b) Significance ..................................................................................................... 39
Figure 6.1(c) Ranking & Summation..................................................................................... 39
Figure 6.1(d) Finding Value of k & n ..................................................................................... 39
Figure 6.1(e) Degree of Freedom ......................................................................................... 40
Figure 6.1(f) Decision Rule ................................................................................................... 40
Figure 6.1(g) Calculation & Conclusion ................................................................................ 41
The Kruskal-Wallis H-test ....................................................................................................... 42
Figure 7.1(a) Sample Data .................................................................................................... 45
Figure 7.1(b) Combining all Sample Data ............................................................................. 46
Figure 7.1(c) Ranking data using fx function part 1 ............................................................. 47
Figure 7.1(d) Ranking data using fx function part 2 ............................................................. 48
Figure 7.1(e) Ranking data using fx function part 3 ............................................................. 49
Figure 7.1(f) Ranking data using fx function part 4 .............................................................. 49
Figure 7.1(g) Click & Drag ..................................................................................................... 50
Figure 7.1(h) Recopying Rank to original group & Calculating Sum .................................... 51
Figure 7.1(i) Result & Conclusion ......................................................................................... 52
5
Acknowledgement
We are highly indebted to Dr.Y.D.Tangman and Mr.R.Coonjobeeharry for their guidance and
constant supervision as well as for providing necessary information regarding the project and
eventually for their kind support in its completion.
We would also like to extend our heartfelt thanks to many people, friends and institutions for
their valuable contribution towards the achievement of our project and their helpful
suggestions for its improvement.
Moreover, we express our gratitude towards our parents for their kind co-operation and
encouragement.
We are equally thankful to the various sources from which materials and information have
been drawn.
Our thanks and appreciation also goes to us in putting forward our ideas and opinions to
make this assignment a success.
6
Abstract
This project involves understanding the different types of Non-Parametric Statistical test. The
goal of this project is to use mathematical concepts in order to explain and implement an
example of these different types of non-parametric test on Microsoft Excel 2010.
Throughout this project, it became clear to us that non-parametric test are used for
independent samples. Non-parametric test like Mann-Whitney test, Kruskal Wallis,
Wilcoxon-sign test and Friedman test will be discussed. Real data is used for the simulations
of the Excel sheets. A step by step procedure is explained in this report how each test is
carried out.
7
Chapter 1
Non-Parametric Test
1.1 Definition
Non-parametric or distribution free test is a statistical procedure whereby the data does not
match a normal distribution. The data used in non-parametric test is frequently of ordinal
data type, thus implying it does not depend on arithmetic properties. Consequently, all tests
involving the ranking of data are non-parametric and also no statement about the distribution
of data is made.
Discrete data are types of data that take only exact values.
Unlike for discrete data, continuous data can take any numerical
value.
For instance:
Continuous Data
1. The length of a pencil (it can be 7cm, 8.2cm, 9.56cm…).
2. The mass of a chocolate cake.
3. The time taken by a person to finish his/her homework.
8
Ordinal data deals with the ranking of items in order. It is the
comparison of two or more items, i.e., finding the difference between
the first, the second and the third and onwards. Ordinal data is also
non-parametric.
Ordinal Data
For example:
Interval data (also called integer) are types of data for which the
measurement can be calculated and the difference in sizes can be
compared.
1. Dates.
2. Temperature in Fahrenheit.
3. Years.
Ratio data is similar to interval data, but gives information with respect to
an absolute zero. It can be multiplied and divided.
9
1.3 Examples of Non-Parametric statistical tests
Mann-Whitney U-test.
Comparison of two independent samples.
Wilcoxon rank sum test.
10
1.5 General steps to carry out a Non-Parametric Test
3. The suitable statistic test is chosen. This is done by taking into account:
4. The test statistic is then calculated. For small sample, a method particular to a specific
statistical test is used. Then, for large samples, the data is approximated to a normal
distribution and the z-score is evaluated.
5. The value required to reject the null hypothesis is determined using the suitable table
for critical values for the specific statistic.
6. The obtained value is compared with the critical value. This enables us to find the
difference based on a specific significance level. Then, we can assert whether the null
hypothesis should be rejected or not. For instance, for a two-tailed hypothesis with
the null hypothesis is not rejected if
(ALejeune, 2010)
11
Chapter 2
Normality test
Non-parametric Test is carried out with non-normal set of data. In order to know if the
samples of data collected are not normally distributed, normality tests are performed.
Stating the null and alternative hypothesis:
The null and alternative hypotheses for this test are stated as follows:
It is acknowledged that for this test to be carried out properly, the sample size must be
between 3 and 2,000.
12
2.1.1 Formula to carry out Shapiro-Wilk Test
Where,
m= expected value
given that
s= is the number of values recorded
Q is called the covariance statics
13
2.1.3 Shapiro-Wilk Test in Excel
14
The test should be
rejected in order to carry
out a non-parametric test
All the data samples of the different non -parametric tests that will be discussed
further in the project have been carried out thro ugh the Shapiro Wilk Normality
Test. This is to confirm that the data samples are not normally distributed and
therefore the data can be used to perform a non -parametric test.
15
Chapter 3
(c) For ordered categorical data where a numerical scale is inappropriate but where it is
possible to rank the observations.
Step. 1 →State the null hypothesis, Ho and the alternate hypothesis, HA.
Step. 2 →State α.
Step. 3 →State decision rule.
2. Calculate each paired difference, di = xi − yi, where xi, yi are the pairs of observations.
3. Rank the dis, ignoring the signs (i.e. assign rank 1 to the smallest |di|, rank 2 to the next
etc.)
4. Label each rank with its sign, according to the sign of di.
5. Calculate W+, the sum of the ranks of the positive di, and W−, the sum of the ranks of the
negative di. (As a check the total, W+ + W−, should be equal to n(n+1)2, where n is the
number of pairs of observations in the sample).
16
3.2.3 For Single set of data
1. State the null hypothesis - the median value is equal to some value M.
2. Calculate the difference between each observation and the hypothesized median, di = xi −
M.
3. Rank the dis, ignoring the signs (i.e. assign rank 1 to the smallest |di|, rank 2 to the next
etc.)
4. Label each rank with its sign, according to the sign of di.
5. Calculate W+, the sum of the ranks of the positive dis, and W−, the sum of the ranks of the
negative dis. (As a check the total, W+ + W−, should be equal ton(n+1)2, where n is the
number of pairs of observations in the sample).
Under the null hypothesis, we would expect the distribution of the differences to be
approximately symmetric around zero and the distribution of positives and negatives to be
distributed at random among the ranks. Under this assumption, it is possible to work out the
exact probability of every possible outcome for W. To carry out the test, we therefore
proceed as follows:
7. Use tables of critical values for the Wilcoxon signed rank sum test to find the probability of
observing a value of W or more extreme. Most tables give both one-sided and two-sided p-
values. If not, double the one-sided p-value to obtain the two-sided p-value. This is an exact
test.
17
3. The sample data is implemented on an excel sheet as follows:
4. The difference of prices and the ranks are calculated. Each rank is then assigned to
its sign.
5.
Figure 3.1(d) Calculating Difference & Rank
18
6. The sum of rank of positive differences, ∑R+ and sum of rank of negative differences,
∑R- are calculated.
7.
Figure 3.1(e) Rank of Positive & Negative differences Calculated
19
9. The results are therefore interpreted and a conclusion is drawn out.
Hence, if z< -1.96 or z>1.96, the null hypothesis will be rejected, i.e. it will imply that there
is no increase in the Consumer price index. But, as the value of z=-0.364955325, which is
greater than -1.96 and less than 1.96, we therefore accept that there has been an increase in
the median difference of the Consumer Price Index.
(Corder)
20
Chapter 4
Sign test
4.1 Introduction
To test the hypothesis, “difference in medians”, the sign test can be used as hypothesis tester,
between the constant distributions of two random variables A and B. This non-parametric test
makes only some assumptions about the kind of distributions under test. This implies that it
has a large probability of being accepted as a suitable test, but may be lacking in statistical
power like the Wilcoxon Signed rank test or the T test.
4.2 Assumptions
Let for .
3. The values of Ai and Bi represent are ordered, so the comparisons "greater than", "less
than", and "equal to" are meaningful.
The sign test can be classified into 2 different sets of data, namely:
21
4.3.1 Large Sample data
Under sign test, large samples are defined as those samples which contain more than 25 raw
data.
22
4.3.2 Small Sample Data
23
4.4 Step-wise implementation of Sign Test (Large Sample data) in Excel
1. The null hypothesis, the alternate hypothesis and other criteria to be considered during
the test are defined. NOTE: These are random numbers generated within the range 0 -
10.
24
3. The sum of ranks is then evaluated
25
4.5 Step-wise implementation of Sign Test (Small Sample data) in Excel
26
3. Sum of ranks is calculated to know the value of N.
- If the p-value is greater than α, then the null hypothesis is accepted. Else, it is
rejected.
-
-
Figure 4.2(e) Conclusion
(Corder)
27
Chapter 5
5.2 Assumptions
The basic assumptions of a Mann-Whitney U-test are:
2. The two samples are independent of each other and so are the observations within
each sample.
3. The measurement scale is of ordinal type. Therefore, the observations can be arranged
in ranks.
The Mann-Whitney U-test is centered on the difference of each observation in the first
sample with each observation in the other sample. The total number of pairwise difference
that can be executed is:
If the median is alike for both samples, then each has an even possibility of being smaller
or larger than each , implying a probability of ½.
28
The number of times an from sample 1 is greater than a from sample 2 is calculated and
represented by . Likewise, the number of times an from sample 1 is smaller than a
from sample 2 is counted and represented by . Below the null hypothesis, and are
likely to be almost the same.
2. The significance level, related with the null hypothesis is stated. is normally
set at 5% and therefore the confidence level is 95%.
5. Below each a, the number of b, that is lesser than it, is recorded. This implies
Similarly, below each b, note down the number of as that is lesser than
it. This means .
Verify that:
8. The Mann-Whitney U-test statistical tables are used to evaluate the possibility of
finding a value of U or lower. If the test is:
10. The results are then interpreted and a conclusion is drawn out.
29
Furthermore, if the number of observations is greater than 20, a normal approximation
is used. For large samples, the z-score is calculated and the normal distribution table is used
to get the critical region of the z-scores.
The following formulas are used to evaluate the z-score of the Mann-Whitney U-test for
samples of greater quantity:
Where, : mean
: number of values from the first sample
: number of values from the second sample
3. The z-score for the normal distribution of data can then be calculated.
30
5.3.1 Dealing with Ties:
Sometimes, two or more observations can be likely the same. Then, U is calculated by
assigning half the tie to the A value and the other half to the B value. Hence, the normal
distribution is used with a modification in the standard deviation. Therefore, the formula is:
31
3. The sample data is implemented on an excel sheet as follows:
4. The rank of the sample data is calculated as shown below. If there are ties, that is, two
or more sample values are similar, then the sample data is given a rank equal to the
mean of the ranks that would else be given.
(i)
Click on Insert
function
32
(ii)
(iii)
33
(iv)
(v)
34
(vi)
(vii)
35
5. The calculations are as follows:
(i)
(ii)
36
6. The results are then interpreted and a conclusion is drawn out.
(Shier)
(Corder)
37
Chapter 6
The Friedman Test is a non-parametric test which was developed and implemented by
Milton Friedman. This type of test is used for the comparison of three or more dependent
samples. This way of testing involves ranking each row together, then considering the values
of ranks by columns. No assumptions are made when implementing this type of test. The
Friedman test may be used as an alternative to repeated measures of ANOVA, when the
assumption of normality is not met. Just like some of other non-parametric tests, the
Friedman Test uses ranks of data rather than their raw data to compute the test statistic. The
fact that no assumptions are made in this test, it is not as powerful as the ANOVA. If it is
equivalent to the sign test, then this implies that there are only two measures for this test.
n wine judges each rate k different wines. Are any wines ranked consistently higher or
lower than the others?
n wines are each rated by k different judges. Are the judges' ratings consistent with each
other?
n welders each use k welding torches, and the ensuing welds were rated on quality. Do
any of the torches produce consistently better or worse welds?
We use the Friedman test for one way repeated measure analysis of variance by ranks. This
way of testing resembles to that of Kruskal Wallis one way analysis of variance by ranks.
38
6.2 Step-wise implementation of Friedman Test on Excel
3. The samples are then ranked and the summation of ranks is carried out.
39
5. The Degree of Freedom, df, is then evaluated using the formula:
(a) if there is no ties from the ranks, then is evaluated using the following formula
40
The Test statistic Equation is broken
into parts in order to facilitate the
computation of the result.
(Corder)
41
Chapter 7
The Kruskal-Wallis H-test is a non-parametric statistical procedure for comparing more than
two samples that are independent. The parametric equivalent to this test is the one-way
analysis of variance (ANOVA).
Yet ANOVA is used for normally distributed data, but Kruskal Wallis can be perform
without the data being normally distributed
The H-test is a generalization of the Mann-Whitney test, a test for knowing whether the two
samples chosen are taken from the same population. The p value for both the Kruskal Wallis
and the Mann- Whitney test are equal.
However Kruskal Wallis unlike Mann- Whitney test is used in samples to evaluate their
degree of association.
Kruskal Wallis tests liable to be performed when the different samples of data meet the
following criteria:
similar distributions
data in each sample should be >5
same shape of distribution as the population
data can and must be ranked
independent
42
7.3 Description of the Kruskal-Wallis H-test
43
7.4 Calculations and steps for Kruskal-Wallis H-test
Also worth noted is the observation of the median of the population while doing the Kruskal-
Wallis test.
In order to perform the Kruskal-Wallis H-test, the first step is to combine all the
samples and perform a rank ordering on all the values.
Where,
N = Number of values obtained from every grouped samples,
= Summation of ranks taken from a particular sample and
Number of values from the equivalent sum of rank.
While performing the H-test, the degree of freedom which is written as df is determined
by means of the formula:
Where,
df stands for degrees of freedom
k for the number of groups or samples
44
7.5 Implementing Kruskal-Wallis H-test in excel 2010
Example:
An investigation is carried out using 3 different groups of person to see how
many glasses of alcoholic drinks they consume per day although having gone
through the alcoholism therapy.
45
Step 1:
All the sample data are combined into one group and ranked.
46
Data is ranked using RANK.AVG
function.
47
2nd statistical is selected in the
insert function window.
48
3rd RANK.AVG is selected and ok is
pressed.
49
Click and drag to rank all data.
50
Step 2:
Rank data values are copied back into the original group and sum of ranks are
calculated for each group.
51
Step 3:
(Corder)
52
Summary and Condition
Non-Parametric Test plays an important role in the field of statistics. In this project, we have
seen the different types of non-parametric test and its importance. We have shown the
mathematical procedures of the non-parametric tests namely the Wilcoxon Signed Ranked
test, Sign test, Mann Whitney test, Kruskal Wallis and Friedman test. Next, using our skills
in mathematics, a stepwise implementation of these non-parametric tests has also been done
on MS Excel 2010.
Throughout the project, we have realized that non parametric tests have allowed us to test for
not- normally distributed samples of data more easily.
53
Division of Work
Jagoo Kushal Mardiapoullé
Krisen Pareanen Vianee Bastalingum Lovena Nithoo
Girish Annielle
Report
Cover, Bibliography &
Done
References
Acknowledgement Done
Abstract Done
54
Krisen Pareanen Vianee Bastalingum Lovena Nithoo Jagoo Kushal Girish Mardiapoullé Annielle
Excel
Wilcoxon Ranked Sign Test Edited Done
Sign Test Edited Done
Mann-Whitney Done
Kruskal Wallis Done Edited
Friedman Edited Done
Built–up of Project Done Done
55
Bibliography
ALejeune. (2010). Understanding Satistics Data Types.
The Wilcoxon Rank Sum Test Done in Excel. (n.d.). Retrieved from Excelmasterseries:
http://blog.excelmasterseries.com/2010/09/wilcoxon-rank-sum-test-done-in-excel.html
56