
09 Non-Parametric Tests

STRUCTURE

9.1 Chi-square Test: The Test of Association


9.2 Comparing Two Independent Conditions: Wilcoxon Rank-Sum Test and
Mann-Whitney Test
9.3 Kruskal-Wallis Test
9.4 Friedman ANOVA
9.5 Binomial Test
9.6 Run Test
9.7 Summary
9.8 Exercises

OBJECTIVES

• Apply and analyze the chi-square test
• Apply the Wilcoxon rank-sum test and Mann-Whitney test
• Differentiate between parametric and non-parametric tests
• Illustrate the Kruskal-Wallis test
• Apply and analyze Friedman ANOVA

In the previous chapter, we studied the concept of ANOVA. Let us now move forward and study non-parametric tests. A parametric test is chosen to analyze a dataset where scale variables are normally distributed with homogeneous variance for different groups having independent responses collected using random sampling. In other words, a parametric test is based on a set of assumptions, and meeting those assumptions helps in drawing more accurate and generalized conclusions. On the other hand, a non-parametric test does not depend on such strict assumptions. It is mostly applicable to ordinal and nominal data types. Non-parametric tests are simple to apply and understand.
Non-parametric tests are sometimes known as assumption-free tests or distribution-free tests because they
make fewer assumptions about the type of data on which they are used. Non-parametric tests are useful in
analyzing data with categorical variables and small samples, where parametric tests are restricted because of
strict assumptions, such as normal distribution and homogeneity of variance. Most non-parametric tests work
on the frequencies of sub-categories and the ranking of data.

In research studies, when we analyze categorical variables, their frequencies are useful for the analysis. In addition, the ranking of scores also provides meaningful information. In the ranking process, the lowest score is given the rank of 1, the next lowest score is given the rank of 2, and so on. This process results in high scores being represented by large ranks and low scores being represented by small ranks. The analysis is then carried out on the ranks rather than on the actual data.
The non-parametric tests to be discussed in the chapter include the chi-square test, Wilcoxon rank-sum test, Mann-Whitney test, Kruskal-Wallis test, and Friedman ANOVA.

9.1 Chi-square Test: The Test of Association


Chi-square is one of the most popular non-parametric tests. It is used in two cases, which are as
follows:
• To test the association between two nominal variables in research
• To test the difference between the expected and observed frequencies of an event

The chi-square test compares the actual observed frequencies with the calculated expected frequencies of the different combinations of the nominal variables. The difference between the observed and expected frequencies suggests a possible association between the categorical variables. The chi-squared statistic compares the observed count in each table cell to the count that would be expected under the assumption of no association between the row and column classifications. A negligible difference between observed and expected frequencies may indicate no association, whereas a big difference may indicate the possibility of an association.

The null hypothesis of chi-square is that ‘There exists no significant association between two
nominal variables.’

9.1.1 Assumptions of Chi-square Test


Chi-square is a non-parametric test that does not rely on assumptions such as having continuous, normally distributed data (categorical data cannot be normally distributed because it is not continuous). However, the chi-square test has the following important assumptions:
1. Both the variables should be nominal in nature and must have at least two different sub-categories. For example, the categorical variable 'Gender' has two sub-categories: males and females.
2. Each observation contributes to only one cell of the contingency table.
3. In a 2 x 2 contingency table, the expected frequency in each cell should be greater than five. In larger contingency tables, the rule is that all expected counts should be greater than 1 and not more than 20 percent of the expected counts should be less than 5.
Example 9.1: Table 9.2 has the data collected from 100 Internet users. The data consists of two
nominal variables ‘Level of Familiarity with the Internet’ and ‘Education Background.’ The
details of the codes provided to different sub-categories of these nominal variables are shown in
Table 9.1. The objective of the analysis is to test the presence of an association between the
education background and the level of familiarity with the Internet. Hence, the chi-square test
should be applied here.

Table 9.1: Codes Provided to Sub-categories


Codes for the variable 'Level of Familiarity with the Internet'    Codes for the variable 'Education Background'
1 = Low Familiarity                                                 1 = Humanities
2 = Medium Familiarity                                              2 = Management
3 = High Familiarity                                                3 = Technology
                                                                    4 = IT

Table 9.2: Data for Example
(Each of the three column blocks lists: S. No., Level of Familiarity with the Internet, Education Background)

S. No.  Familiarity  Education    S. No.  Familiarity  Education    S. No.  Familiarity  Education
1 3 1 35 2 2 69 1 2
2 2 3 36 1 2 70 3 1
3 3 1 37 2 1 71 2 2
4 3 1 38 2 4 72 2 3
5 3 4 39 1 1 73 1 1
6 3 4 40 2 3 74 2 2
7 3 1 41 2 2 75 2 2
8 3 1 42 1 1 76 2 1
9 3 1 43 2 3 77 1 1
10 3 3 44 2 4 78 2 3
11 2 1 45 2 2 79 1 2
12 1 1 46 3 1 80 1 1
13 3 1 47 3 3 81 1 1
14 3 1 48 2 2 82 1 3
15 3 3 49 3 2 83 1 1
16 2 4 50 2 2 84 1 1
17 2 2 51 1 2 85 1 2
18 2 4 52 2 2 86 1 1
19 2 2 53 1 4 87 2 2
20 2 4 54 3 2 88 1 1
21 3 1 55 2 2 89 2 4
22 3 1 56 2 4 90 2 1
23 3 4 57 1 3 91 2 3
24 3 1 58 1 4 92 1 4
25 3 2 59 3 4 93 1 1
26 3 2 60 3 1 94 2 3
27 3 4 61 1 2 95 1 1
28 3 3 62 1 2 96 1 3
29 2 2 63 2 2 97 1 2
30 3 1 64 1 2 98 1 1
31 1 3 65 2 2 99 1 1
32 3 2 66 1 2 100 1 3
33 2 4 67 2 2
34 3 2 68 2 3

SPSS Analysis with Interpretation


Step 1: Click 'Analyze' → 'Descriptive Statistics' → 'Crosstabs'
The same is shown in Figure 9.1:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.1: SPSS Command for Crosstabs (1)

Step 2: Transfer ‘Education Background’ to the ‘Row(s)’ window and ‘Familiarity with the
Internet’ to the ‘Column(s)’ window. Click ‘Statistics.’ The same is shown in Figure 9.2:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.2: SPSS Command for Crosstabs (2)

Step 3: Select 'Chi-square' and click 'Continue.' The same is shown in Figure 9.3:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.3: SPSS Command for Crosstabs (3)

Step 4: Click 'Cells' and select 'Observed' and 'Expected' under Counts. Click 'Continue.'
The same is shown in Figure 9.4:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.4: SPSS Command for Crosstabs (4)

Step 5: Finally, select ‘OK.’ The chi-square test results will appear. The output of the chi-square
test with the interpretation is given in Table 9.3:

Table 9.3: SPSS Output: Cross-tabulation contingency table


Education Background * Familiarity with the Internet Cross-tabulation

                                            Familiarity with the Internet          Total
                                            Low       Medium    High
Education     Humanities    Count           15        4         15        34
Background                  Expected count  11.2      12.6      10.2      34.0
              Management    Count           10        17        6         33
                            Expected count  10.9      12.2      9.9       33.0
              Technology    Count           5         8         4         17
                            Expected count  5.6       6.3       5.1       17.0
              IT            Count           3         8         5         16
                            Expected count  5.3       5.9       4.8       16.0
Total                       Count           33        37        30        100
                            Expected count  33.0      37.0      30.0      100.0

9.1.2 Cross-Tabulation
Cross-tabulation in the chi-square test is also known as a contingency table. It represents the frequencies of the various combinations of categories. Two types of frequencies are shown in a contingency table: observed frequencies and expected frequencies. The observed frequencies are calculated from the actual data collected in the research study, whereas the expected frequencies can be calculated with the help of the following formula:

Eij = (Row Totali × Column Totalj) / n

Where n is the total number of observations.
Table 9.3 shows the observed and expected frequencies, which differ from each other. Hence, there is a possibility of an association between the two categorical variables. The chi-squared statistics are shown in Table 9.4:

Table 9.4: Chi-squared Statistic


Value df Asymp. Sig. (2-sided)
Pearson’s chi-square 15.365a 6 .018
Likelihood ratio 17.239 6 .008
Linear-by-linear association .166 1 .684
N of valid cases 100
a. 1 cell (8.3%) has an expected count less than 5. The minimum expected count is 4.80.

Table 9.4 represents various measures of chi-squared statistics. The details of these statistics are
discussed in the next section.

9.1.3 Different Measures of Chi-squared Statistics


The different measures of chi-squared statistics are discussed as follows:
• Pearson's chi-squared test: It is the most widely used chi-square test. This statistic compares the observed frequencies in a contingency table with the expected frequencies of those categories. The Pearson's chi-squared statistic (χ²) is mathematically expressed as follows:

χ² = Σ (Oi − Ei)² / Ei, summed over all cells i = 1, …, n

Where Oi represents the observed frequency for each category and Ei represents the expected frequency of that category. Eij can be calculated as follows:

Eij = (Row Totali × Column Totalj) / n

Where n is the total number of observations.
The calculated value of the Pearson statistic is compared with the critical value of chi-square at the 5 percent or 1 percent level of significance with (r − 1)(c − 1) degrees of freedom, where r represents the number of rows and c represents the number of columns. If the calculated value of chi-square is greater than the critical value (or the p-value is less than 5 percent), we can reject the null hypothesis of no association between the categories and accept the alternate hypothesis of a significant association.

• As shown in Table 9.4, the Pearson's chi-squared statistic is found to be 15.365. The p-value of the Pearson's chi-squared statistic is 0.018, which is less than the 5 percent level of significance. This indicates that, at the 95 percent confidence level, the null hypothesis of no association between education background and level of familiarity with the Internet cannot be accepted. Thus, it can be concluded that there exists a significant association between them.
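These figures can be cross-checked outside SPSS. The following is a minimal sketch in Python using SciPy (supplementary to the chapter's SPSS workflow; it assumes the scipy package is available) that runs the Pearson chi-square test on the observed counts from Table 9.3:

```python
from scipy.stats import chi2_contingency

# Observed counts from Table 9.3
# Rows: Humanities, Management, Technology, IT
# Columns: Low, Medium, High familiarity with the Internet
observed = [
    [15, 4, 15],
    [10, 17, 6],
    [5, 8, 4],
    [3, 8, 5],
]

# correction=False gives the plain Pearson statistic; correction=True would
# apply Yates's correction, which is relevant only for 2 x 2 tables.
chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
# Should reproduce Table 9.4: chi2 = 15.365, df = 6, p = .018.
# 'expected' holds the expected counts shown in Table 9.3.
```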

• Likelihood ratio: This statistic is an alternative to the Pearson's chi-square test and is based on the theory of maximum likelihood. The general idea behind this theory is that we collect some data and create a model for which the probability of obtaining the observed set of data is maximized, and then compare this model to the probability of obtaining those data under the null hypothesis. The resulting statistic is, therefore, based on comparing the observed frequencies with those predicted by the model:

Lχ² = 2 Σ Oij ln(Oij / Eij), summed over all cells

As with the Pearson χ², this statistic has a chi-squared distribution with the same degrees of freedom and is tested in the same way.
• Yates's correction: Pearson's chi-squared statistic works on an assumption of approximation under which a researcher assumes that the discrete probability of observed binomial frequencies is approximated by the continuous chi-squared distribution. No doubt, this assumption results in some error. Frank Yates considered this element of error and tried to reduce it. In the case of a 2 x 2 contingency table, the Pearson's chi-square test tends to produce significance values that are too small (i.e., it tends to increase the probability of making a type I error). Therefore, Yates suggested a correction to the Pearson's formula, mentioned below:

χ² = Σ (|Oij − Eij| − 0.5)² / Eij

This revised formula makes the value of the χ² statistic lower and hence less significant.
Table 9.5 indicates the strength of association between the two categorical variables under study:

Table 9.5: Strength of Association


Symmetric Measures
Value Approx. Sig.
Nominal by nominal Phi .392 .018
Cramer's V .277 .018
N of valid cases 100

• Strength of association is measured by the Phi statistic and the Cramer's V statistic, which are discussed as follows:
• Phi statistic: This concept, introduced by Karl Pearson, is similar to the Pearson correlation coefficient and is denoted by φ. Phi is used with 2 x 2 contingency tables. It can be estimated by using the following formula:

φ = √(Chi-square Value / Sample Size)

• Cramer's V statistic: This concept is based on the Pearson's chi-squared statistic and was published by Harald Cramér. It is denoted by φc and provides a measure of the strength of association between two nominal variables. It results in a value between 0 and 1, where 0 denotes 'no association' and 1 denotes 'complete association'.

 Preference of Cramer’s V over Phi: Phi fails to reach its minimum value of zero
(indicating no association) when one of the two categorical variables contains more than two
categories. In this kind of circumstance, Cramer’s V is preferred over Phi.
In Table 9.5, the Cramer's V statistic is 0.277, which indicates a medium strength of association between the two variables under study.
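As a quick check of the formulas above, here is a small hand-rolled sketch in Python (the helper names are mine, chosen for illustration) that derives Phi and Cramer's V from the chi-square value:

```python
import math

def phi(chi2: float, n: int) -> float:
    # Phi = sqrt(chi-square / sample size); meaningful for 2 x 2 tables
    return math.sqrt(chi2 / n)

def cramers_v(chi2: float, n: int, rows: int, cols: int) -> float:
    # Cramer's V = sqrt(chi-square / (n * (min(rows, cols) - 1)))
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))

# Values from Table 9.4: chi2 = 15.365, n = 100, 4 rows x 3 columns
print(round(phi(15.365, 100), 3))              # 0.392, as in Table 9.5
print(round(cramers_v(15.365, 100, 4, 3), 3))  # 0.277, as in Table 9.5
```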
Two of the widely used methods that measure the strength of association of cross- tabulated data
when both variables are measured at the ordinal level are discussed as follows:
• Goodman and Kruskal's λ (lambda): This measures the proportional reduction in error that is achieved when membership of a category of one variable is used to predict category membership on the other variable. A value of 1 means that one variable perfectly predicts the other, whereas a value of 0 indicates that one variable in no way predicts the other.
• Odds ratio: It helps in measuring the intensity of the association of the presence or absence of one condition with the presence or absence of a second condition in a given population. In chi-square analysis, the odds ratio measures the effect size for categorical data. It is most useful in 2 x 2 contingency tables because the interpretation of the odds ratio is very clear in these contingency tables.
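A short sketch of the odds-ratio calculation for a 2 x 2 table follows; the counts are borrowed from the gender and Internet-shopping crosstab in Exercise Q6 purely for illustration:

```python
# 2 x 2 table from Exercise Q6 (Gender x Uses Internet for Shopping):
#           No    Yes
# Male      36    14
# Female    20    30
a, b = 36, 14  # males: No, Yes
c, d = 20, 30  # females: No, Yes

# Odds of 'No' for males relative to the odds of 'No' for females
odds_ratio = (a * d) / (b * c)
print(round(odds_ratio, 2))  # (36 * 30) / (14 * 20) = 3.86
```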

9.2 Comparing Two Independent Conditions: Wilcoxon Rank-Sum Test and Mann-Whitney Test
When we want to test differences between two conditions and different participants have been
used in each condition, we have two choices:
• Mann-Whitney test
• Wilcoxon rank-sum test
Example 9.2: The HR manager of an enterprise wants to conduct an experiment to investigate the performance of employees as a result of a certain motivational incentive. He tests 20 employees in all, divided into two groups. For the 10 employees in Group 1, there is no change in any incentive scheme, whereas the other 10 employees (Group 2) are offered a specific incentive related to their performance. The levels of performance of both the groups are measured initially as well as after one month. The data is provided in Table 9.6:

Table 9.6: Data for Analysis


Employee Incentive Initial Performance Performance after
One Month
1 15 28
2 35 35
3 16 35
4 18 24
5 Incentive offered (G2) 19 39
6 17 32
7 27 27
8 16 29
9 13 36
10 20 35
11 16 5
12 15 6
13 20 30
14 No incentive (G1) 15 8
15 16 9
16 13 7
17 14 6
18 19 17
19 18 3
20 18 10

Initially, it is assumed that there is no difference between the performance levels of the employees in the two groups. If we rank the employees on the basis of their performance scores before conducting the experiment, the sums of ranks of the two groups should be roughly the same. However, after the different treatments of the two groups, if the employees with an incentive offer perform better than the other group, we can expect the higher performance-score ranks to be in Group 2 and the lower ranks to be in Group 1. If we sum the ranks in each group, we expect a higher sum of ranks in the incentive group than in the no-incentive group.
Now, Mann-Whitney and Wilcoxon rank-sum tests both work on this principle of difference in
ranking in two treatments.
When the number of participants in the two groups is unequal, the test statistic (Ws) for the Wilcoxon rank-sum test is simply the sum of ranks in the smaller group, whereas when the group sizes are equal, it is the smaller of the two summed ranks.
Table 9.7 shows the ranking process for the pre- and post-performance data. First, arrange the data (scores) in ascending order, attach a label for the group each score belongs to (here G1 for the no-incentive group and G2 for the incentive group), and then, starting at the lowest score, assign potential ranks starting with 1 and going up to the number of scores available.
Sometimes, the same score occurs more than once in a dataset. These produce tied ranks, and such values need to be given the same rank. Thus, we assign a rank that is the average of the potential ranks of these scores.
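The averaging of tied ranks can be seen directly with SciPy's rankdata function; a minimal sketch (again supplementary to the SPSS workflow):

```python
from scipy.stats import rankdata

# First five post-performance scores from Table 9.7: the two 6s share
# potential ranks 3 and 4, so each receives the average rank 3.5.
print(rankdata([3, 5, 6, 6, 7]))  # [1.  2.  3.5 3.5 5. ]
```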
When we rank the data, we add up all of the ranks in the two groups. So, we add the ranks for the scores that came from the no-incentive group (59) and then the ranks for the scores that came from the incentive group (151). We take the lower of these sums as the test statistic; therefore, the test statistic for the post-performance data is Ws = 59.
With the initial performance data, the sum of ranks for the no-incentive group is 90.5 and for the incentive group, it is 119.5. We take the lower of these sums as our test statistic. Therefore, the test statistic for the initial performance data is Ws = 90.5.

Table 9.7: Employee Performance Data


Post-performance Data                                 Pre-performance Data
Score   Potential Rank   Actual Rank   Group          Score   Potential Rank   Actual Rank   Group
3 1 1 G1 13 1 1.5 G1
5 2 2 G1 13 2 1.5 G2
6 3 3.5 G1 14 3 3 G1
6 4 3.5 G1 15 4 5 G1
7 5 5 G1 15 5 5 G1
8 6 6 G1 15 6 5 G2
9 7 7 G1 16 7 8.5 G1
10 8 8 G1 16 8 8.5 G1
17 9 9 G1 16 9 8.5 G2
24 10 10 G2 16 10 8.5 G2
27 11 11 G2 17 11 11 G2
28 12 12 G2 18 12 13 G2
29 13 13 G2 18 13 13 G1
30 14 14 G1 18 14 13 G1
32 15 15 G2 19 15 15.5 G2
35 16 17 G2 19 16 15.5 G1
35 17 17 G2 20 17 17.5 G2
35 18 17 G2 20 18 17.5 G1
36 19 19 G2 27 19 19 G2
39 20 20 G2 35 20 20 G2
Sum of Ranks (G1) = 59 Sum of Ranks (G1) = 90.5
Sum of Ranks (G2) = 151 Sum of Ranks (G2) = 119.5

Now, let us check whether this test statistic is significant.


The mean of the test statistic (W̄s) and its standard error (SEws) can be easily calculated from the sample sizes of each group (n1 is the sample size of group 1, and n2 is the sample size of group 2):

W̄s = n1(n1 + n2 + 1) / 2

SEws = √[n1 × n2 × (n1 + n2 + 1) / 12]

In the example, as n1 = n2 = 10, therefore:

W̄s = 10 × (10 + 10 + 1) / 2 = 105

SEws = √[10 × 10 × (10 + 10 + 1) / 12] = 13.23

The z-score of the test statistic can be calculated as follows:

z = (Ws − W̄s) / SEws

z (initial performance) = (90.5 − 105) / 13.23 = −1.10

z (post-performance) = (59 − 105) / 13.23 = −3.48
If the absolute value of a z-score is greater than 1.96, the test is significant at the 5 percent level of significance (p < 0.05). So, there is a significant difference between the groups in the case of post-performance, but not in the case of pre-performance.
The procedure described above is the Wilcoxon rank-sum test. The Mann-Whitney test is
basically the same. It is based on a test statistic U, which is calculated using an equation in which
n1 and n2 are the sample sizes of group 1 and 2, respectively, and R1 is the sum of the ranks for
group 1:
U = n1 × n2 + n1(n1 + 1) / 2 − R1

For the given data:

U (pre-performance) = 10 × 10 + 10(10 + 1) / 2 − 119.5 = 35.50

U (post-performance) = 10 × 10 + 10(10 + 1) / 2 − 151 = 4
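The same U statistic can be checked with SciPy's mannwhitneyu function. A minimal sketch using the post-performance scores grouped as in Table 9.7 (supplementary to the SPSS procedure that follows):

```python
from scipy.stats import mannwhitneyu

# Post-performance scores by group, as ranked in Table 9.7
no_incentive = [3, 5, 6, 6, 7, 8, 9, 10, 17, 30]       # G1
incentive = [24, 27, 28, 29, 32, 35, 35, 35, 36, 39]   # G2

# SciPy reports U for the first sample passed in; with the no-incentive
# group first, the statistic should match the U = 4 computed above.
result = mannwhitneyu(no_incentive, incentive, alternative="two-sided")
print(result.statistic, result.pvalue)  # U = 4.0, p < .05
```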
Let us feed the data of the variables in the example into SPSS:
• Variable 1 'Incentive' → Coding (1 for incentive, 2 for no incentive)
• Variable 2 'Pre_Performance' → Scale variable → Values of initial performance
• Variable 3 'Post_Performance' → Scale variable → Values of post-performance
Initially, the normality of the data should be checked.
Now, let us run the analysis in SPSS.
Step 1: Click 'Analyze' → 'Nonparametric Tests' → 'Legacy Dialogs' → '2 Independent Samples'
The same is shown below in Figure 9.5:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.5: SPSS Command for Wilcoxon Rank-Sum Test (1)

Step 2: Transfer the variables ‘Pre_Performance’ and ‘Post_Performance’ to the ‘Test Variable
List’ window.
The same is shown in Figure 9.6:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.6: SPSS Command for Wilcoxon Rank-Sum Test (2)

Step 3: Now, transfer the variable ‘Incentive’ to the ‘Grouping Variable’ window and define
groups by providing the numeric codes of different groups.
The same is shown in Figure 9.7:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.7: SPSS Command for Wilcoxon Rank-Sum Test (3)

SPSS Output and Interpretation


The output provided by SPSS and its interpretation is explained in Table 9.8:

Table 9.8: Descriptive Statistics


N Mean Std. Minimum Maximum
Deviation
Pre-performance 20 18.0000 5.07833 13.00 35.00
Post-performance 20 21.0500 12.92275 3.00 39.00
Incentive 20 1.5000 .51299 1.00 2.00

Table 9.8 represents the descriptive statistics of the variables. The result indicates that the average pre-performance score is 18 with a standard deviation of 5.08. Similarly, the average post-performance score is 21.05 with a standard deviation of 12.92. The average performance increases after the experiment. The mean and standard deviation of the categorical variable Incentive have no statistical meaning. Table 9.9 represents the ranks of the different subgroups in the different cases:

Table 9.9: Ranks of Different Groups

Incentive N Mean Rank Sum of Ranks


Pre_Performance Incentive 10 11.95 119.50
No Incentive 10 9.05 90.50
Total 20
Post_Performance Incentive 10 15.10 151.00
No Incentive 10 5.90 59.00


Total 20

These ranks are useful in calculating the Mann-Whitney and Wilcoxon statistics. The calculated
test statistics of Mann-Whitney U and Wilcoxon tests are shown in Table 9.10:

Table 9.10: Test Statistics of Mann-Whitney U and Wilcoxon Tests


Pre-Performance Post-Performance
Mann-Whitney U 35.500 4.000
Wilcoxon W 90.500 59.000
Z -1.105 -3.484
Asymp. Sig. (2-tailed) .269 .000
Exact Sig. [2*(1-tailed Sig.)] .280 .000

The null hypothesis of both the tests is that there is no difference between the performances of the two groups. In the case of pre-performance, as the p-value of both the statistics is more than the 5 percent level of significance, the null hypothesis can be accepted. Hence, it can be concluded that the performance of the two groups is the same initially. However, in the case of post-performance, the p-value of both the statistics is less than the 5 percent level of significance. Therefore, the null hypothesis cannot be accepted, and it can be concluded that the post-performance of the two groups is significantly different. Because the sum of ranks for the incentive group is more than the sum of ranks of the no-incentive group, it can be concluded that there exists a significant impact of the incentive on employee performance.

9.3 Kruskal-Wallis Test


The Kruskal-Wallis test is another useful non-parametric test. It is used to test for differences between more than two independent samples. This test is an alternative to one-way independent ANOVA when the assumptions of the parametric test are violated. The Kruskal-Wallis test is based on the ranks of the scale variable for each independent group. The sum of the ranks for each sample is calculated and, based on this, the Kruskal-Wallis test statistic (H) is calculated as follows:

H = [12 / (N(N + 1))] × Σ (Ri² / ni) − 3(N + 1), summed over the k groups

Where,
Ri = Sum of ranks for sample i
N = Total sample size
ni = Sample size of a particular group
k = Number of groups
Example 9.3: A company has measured the quantity of a product sold in four cities for six months. The company wants to know whether sales differ across the four cities. Table 9.11 shows the raw data:

Table 9.11: Raw Data


City 1 City 2 City 3 City 4
68 119 70 61
93 116 68 54
123 101 54 59
83 103 73 67
108 113 81 59
122 84 68 70

The calculation is shown in Table 9.12:

Table 9.12: Calculation of Kruskal-Wallis Test


City 1 Rank City 2 Rank City 3 Rank City 4 Rank
68 8 119 22 70 10.5 61 5
93 16 116 21 68 8 54 1.5
123 24 101 17 54 1.5 59 3.5
83 14 103 18 73 12 67 6
108 19 113 20 81 13 59 3.5
122 23 84 15 68 8 70 10.5

Sum 104 113 53 30


Mean 17.33 18.83 8.83 5

H = [12 / (N(N + 1))] × Σ (Ri² / ni) − 3(N + 1)

H = [12 / (24 × 25)] × [104²/6 + 113²/6 + 53²/6 + 30²/6] − 3 × 25
H = 15.98
Because there are tied ranks, H should be corrected by:

C = 1 − [Σ (t³ − t)] / (N³ − N)

where t is the number of observations tied at a given value. Here, three values (54, 59, and 70) are tied twice each and one value (68) is tied three times:

C = 1 − [3(2³ − 2) + (3³ − 3)] / (24³ − 24)

C = 0.997
The corrected H’ is
H′ = H / C = 15.98 / 0.997 = 16.03
The critical value of chi-square at the 5 percent level of significance and 3 degrees of freedom is 7.81. As the observed value of H is greater than the critical value of chi-square, it can be concluded that there is a significant difference in the amount of sales among the four cities.
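The same result can be verified with SciPy's kruskal function, which applies the tie correction automatically; a minimal sketch using the data of Table 9.11:

```python
from scipy.stats import kruskal

# Monthly sales from Table 9.11
city1 = [68, 93, 123, 83, 108, 122]
city2 = [119, 116, 101, 103, 113, 84]
city3 = [70, 68, 54, 73, 81, 68]
city4 = [61, 54, 59, 67, 59, 70]

H, p = kruskal(city1, city2, city3, city4)
print(f"H = {H:.2f}, p = {p:.4f}")  # expect H ≈ 16.03 (tie-corrected), p < .05
```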

SPSS Output and Interpretation


The output provided by SPSS and its interpretation is explained as follows:
Step 1: Click 'Analyze' → 'Nonparametric Tests' → 'Legacy Dialogs' → 'K Independent Samples'
The same is shown in Figure 9.8:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.8: SPSS Command for Kruskal-Wallis Test (1)

Step 2: Transfer 'Sales' to the 'Test Variable List' window and 'City' to 'Grouping Variable.' Define the range of groups as 1 to 4. Click 'Options' and select 'Descriptive.' Finally, click 'OK.'
The same is shown in Figure 9.9:

(Copyright: IBM Corp. IBM SPSS Statistics for Windows, Version 21.0.)
 Figure 9.9: SPSS Command for Kruskal-Wallis Test (2)

Table 9.13 represents the descriptive statistics of variables:

Table 9.13: Descriptive Statistics


N Mean Std. Deviation Minimum Maximum
Sales 24 84.0417 23.33216 54.00 123.00
City 24 2.5000 1.14208 1.00 4.00

The mean of sales across all the cities is 84.04 with a standard deviation of 23.33. The mean and standard deviation of the grouping variable City are not meaningful. Table 9.14 represents the mean ranks of the different cities. The mean rank of city 2 is found to be the highest and that of city 4 the lowest.

Table 9.14: Mean Ranks of Different Cities


City N Mean Rank
Sales 1.00 6 17.33
2.00 6 18.83
3.00 6 8.83
4.00 6 5.00
Total 24

Table 9.15 shows test statistics:

Table 9.15: Test Statistics


Sales
Chi-square 16.029
df 3
Asymp. Sig. .001
a. Kruskal Wallis Test
b. Grouping Variable: City

The results indicate that the value of the Kruskal-Wallis test statistic is 16.03. The p-value of the test statistic is found to be less than the 5 percent level of significance. This indicates that the null hypothesis of equal mean ranks cannot be accepted. Therefore, it can be concluded that the levels of sales in the different cities are significantly different.

9.4 Friedman ANOVA


The Friedman ANOVA test is named after its originator, the economist Milton Friedman. Friedman ANOVA is a non-parametric (distribution-free) test used to test for differences across more than two conditions (treatments) when the same participants have been used in all conditions. Friedman ANOVA is an alternative to one-way repeated-measures ANOVA, especially when the data violates the assumptions of the parametric test, for example, when the data is not normally distributed or the sample size is small.
The Friedman ANOVA test is based on ranked data and not the actual scores. The data is arranged in such a way that the columns represent the conditions and the rows represent the scores of the participants for the different conditions. The ranking is done for each person (row-wise). Rank 1 is given to the lowest score, and the highest rank is given to the highest score. Then, the sum of ranks for each condition (Ri) is calculated. Finally, the Friedman ANOVA test statistic can be calculated on the basis of the sums of ranks using the following formula:

Fr = [12 / (Nk(k + 1))] × Σ Ri² − 3N(k + 1)

Where, Ri = Sum of ranks for condition i
N = Total sample size (number of participants)
k = Number of conditions
The null and alternate hypotheses of Friedman ANOVA are mentioned as follows:
• Null hypothesis (H0): There is no difference in the mean ranks for the repeated measures.
• Alternate hypothesis (H1): A significant difference exists in the mean ranks for the repeated measures.
The test statistic for Friedman's ANOVA follows a chi-squared distribution with (k − 1) degrees of freedom, where k is the number of conditions (repeated measures). When the p-value for the chi-square test statistic is small (usually less than 0.05), the null hypothesis of Friedman's ANOVA can be rejected.
Example 9.4: A finance professor wants to measure the effectiveness of different teaching
methods for his finance students. The professor decided to analyze the feedback of students on
the different methods of teaching finance. The teaching methods selected for experiment are
blackboard, PowerPoint presentation, and case study. The feedback from students is collected
using the following questionnaire:
Dear students, you are requested to rate the session on the following parameters on a scale of 1 to 10, where 1 means very low and 10 means very high. Look at the following table:

Attributes                        Blackboard   PowerPoint Presentation   Case Study
Understanding of the concepts
Interesting session
Participation and involvement
Confidence on the topic
Overall effectiveness

The responses of 15 students selected at random are as follows:

Student   Blackboard   PowerPoint Presentation   Case Study
1 37 32 38
2 32 28 39
3 36 32 40
4 28 30 35
5 39 32 42
6 34 30 40
7 30 24 36
8 38 30 40
9 39 28 40
10 40 30 42
11 26 20 36
12 28 29 36
13 36 30 40
14 29 30 38
15 27 24 32

Now, let us discuss the following:


1. Why is Friedman ANOVA suitable for the objective of the professor?
2. Which method is the most suitable according to the results?
3. How should the results be reported?

Calculation of Friedman ANOVA Statistics


First, ranks are assigned for each participant in such a way that rank 1 is given to the lowest score and rank 3 to the highest score (Table 9.16). The sum of ranks for each category is also calculated, as shown in Table 9.16:

Table 9.16: Sum of Ranks for Each Category


Student   Blackboard   Rank   PowerPoint Presentation   Rank   Case Study   Rank
1 37 2 32 1 38 3
2 32 2 28 1 39 3
3 36 2 32 1 40 3
4 28 1 30 2 35 3
5 39 2 32 1 42 3
6 34 2 30 1 40 3
7 30 2 24 1 36 3
8 38 2 30 1 40 3
9 39 2 28 1 40 3
10 40 2 30 1 42 3
11 26 2 20 1 36 3
12 28 1 29 2 36 3
13 36 2 30 1 40 3
14 29 1 30 2 38 3
15 27 2 24 1 32 3
Sum of Ranks 27 18 45

The Friedman ANOVA test statistic can be calculated as follows:

Fr = [12 / (15 × 3 × (3 + 1))] × (27² + 18² + 45²) − 3 × 15 × (3 + 1)

Fr = (12 / 180) × 3078 − 180

Fr = 25.2

As this chi-squared statistic is higher than the critical value of chi-squared at the 5 percent level of significance and 2 degrees of freedom (5.99), the null hypothesis of equal mean ranks for all the conditions cannot be accepted. Hence, it can be concluded that the perceived effectiveness of the treatments is not the same. After analyzing the mean ranks, it can be concluded that the case study method is the most effective according to the students, followed by the blackboard method, possibly because teaching finance in class requires extensive use of the board. The PPT method is the least preferred by the students.
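Outside SPSS, the same test can be run with SciPy's friedmanchisquare function; a minimal sketch using the three feedback columns (with student 14's blackboard score of 29, as in Table 9.16):

```python
from scipy.stats import friedmanchisquare

# Feedback totals of the 15 students, one list per teaching method
blackboard = [37, 32, 36, 28, 39, 34, 30, 38, 39, 40, 26, 28, 36, 29, 27]
ppt = [32, 28, 32, 30, 32, 30, 24, 30, 28, 30, 20, 29, 30, 30, 24]
case_study = [38, 39, 40, 35, 42, 40, 36, 40, 40, 42, 36, 36, 40, 38, 32]

stat, p = friedmanchisquare(blackboard, ppt, case_study)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # expect chi2 = 25.20, as in Table 9.19
```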

SPSS Output and Interpretation


Click 'Analyze' → 'Nonparametric Tests' → 'Legacy Dialogs' → 'K Related Samples' → Transfer all the three variables to 'Test Variables' → 'Statistics' → select 'Descriptive' and 'Quartiles' → 'Continue' → check 'Friedman' → 'OK.'
SPSS output, ranks, and test statistics are shown in Table 9.17, Table 9.18, and Table 9.19,
respectively:

Table 9.17: Descriptive Statistics

             N    Mean      Std. Deviation   Minimum   Maximum   25th      50th (Median)   75th
Blackboard   15   33.2667   4.93481          26.00     40.00     28.0000   34.0000         38.0000
PPT          15   28.6000   3.41844          20.00     32.00     28.0000   30.0000         30.0000
Case study   15   38.2667   2.78944          32.00     42.00     36.0000   39.0000         40.0000

Table 9.18: Ranks


Mean Rank
Blackboard 1.80
PPT 1.20
Case study 3.00

Table 9.19: Test Statisticsa


N 15
Chi-square 25.200
df 2
Asymp. Sig. .000
a. Friedman Test

Interpretation
Table 9.17 represents the descriptive analysis of the three methods of teaching adopted by the professor. The descriptive analysis includes the mean score, standard deviation, minimum score, maximum score, first quartile, median, and third quartile of the three methods. The results indicate that the case study method has the highest mean score, followed by the blackboard and PPT methods. This indicates that the case study method is the most effective according to the students, followed by the blackboard method, possibly because teaching finance in class requires extensive use of the board. The PPT method is the least preferred by the students.
Table 9.18 represents the average ranks of the three methods.
Table 9.19 represents the chi-squared statistic along with its p-value. The results indicate that the
null hypothesis of equal ranks for each group cannot be accepted. Hence, it can be concluded
that the mean ranks of the groups are significantly different.

9.5 Binomial Test

The binomial test is a non-parametric test that can be used to test whether the observed frequencies of the two mutually exclusive sub-categories of a dichotomous variable are the same as the expected frequencies. It is assumed that the dichotomous variable follows the binomial probability distribution, where p represents the probability of the first category as an outcome and 1 − p represents the probability of the second category as an outcome.

Example 9.5: Assume that the probability that the stock market moves up or down on any day is 50 percent. In order to test this assumption, data on the NIFTY (stock index) movement is collected for 30 consecutive trading days. The data is given below in Table 9.20. In the data, 0 represents a down movement and 1 represents an up movement.

Table 9.20: Data for the Binomial Test

Day   Movement   Day   Movement   Day   Movement
1 0 11 0 21 1
2 1 12 1 22 1
3 1 13 1 23 0
4 0 14 0 24 1
5 1 15 1 25 1
6 0 16 1 26 0
7 0 17 0 27 0
8 1 18 1 28 0
9 1 19 0 29 1
10 0 20 1 30 0

Can we say that the proportion of up movements is significantly different from 50 percent?

SPSS Output and Interpretation

The procedure for applying the binomial test in SPSS is described below:


Step 1: Click 'Analyze' → 'Nonparametric Tests' → 'Legacy Dialogs' → 'Binomial'
Step 2: Transfer the dichotomous variable to the 'Test Variable List' → Press 'OK'

The SPSS output of the binomial test is given below in Table 9.21:

Table 9.21: Output of Binomial Test


                            Category   N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
Index Movement   Group 1    .00        14   .47              .50          .856
                 Group 2    1.00       16   .53
                 Total                 30   1.00

Interpretation of the output

The null hypothesis of the binomial test is 'There exists no significant difference between the expected and observed proportions of up movements of the NIFTY index.' Since the p-value of the binomial test is found to be 0.856, the null hypothesis can be accepted at the 95 percent confidence level. Thus, it can be concluded from the results that the probability of the stock index moving up on any day is not significantly different from 50 percent.
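The same p-value can be obtained with SciPy's exact binomial test (binomtest is available in recent SciPy versions); a minimal sketch:

```python
from scipy.stats import binomtest

# 16 up-movements observed in 30 trading days, testing p = 0.5
result = binomtest(16, n=30, p=0.5, alternative="two-sided")
print(round(result.pvalue, 3))  # expect 0.856, as in Table 9.21
```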

9.6 Run Test

Another non-parametric test is the run test, which can be used to test for the presence of randomness in the observations of a variable in a sample. Randomness in the observations of a variable is sometimes necessary, as it improves the accuracy of the results. In research, most of the time the researcher prefers to apply a random sampling method. In such a case, it is assumed that the observations of a variable must exhibit randomness. The presence of this randomness is important for statistical inference.

In the case of time-series data, the randomness in the movement of a variable can also be tested with the help of the run test. The run test is one of the popular tests of the weak form of market efficiency under the efficient market hypothesis.

Example 9.6: Test whether the variable 'Income' in the following dataset is random or not.

Table 9.22: Data set for Run Test

S.No   Income   S.No   Income   S.No   Income   S.No   Income
1 24000 11 65000 21 39000 31 36000
2 34000 12 34000 22 43000 32 63000
3 45000 13 25000 23 65000 33 57000
4 43000 14 65000 24 54000 34 51000
5 18000 15 36000 25 34000 35 24000
6 28000 16 65000 26 64000 36 54000
7 54000 17 54000 27 76000 37 45000
8 76900 18 38000 28 45000 38 54000
9 24000 19 25000 29 125000 39 63000
10 34000 20 76000 30 12000 40 61000

SPSS Output and Interpretation

The procedure for applying the run test in SPSS is described below:

Step 1: Click 'Analyze' → 'Nonparametric Tests' → 'Legacy Dialogs' → 'Runs'
Step 2: Transfer the variable to the 'Test Variable List' → Select 'Median' as the cut point; this divides the data into two groups → Press 'OK'

The SPSS output of the run test is given below in Table 9.23:

Table 9.23: Runs Test


                          Income
Test Valueᵃ               45000.00
Cases < Test Value 18
Cases >= Test Value 22
Total Cases 40
Number of Runs 20
Z -.097
Asymp. Sig. (2-tailed) .923


a. Median

Interpretation of the output

The null hypothesis of the run test is 'The observations of the variable Income are random.' Since the p-value of the run test is found to be 0.923, the null hypothesis can be accepted at the 95 percent confidence level. Thus, it can be concluded from the results that the observations of the variable 'Income' are random.
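SciPy does not ship a runs test, but the z approximation that SPSS uses is easy to sketch by hand. The helper below is illustrative (the function name and structure are mine); the 0.5 continuity correction mirrors what SPSS applies for small samples:

```python
import math
from statistics import median

def runs_test(values, cutoff):
    """Wald-Wolfowitz runs test around a cutoff, with continuity correction."""
    signs = [v >= cutoff for v in values]
    n1 = sum(signs)            # cases >= cutoff
    n2 = len(signs) - n1       # cases < cutoff
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean_runs = 2 * n1 * n2 / n + 1
    sd_runs = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1)))
    diff = runs - mean_runs
    if diff > 0.5:             # continuity correction of 0.5 toward the mean
        diff -= 0.5
    elif diff < -0.5:
        diff += 0.5
    else:
        diff = 0.0
    z = diff / sd_runs
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided p
    return runs, z, p

income = [24000, 34000, 45000, 43000, 18000, 28000, 54000, 76900, 24000, 34000,
          65000, 34000, 25000, 65000, 36000, 65000, 54000, 38000, 25000, 76000,
          39000, 43000, 65000, 54000, 34000, 64000, 76000, 45000, 125000, 12000,
          36000, 63000, 57000, 51000, 24000, 54000, 45000, 54000, 63000, 61000]

print(runs_test(income, median(income)))
# expect 20 runs, z ≈ -0.097, p ≈ 0.923, matching Table 9.23
```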

9.7 Summary
This chapter explained the meaning and importance of non-parametric tests. In addition, it covered in-depth explanations of the different types of non-parametric tests. The chi-square test, Wilcoxon rank-sum test, Mann-Whitney test, Kruskal-Wallis test, Friedman ANOVA, binomial test, and run test were explained with a practical approach. The usage of SPSS was shown with different examples to provide a better understanding.

9.8 Exercises

Multiple-Choice Questions
Q1. Which of the following is called an assumption-free test?
a. Parametric test b. Non-parametric test
c. Chi-square test d. b and c
Q2. Which of the following tests is used for normal distribution?
a. Parametric test b. Non-parametric test
c. a and b d. None of these
Q3. Cross-tabulation in _______ is also known as a contingency table.
a. Chi-square test b. Wilcoxon rank sum test
c. Mann-Whitney test d. Kruskal-Wallis test
Q4. _______ statistic is an alternative to the Pearson’s chi-square test and based on
the theory of maximum likelihood.
a. Likelihood ratio b. Linear-by-linear association
c. a and b d. None of these
Q5. Which of the following tests is appropriate when a researcher wants to test
differences between two conditions and different participants have been used in
each condition?
a. Likelihood ratio b. Mann-Whitney test
c. Wilcoxon rank-sum test d. b and c
Q6. _________ is a non-parametric test used to test differences in more than two
different conditions when the same participants have been used in all conditions.
a. Chi-square test b. Wilcoxon rank-sum test
c. Mann-Whitney test d. Friedman ANOVA
Q7. ____ is an alternative to one-way independent ANOVA if the assumptions of a
parametric test are violated.
a. Chi-square test b. Wilcoxon rank-sum test
c. Mann-Whitney test d. Kruskal-Wallis

Long-Answer Questions
Q1. Differentiate between parametric and non-parametric tests.
Q2. Explain chi-square test with assumptions.
Q3. Discuss similarities between Wilcoxon rank-sum test and Mann-Whitney test.
Q4. Elaborate on the Kruskal-Wallis test.
Q5. Elucidate Friedman ANOVA.
Q6. A researcher wants to analyse the possible association between the gender of customers and their habit of using the Internet for shopping. He collected data from 100 customers and applied the chi-square test between the gender of the customer and their usage of the Internet for shopping. The output of the results is given below.

Gender * Internet_for_Shopping Crosstabulation

                               Uses Internet for Shopping      Total
                               No        Yes
Gender    Male     Count            36        14        50
                   Expected Count   28.0      22.0      50.0
          Female   Count            20        30        50
                   Expected Count   28.0      22.0      50.0
Total              Count            56        44        100
                   Expected Count   56.0      44.0      100.0

Chi-Square Tests
Value df Asymp. Sig. Exact Sig. Exact Sig.
(2-sided) (2-sided) (1-sided)
Pearson Chi-Square 10.390a 1 .001
Continuity Correctionb 9.131 1 .003
Likelihood Ratio 10.589 1 .001
Fisher's Exact Test .002 .001
Linear-by-Linear Association 10.286 1 .001
N of Valid Cases 100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 22.00.
b. Computed only for a 2x2 table
Symmetric Measures

                                   Value   Approx. Sig.
Nominal by Nominal   Phi           .322    .001
                     Cramer's V    .322    .001
N of Valid Cases                   100

Answer the following:


(a) What do you mean by test of association?
(b) What is the null hypothesis of Chi Square Test in the example?
(c) Interpret the results.

Q7: Raunak Singh, the manager of a private hospital, is interested in knowing the association between the income group of patients and how regularly they visit the physicians of the hospital. He collected the data of 150 patients and applied the chi-square test. The results of the chi-square test are shown below:

Income_group * Consulted_Doctor Crosstabulation

                                                        Consulted Doctor regularly      Total
                                                        Yes       No
Income group   Less than 5 lakhs per annum    Count           8         14        22
                                              Expected Count  10.8      11.2      22.0
               5 to 10 lakhs per annum        Count           10        20        30
                                              Expected Count  14.7      15.3      30.0
               10 to 15 lakhs per annum       Count           12        19        31
                                              Expected Count  15.2      15.8      31.0
               15 to 20 lakhs per annum       Count           20        12        32
                                              Expected Count  15.7      16.3      32.0
               More than 20 lakhs per annum   Count           23        11        34
                                              Expected Count  16.7      17.3      34.0
Total                                         Count           73        76        149
                                              Expected Count  73.0      76.0      149.0

Chi-Square Tests

                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             12.730a   4    .013
Likelihood Ratio               12.939    4    .012
Linear-by-Linear Association   10.495    1    .001
N of Valid Cases               149
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.78.

Symmetric Measures

                                   Value   Approx. Sig.
Nominal by Nominal   Phi           .292    .013
                     Cramer's V    .292    .013
N of Valid Cases                   149

Answer the following:

(a) Mention the null hypothesis.
(b) Mention the significance of Cramer's V test.
(c) Interpret the results.

Q8. The vice chancellor of a university is interested in knowing the level of publications (3 categories: no publications, satisfactory publications, and excellent publications) by the professors, associate professors, and assistant professors of different departments in the last three years. He collected the data on publications by the different faculty members at different designations for the last three years and applied the chi-square test in order to test the presence of an association, if any, between the level of publications and the designations in the university. The output of the chi-square test is shown below:

Designation * Publications Crosstabulation

                                           Publications                                       Total
                                           No Publication   Satisfactory    Excellent
                                                            Publications    Publications
Designation   Assistant    Count           23               23              46               92
              Professor    Expected Count  31.3             30.8            29.9             92.0
              Associate    Count           23               22              11               56
              Professor    Expected Count  19.0             18.8            18.2             56.0
              Professor    Count           22               22              8                52
                           Expected Count  17.7             17.4            16.9             52.0
Total                      Count           68               67              65               200
                           Expected Count  68.0             67.0            65.0             200.0

Chi-Square Tests
Value df Asymp. Sig. (2-
sided)
Pearson Chi-Square 24.023a 4 .000
Likelihood Ratio 24.572 4 .000
Linear-by-Linear Association 15.559 1 .000
N of Valid Cases 200
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is
16.90.

Symmetric Measures

                                   Value   Approx. Sig.
Nominal by Nominal   Phi           .347    .000
                     Cramer's V    .245    .000
N of Valid Cases                   200

Answer the following:

(a) Mention the null hypothesis of the chi-square test in the example.
(b) What is the significance of the Cramer's V test statistic in chi-square?
(c) Conclude the results from the output.

Q9. The vice chancellor of a university is interested in knowing the publications by the professors, associate professors, and assistant professors of different departments in the last three years. She collected the data on publications by the different faculty members at different designations for the last three years. The data is given below:

S No   Assistant Professor   Associate Professor   Professor
1 3 3 2
2 5 4 3
3 4 5 3
4 5 5 4
5 6 4 3
6 5 5 2
7 4 5 3
8 5 4 4
9 6 5 4
10 7 5 3
11 6 6 2
12 7 5 3
13 8 6 4
14 9 6 4
15 8 6 3
16 7 5 4
17 8 4 4
18 9 4 3
19 9 5 2
20 8 6 3

Since the sample size is small, the Kruskal-Wallis test is applied to the collected data. The SPSS output is given below:

Ranks

               Designation           N    Mean Rank
Publications   Assistant Professor   20   44.68
               Associate Professor   20   33.95
               Professor             20   12.88
               Total                 60

Test Statisticsa,b
Publications
Chi-Square 35.460
df 2
Asymp. Sig. .000
a. Kruskal Wallis Test
b. Grouping Variable:
Designation

Answer the following:

(a) Name the parametric test that is replaced by the Kruskal-Wallis test in the given example because of the small sample.
(b) Interpret the results.

Q10: The HR manager of an enterprise wants to conduct an experiment to investigate the performance of employees as a result of certain motivational incentives. He tests 20 employees in all, divided into two groups. For the 10 employees in Group 1, there is no change in any incentive scheme, whereas the other 10 employees are offered a specific incentive related to their performance. The levels of performance of both the groups are measured initially as well as after one month. The Mann-Whitney test is applied, and the results are given below:

Table 9.8: Descriptive Statistics

                   N    Mean      Std. Deviation   Minimum   Maximum
Pre-performance    20   18.0000   5.07833          13.00     35.00
Post-performance   20   21.0500   12.92275         3.00      39.00
Incentive          20   1.5000    .51299           1.00      2.00

Table 9.9: Ranks of Different Groups

                   Incentive      N    Mean Rank   Sum of Ranks
Pre_Performance    Incentive      10   11.95       119.50
                   No Incentive   10   9.05        90.50
                   Total          20
Post_Performance   Incentive      10   15.10       151.00
                   No Incentive   10   5.90        59.00
                   Total          20
Table 9.10: Test Statistics of Mann-Whitney U and Wilcoxon Tests

                                 Pre-Performance   Post-Performance
Mann-Whitney U                   35.500            4.000
Wilcoxon W                       90.500            59.000
Z                                -1.105            -3.484
Asymp. Sig. (2-tailed)           .269              .000
Exact Sig. [2*(1-tailed Sig.)]   .280              .000

Answer the following:

(a) In the experimental design, the Mann-Whitney test is applied to replace which parametric test?
(b) Interpret the results.
(c) What is the null hypothesis of the Mann-Whitney test?

Q11. The sales of the products manufactured by a company take place through three different sources, namely online sales, offline sales, and export orders. The weekly data of sales from these three sources is collected for 20 weeks. The data is shown below:

Week            1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18    19    20
Online sales    2.2   3.1   4.4   3.2   4.1   5.4   6.2   7.1   4.5   5.6   6.2   5.4   6.2   7.9   6.4   5.1   6.5   7.7   8.2   8.5
Offline sales   10.2  11.0  2.7   3.5   5.1   5.6   2.9   4.1   9.2   6.4   7.2   5.3   6.2   9.3   3.4   7.6   9.3   3.5   4.3   5.7
Export Orders   11    23    21    32    12    13    13    14    21    23    13    24    14    16    14    13    15    16    13    19

All figures in Rs Crores

The researcher is interested in exploring the existence of a difference between the average weekly sales from the three different sources. She decided to apply a non-parametric test, and the SPSS output of the test is shown below:
Table: Descriptive analysis
N Mean Std. Deviation Minimum Maximum
Online sales 20 5.6950 1.75123 2.20 8.50
Offline sales 20 6.1250 2.57884 2.70 11.00
Export Orders 20 17.0000 5.40954 11.00 32.00

Table: Ranks

                Mean Rank
Online sales    1.43
Offline sales   1.58
Export Orders   3.00

Table: Test Statistics


N 20
Chi-Square 30.608
df 2
Asymp. Sig. .000
Answer the following:

1. Mention the name of the test applied here.
2. Mention the null hypothesis.
3. Write the name of the parametric test applicable here in the case of a large sample size.
4. Interpret the results.
