
BFC 34303

CIVIL ENGINEERING STATISTICS


Chapter 9
Chi-Square Test and Non-Parametric Tests
Faculty of Civil and Environmental Engineering
Universiti Tun Hussein Onn Malaysia

Chi-square Distribution (χ² Distribution)


The χ² distribution is a continuous probability distribution that is widely used in statistical inference. It is related to the standard normal distribution.
If a random variable Z has the standard normal distribution, then Z² has the χ² distribution with one degree of freedom.

(Figure: the standard normal distribution of Z and the corresponding χ² distribution with one degree of freedom.)
If Z₁, Z₂, …, Zₖ are independent standard normal variables, then

Z₁² + Z₂² + ⋯ + Zₖ²

has a χ² distribution with k degrees of freedom.

The pdf of the χ² distribution with k degrees of freedom is given by

f(x) = x^(k/2 − 1) e^(−x/2) / [2^(k/2) Γ(k/2)]    for x ≥ 0

The mean of the χ² distribution is μ = k, while the variance is σ² = 2k.

(Figure: χ² density curves for df = μ = 1, 5, 10 and 15.)

As the degrees of freedom of the χ² distribution increase, the right-skewness of the distribution decreases and the distribution becomes more symmetrical.
As the degrees of freedom increase, the χ² distribution looks more like a normal distribution.
The mode of the χ² distribution is equal to k − 2.
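These properties can be checked numerically. The following is a minimal sketch (not part of the original slides), assuming Python with numpy and scipy is available; the degrees of freedom are chosen only for illustration.

```python
# A minimal sketch (not from the slides): checking the chi-square facts above with scipy.
from math import exp, gamma

import numpy as np
from scipy.stats import chi2, norm

k = 10                                 # degrees of freedom (illustrative value)
dist = chi2(k)
print(dist.mean(), dist.var())         # mean = k = 10, variance = 2k = 20

# The pdf formula above, evaluated directly, matches scipy's chi2.pdf
x = 7.5
f = x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))
print(f, dist.pdf(x))                  # the two values agree

# For large k the chi-square curve is close to a normal curve with mean k and variance 2k
print(chi2(100).cdf(110), norm(100, np.sqrt(200)).cdf(110))
```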
Chi-square Statistic
The χ² statistic for a random sample of size n with a standard deviation s, selected from a normal population having a standard deviation σ, can be calculated using the following equation:

χ² = (n − 1)s² / σ²

Cumulative Probability of the Chi-square Distribution


The χ² distribution is constructed so that the total area under the curve is equal to 1. The area under the curve between 0 and a particular χ² value is the cumulative probability associated with that χ² value. The cumulative probability associated with a particular χ² statistic can be easily determined using a χ² table.

Example 9.1
A manufacturer has developed a new cell phone battery. On average, the
battery lasts 60 minutes on a single charge. The standard deviation is 4
minutes. Suppose the manufacturing department runs a quality control
test. They randomly select 7 batteries. The standard deviation of the
selected batteries is 6 minutes. What is the probability that the standard
deviation will be greater than 6 minutes?

Let X be the standard deviation of battery life after a single charge.


We know 𝑛 = 7, 𝑠 = 6 and 𝜎 = 4

χ² = (n − 1)s² / σ² = (7 − 1)(6²) / 4² = 13.5

P(X > 6) = P(χ² > 13.5) = 1 − P(χ² ≤ 13.5) = ?
Degrees of freedom = 𝑛 − 1 = 7 − 1 = 6

From the cumulative χ² distribution table with df = 6:

χ² = 13.5 lies between 12.6 and 14.4
P(χ² ≤ 12.6) = 0.95
P(χ² ≤ 13.5) = ?
P(χ² ≤ 14.4) = 0.975

Using interpolation we get P(χ² ≤ 13.5) ≈ 0.963

P(χ² > 13.5) = 1 − P(χ² ≤ 13.5) = 1 − 0.963 = 0.037
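The interpolated result can be cross-checked numerically. A minimal sketch (not part of the slides), assuming Python with scipy is available:

```python
# Sketch: verifying Example 9.1 with scipy's chi-square distribution.
from scipy.stats import chi2

n, s, sigma = 7, 6, 4
chi_sq = (n - 1) * s**2 / sigma**2     # (7 - 1)(6^2) / 4^2 = 13.5
df = n - 1                             # 6 degrees of freedom

print(chi2.sf(chi_sq, df))             # P(chi-square > 13.5) ~ 0.036, close to the interpolated 0.037
```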

(Table: Cumulative χ² Distribution.)
Goodness-of-Fit Test
The goodness-of-fit test, introduced by Karl Pearson, is one of the most commonly used non-parametric tests.
The purpose of this test is to compare an observed set of frequencies to
an expected set of frequencies.
The chi-square distribution, which is used as the test statistic, has the
following characteristics:
• Chi-square is never negative.
• There is a family of chi-square distributions.
• The chi-square distribution is positively skewed.

The chi-square test statistic, χ², is computed as follows:

χ² = Σ (fₒ − fₑ)² / fₑ    with k − 1 degrees of freedom

where
k = number of categories
fₒ = observed frequency in a particular category
fₑ = expected frequency in a particular category
Example 9.2

A laboratory tested 120 concrete samples. The table shows the number of samples tested by each of its laboratory technicians. Can it be concluded that the numbers of samples tested are not the same for each technician?

Technician   No. of Samples Tested
Tom          13
Ryan         33
Tyra         14
George       7
Hannah       36
John         17

H₀: There is no difference between the observed and expected frequencies (fₒ = fₑ)
H₁: There is a difference between the observed and expected frequencies (fₒ ≠ fₑ)

We shall use a significance level of α = 0.05.

The test statistic is the chi-square distribution: χ² = Σ (fₒ − fₑ)² / fₑ
with k − 1 = 6 − 1 = 5 degrees of freedom.

From the chi-square table with α = 0.05 and df = 5, the critical χ² value is 11.07.
(Table: Critical χ² values; the shaded region in the accompanying figure marks the area beyond the critical χ² value.)

Decision rule:
If the calculated χ² is greater than or equal to the critical χ² (11.07) → Reject H₀

(Figure: χ² distribution with the rejection region for H₀ to the right of the critical χ² value.)
Since there are 120 samples, we expect that each technician will test 20
samples.

Technician   No. of Samples Tested (fₒ)   Expected No. of Samples Tested (fₑ)   (fₒ − fₑ)²/fₑ
Tom          13                           20                                    2.45
Ryan         33                           20                                    8.45
Tyra         14                           20                                    1.80
George       7                            20                                    8.45
Hannah       36                           20                                    12.80
John         17                           20                                    0.45
Total        120                          120                                   34.40

χ² = Σ (fₒ − fₑ)² / fₑ = 34.40

Since the calculated χ² (34.40) is greater than the critical χ² (11.07), we reject H₀. Thus we accept H₁, which states that there is a difference between the observed and expected frequencies.
This means that the numbers of samples tested are not the same for each technician.
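The same goodness-of-fit test can be reproduced with scipy; a minimal sketch (not part of the slides), assuming scipy is available:

```python
# Sketch: Example 9.2 as a chi-square goodness-of-fit test.
from scipy.stats import chi2, chisquare

observed = [13, 33, 14, 7, 36, 17]           # samples tested by each technician
stat, p_value = chisquare(observed)          # expected frequencies default to equal counts (20 each)
print(stat, p_value)                         # 34.4 and a p-value far below 0.05

print(chi2.ppf(0.95, df=len(observed) - 1))  # critical value ~ 11.07 for alpha = 0.05, df = 5
```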

Contingency Table Analysis
The chi-square test can also be used for research involving two traits.
For example:
1. Is there any relationship between the grade point average (GPA) of
students and their income 10 years after graduation?
2. Is there an association between drivers of different vehicle classes and
their compliance with speed limits?
A contingency table analysis is conducted to find the relationship between
two variables.
Information is collected and displayed in a contingency table, which is a
type of table in a matrix format that shows the frequency distribution of the
variables.


The chi-square test statistic is:

χ² = Σ (fₒ − fₑ)² / fₑ    with (r − 1)(c − 1) degrees of freedom

where
r = number of rows
c = number of columns
fₒ = observed frequency for a cell
fₑ = expected frequency for a cell = (Row total × Column total) / Grand total
Example 9.3

A traffic safety engineer wants to investigate if the age group of motorcyclists is related to compliance with safety helmet usage. He conducted a study on 200 motorcyclists and obtained the following data:

Safety Helmet   Age group of motorcyclists
Usage           16–25    26–40    41–55    > 55
Used            25       33       27       35
Did not use     25       27       13       15

Can it be concluded that age group is related to compliance with safety helmet usage?

H₀: There is no relationship between age group and compliance with safety helmet usage.
H₁: There is a relationship between age group and compliance with safety helmet usage.

We shall use a significance level of α = 0.05.

The test statistic is the chi-square distribution: χ² = Σ (fₒ − fₑ)² / fₑ
with (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3 degrees of freedom.

From the chi-square table with α = 0.05 and df = 3, the critical χ² value is 7.815.
(Table: Critical χ² values; the shaded region in the accompanying figure marks the area beyond the critical χ² value.)

Decision rule:
If the calculated χ² is greater than or equal to the critical χ² (7.815) → Reject H₀

(Figure: χ² distribution with the rejection region for H₀ to the right of the critical χ² value.)
Safety Age group of motorcyclists
Total
Helmet 16 - 25 26 - 40 41 - 55 > 55
Usage 𝑓 𝑓 𝑓 𝑓 𝑓 𝑓 𝑓 𝑓 𝑓 𝑓
Used 25 30 33 36 27 24 35 30 120
Did not use 25 20 27 24 13 16 15 20 80
Total 50 60 40 50 200

𝑅𝑜𝑤 𝑡𝑜𝑡𝑎𝑙 × 𝐶𝑜𝑙𝑢𝑚𝑛 𝑡𝑜𝑡𝑎𝑙 120 × 50


𝑓 = = = 30
𝐺𝑟𝑎𝑛𝑑 𝑡𝑜𝑡𝑎𝑙 200

𝑅𝑜𝑤 𝑡𝑜𝑡𝑎𝑙 × 𝐶𝑜𝑙𝑢𝑚𝑛 𝑡𝑜𝑡𝑎𝑙 80 × 50


𝑓 = = = 20
𝐺𝑟𝑎𝑛𝑑 𝑡𝑜𝑡𝑎𝑙 200
23

Safety Helmet   16–25            26–40            41–55            > 55             Total
Usage
Used            fₒ=25, fₑ=30     fₒ=33, fₑ=36     fₒ=27, fₑ=24     fₒ=35, fₑ=30
                (25−30)²/30      (33−36)²/36      (27−24)²/24      (35−30)²/30
                = 0.833          = 0.250          = 0.375          = 0.833          2.291
Did not use     fₒ=25, fₑ=20     fₒ=27, fₑ=24     fₒ=13, fₑ=16     fₒ=15, fₑ=20
                (25−20)²/20      (27−24)²/24      (13−16)²/16      (15−20)²/20
                = 1.250          = 0.375          = 0.563          = 1.250          3.438
Total                                                                               5.729
χ² = Σ (fₒ − fₑ)² / fₑ = 5.729

Since the calculated χ² (5.729) is less than the critical χ² (7.815), we do not reject H₀. Thus we conclude that there is no relationship between age group and compliance with safety helmet usage.
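The full contingency-table analysis can be reproduced with scipy; a minimal sketch (not part of the slides), assuming scipy is available:

```python
# Sketch: Example 9.3 as a chi-square test of independence.
from scipy.stats import chi2_contingency

table = [[25, 33, 27, 35],    # Used
         [25, 27, 13, 15]]    # Did not use

stat, p_value, df, expected = chi2_contingency(table)
print(stat, df)               # ~5.729 with 3 degrees of freedom
print(p_value)                # > 0.05, so H0 (no relationship) is not rejected
print(expected)               # matches the expected frequencies in the table above
```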

Non-Parametric Tests: Analysis of Ranked Data


Non-parametric tests are also called distribution-free tests because they
do not assume that the data follow a specific distribution.
Generally, we should use non-parametric tests when our data do not meet
the assumptions of the parametric test, especially the assumption about
normally distributed data.
Apart from that, the other reasons for using non-parametric tests are:
1. The central tendency of our data is better represented by the median.
2. Our sample size is very small.
3. We have ordinal data, ranked data, or outliers that we cannot remove.
In this chapter, we will learn the Sign Test, Wilcoxon Test and Mann-
Whitney Test.
Reasons for using parametric and non-parametric tests

Parametric Test
• Performs well with skewed and non-normal distributions.
• Performs well when the spread of each group is different.
• Has more statistical power; thus, it is more likely to detect a significant effect when one truly exists.

Non-Parametric Test
• The central tendency of our data is better represented by the median.
• Our sample size is very small.
• We have ordinal data, ranked data, or outliers that we cannot remove.

Difference between Parametric and Non-Parametric Tests

Parametric Test
• Information about the population is completely known.
• Specific assumptions are made regarding the population.
• The null hypothesis is made on parameters of the population distribution.
• The test statistic is based on the distribution.
• Parametric tests are applicable only for variables.

Non-Parametric Test
• No information about the population is available.
• No assumptions are made regarding the population.
• The null hypothesis is free from parameters.
• The test statistic is subjective.
• Non-parametric tests are applied for both variables and attributes.
Difference between Parametric and Non-Parametric Tests (cont’d)

Parametric Test
• Cannot be used to test for nominal data.
• Parametric tests are powerful; thus, we are more likely to detect a significant effect if it exists.

Non-Parametric Test
• Can be used to test for nominal and ordinal data.
• Non-parametric tests are not as powerful as parametric tests.

The Sign Test


The sign test is based on the sign of a difference between two related
observations.
We usually designate a plus (+) sign for a positive difference and a minus
(-) sign for a negative difference.
The sign test has many applications; one is for before/after experiments. For example, a tune-up program for cars is evaluated whereby the mileage per gallon of fuel before and after the tune-up is recorded. A "+" sign is assigned to an increase in mileage, and a "−" sign to a decrease in mileage.
A hypothesis test is conducted to see if the intervention is effective. The procedure is explained in the following example.

Example 9.4

15 engineers were randomly selected to assess their level of competence in using software for design and analysis. The engineers underwent a software training program and were rated Outstanding, Excellent, Good, Fair or Poor before and after the training.
Can it be concluded that the engineers were more competent after the training?

Engineer   Before      After
Terry      Good        Outstanding
Sue        Fair        Excellent
James      Excellent   Good
Ted        Poor        Good
Andy       Excellent   Excellent
Sarah      Good        Outstanding
John       Poor        Fair
Jim        Excellent   Outstanding
Cody       Good        Poor
Troy       Poor        Good
Vanessa    Good        Outstanding
Cole       Fair        Excellent
Candy      Good        Fair
Arthur     Good        Outstanding
Sandra     Poor        Good

Engineer   Before      After         Sign of Difference
Terry      Good        Outstanding   +
Sue        Fair        Excellent     +
James      Excellent   Good          −
Ted        Poor        Good          +
Andy       Excellent   Excellent     0   (dropped from the analysis because of no change)
Sarah      Good        Outstanding   +
John       Poor        Fair          +
Jim        Excellent   Outstanding   +
Cody       Good        Poor          −
Troy       Poor        Good          +
Vanessa    Good        Outstanding   +
Cole       Fair        Excellent     +
Candy      Good        Fair          −
Arthur     Good        Outstanding   +
Sandra     Poor        Good          +

Key:
+   Increase in level of competence
−   Decrease in level of competence
H₀: π ≤ 0.50 (There is no increase in competence as a result of the training)
H₁: π > 0.50 (There is an increase in competence as a result of the training)

Notes:
𝜋 refers to the proportion in the population.
The binomial distribution is used as the test statistic because the sign test meets
the binomial assumptions:
1. There are only two outcomes: “success” and “failure”.
2. Each trial is independent (the performance of one engineer is not related to
another engineer).
3. For each trial the probability of success is assumed to be 𝑝 = 0.50.
4. The total number of trials is fixed, i.e. 𝑛 = 14.


We choose the level of significance 𝛼 = 0.10.

The test statistic is the number of plus signs, which is 11.

Formulating the decision rule:


1. Construct a table showing the binomial probability distribution for 𝑛 =
14 and 𝜋 = 0.50, i.e. probability and cumulative probability distribution.
2. Determine the critical value, i.e. the number of successes
corresponding to the cumulative probability that is closest to, but not
greater than the level of significance 𝛼 = 0.10.
3. Since this is a one-tailed test, reject H₀ if the number of plus signs is equal to or greater than the critical value.

Number of successes   Probability of success   Cumulative probability
0                     0.000                    1.000
1                     0.001                    0.999
2                     0.006                    0.998
3                     0.022                    0.992
4                     0.061                    0.970
5                     0.122                    0.909
6                     0.183                    0.787
7                     0.209                    0.604
8                     0.183                    0.395
9                     0.122                    0.212
10                    0.061                    0.090   ← critical value
11                    0.022                    0.029
12                    0.006                    0.007
13                    0.001                    0.001
14                    0.000                    0.000

The probabilities of success are taken from the binomial probabilities table (n = 14, π = 0.50). Each cumulative probability is obtained by adding up the probabilities from that row downwards, i.e. P(X ≥ x). The critical value (10) corresponds to the cumulative probability that is closest to, but not greater than, the level of significance α = 0.10.
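The probability columns and the critical value can be reproduced numerically; a minimal sketch (not part of the slides), assuming scipy is available. Values may differ from the printed table in the last decimal place, since the table sums rounded probabilities.

```python
# Sketch: rebuilding the binomial table for n = 14, p = 0.50 and locating the critical value.
from scipy.stats import binom

n, p, alpha = 14, 0.50, 0.10

for x in range(n + 1):
    prob = binom.pmf(x, n, p)             # probability of exactly x plus signs
    upper_tail = binom.sf(x - 1, n, p)    # P(X >= x), the "cumulative probability" column
    print(x, round(prob, 3), round(upper_tail, 3))

# Critical value: smallest x whose upper-tail probability does not exceed alpha
critical = min(x for x in range(n + 1) if binom.sf(x - 1, n, p) <= alpha)
print(critical)                           # 10, matching the table
```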

The decision rule is therefore:
If the number of plus signs is greater than or equal to the critical value (10) → Reject H₀

Decision:
Since the number of plus signs (11) is greater than the critical value (10), we reject H₀. Thus we accept H₁, which states that there is an increase in competence as a result of the training.
In other words, the training was effective, as there is evidence of an increase in the level of competence after the training.
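The same decision can be confirmed with an exact binomial test; a minimal sketch (not part of the slides), assuming scipy 1.7 or later (for scipy.stats.binomtest) is available:

```python
# Sketch: the sign test of Example 9.4 as an exact binomial test.
from scipy.stats import binomtest

plus_signs, trials = 11, 14            # the tie (no change) is dropped, so n = 14
result = binomtest(plus_signs, trials, p=0.50, alternative='greater')
print(result.pvalue)                   # ~0.029, below alpha = 0.10, so H0 is rejected
```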

Wilcoxon Signed-Rank Test
The Wilcoxon signed-rank test is a non-parametric test developed by
Frank Wilcoxon. It is based on the differences in dependent (matched)
samples, where the normality assumption is not required.
This test is the non-parametric alternative to the dependent-samples t-test.
Two slightly different versions of the test exist:
• The Wilcoxon signed-rank test – compares the sample median
against a hypothetical median.
• The Wilcoxon matched-pairs signed-rank test – computes the
difference between each set of matched pairs, then follows the same
procedure as the signed-rank test to compare the sample against some
median.

The null hypothesis for this test is that the medians of two samples are
equal. It is generally used:
• as a non-parametric alternative to the one-sample 𝑡-test or paired 𝑡-
test.
• for ordered (ranked) categorical variables without a numerical scale.

Example 9.5

Vehicle speeds along 10 roads were recorded during the day and at night in order to study if speeds differ during the day and at night. The average speeds obtained are shown in the table.
At the 0.05 significance level, is there evidence to conclude that the speeds of vehicles are different during the day and at night?

Road   Speed during the day (km/h)   Speed at night (km/h)
1      67.2                          65.3
2      59.4                          54.7
3      80.1                          81.3
4      47.6                          39.8
5      97.8                          92.5
6      57.3                          52.4
7      75.2                          79.8
8      94.7                          89.0
9      64.3                          58.4
10     54.0                          56.4

H₀: Speeds during the day and at night are equal
H₁: Speeds during the day and at night are not equal

The significance level is α = 0.05 and the test is two-tailed.

Decision rule:
If the calculated Wilcoxon test statistic, W, is less than or equal to the critical Wilcoxon value, W_critical → Reject H₀

W_critical is obtained from the Wilcoxon Signed-Rank Test Table, while W is determined as follows:
Road   Speed during     Speed at        Difference   Absolute     Rank (Ascending   Signed Rank
       the day (km/h)   night (km/h)                 Difference   Order)            R+      R−
1      67.2             65.3            1.9          1.9          2                 2
2      59.4             54.7            4.7          4.7          5                 5
3      80.1             81.3            −1.2         1.2          1                         1
4      47.6             39.8            7.8          7.8          10                10
5      97.8             92.5            5.3          5.3          7                 7
6      57.3             52.4            4.9          4.9          6                 6
7      75.2             79.8            −4.6         4.6          4                         4
8      94.7             89.0            5.7          5.7          8                 8
9      64.3             58.4            5.9          5.9          9                 9
10     54.0             56.4            −2.4         2.4          3                         3
Total                                                                               47      8

W is the smaller of the two rank sums, R+ and R−:
W = 8

From the table, W_critical for significance level α = 0.05 and number of pairs n = 10 is:
W_critical = 8

Since W is equal to W_critical, we can reject H₀. Thus we accept H₁, which states that speeds during the day and at night are not equal.
We have evidence to conclude that the speeds of vehicles are different during the day and at night.
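The same result can be obtained with scipy's Wilcoxon signed-rank test; a minimal sketch (not part of the slides), assuming scipy is available:

```python
# Sketch: Example 9.5 with scipy's Wilcoxon signed-rank test.
from scipy.stats import wilcoxon

day   = [67.2, 59.4, 80.1, 47.6, 97.8, 57.3, 75.2, 94.7, 64.3, 54.0]
night = [65.3, 54.7, 81.3, 39.8, 92.5, 52.4, 79.8, 89.0, 58.4, 56.4]

stat, p_value = wilcoxon(day, night)   # two-sided by default
print(stat)                            # 8, the smaller of the two rank sums
print(p_value)                         # just under 0.05, consistent with rejecting H0
```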

(Table: Critical Values for the Wilcoxon Signed-Rank Test.)

Mann-Whitney Test
The Mann-Whitney Test, also known as the Wilcoxon Rank Sum Test, is
specifically designed to determine whether two independent samples
came from equal populations.
This test is an alternative to the two-sample t test we have learned previously. Unlike the t test, the Mann-Whitney Test does not require the two populations to follow the normal distribution and have equal population variances.
The Mann-Whitney Test is based on the average of ranks. The data are ranked as if the observations were part of a single sample.
If the null hypothesis is true, then the ranks will be about evenly
distributed between the two samples, and the average of the ranks for the
two samples will be about the same.
If the alternative hypothesis is true, one of the samples will have more of
the lower ranks, and thus, a smaller rank average.
If each of the samples contains 8 or more observations, the standard
normal distribution is used as a test statistic. The formula is:

z = [W − n₁(n₁ + n₂ + 1)/2] / √[n₁ n₂ (n₁ + n₂ + 1)/12]

where
n₁ = number of observations from the first population
n₂ = number of observations from the second population
W = sum of the ranks from the first population

Example 9.6

The manager of an airline company noted an increase in the number of no-shows for flights out of Penang. He is particularly interested in determining whether there are more no-shows for flights that originate from Penang compared with flights leaving Kuala Lumpur. A sample of 9 flights from Penang and 8 from Kuala Lumpur were taken. At the 0.05 significance level, can we conclude that there are more no-shows for the flights originating in Penang?

Number of no-shows
Penang   Kuala Lumpur
11       13
15       14
10       10
18       8
11       16
20       9
24       17
22       21
25
H₀: The distribution of no-shows is the same for Penang and Kuala Lumpur
H₁: The distribution of no-shows is larger for Penang than for Kuala Lumpur

The significance level is α = 0.05 and the test is one-tailed.

Decision rule:
If the calculated z value is greater than or equal to the critical z value → Reject H₀

The critical z value, given that α = 0.05, is 1.65.

(Figure: standard normal curve; the area between 0 and the critical value is 0.45 and the upper-tail area is 0.05 (5%). From the z-table, z = 1.65.)
Areas under the Standard Normal Curve (z-Table) showing values for P(0 ≤ Z ≤ z)

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952


Next, we rank the observations from both samples as if they were a single group, and find the sum of the ranks.
When there are tied values within the group, each is given the average rank.
Example: There are two no-show counts of 10. The ranks involved are 3 and 4, but we assign each the average rank, (3 + 4)/2 = 3.5.

Penang              Kuala Lumpur
No-shows   Rank     No-shows   Rank
11         5.5      13         7
15         9        14         8
10         3.5      10         3.5
18         12       8          1
11         5.5      16         10
20         13       9          2
24         16       17         11
22         15       21         14
25         17
Sum        96.5     Sum        56.5
We then calculate z, knowing that W = 96.5, n₁ = 9 and n₂ = 8:

z = [W − n₁(n₁ + n₂ + 1)/2] / √[n₁ n₂ (n₁ + n₂ + 1)/12]
  = [96.5 − 9(9 + 8 + 1)/2] / √[9(8)(9 + 8 + 1)/12]
  = (96.5 − 81) / √108
  = 1.49

Since the calculated z (1.49) is less than the critical z (1.65), we do not reject H₀.
We therefore cannot conclude that there are more no-shows for the flights originating in Penang.
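The rank sum and the z statistic can be reproduced numerically; a minimal sketch (not part of the slides), assuming numpy and scipy are available:

```python
# Sketch: Example 9.6 with the rank-sum z statistic computed from the formula above.
import numpy as np
from scipy.stats import norm, rankdata

penang       = [11, 15, 10, 18, 11, 20, 24, 22, 25]
kuala_lumpur = [13, 14, 10, 8, 16, 9, 17, 21]
n1, n2 = len(penang), len(kuala_lumpur)

ranks = rankdata(penang + kuala_lumpur)   # ranks of the combined sample; ties get average ranks
W = ranks[:n1].sum()                      # sum of ranks for the Penang flights = 96.5

z = (W - n1 * (n1 + n2 + 1) / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(W, round(z, 2))                     # 96.5 and 1.49

print(z >= norm.ppf(0.95))                # False: 1.49 < 1.645, so H0 is not rejected
```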


Mann-Whitney 𝑈 Test
The Mann-Whitney U Test is the non-parametric alternative to the independent-samples t test.
It is used to compare two samples in order to determine whether they come from the same population, i.e. to test whether the two sample means are equal or not.
Usually, the Mann-Whitney 𝑈 Test is used when the data is ordinal or
when the assumptions of the 𝑡 test are not met.

The Mann-Whitney U statistic is determined using the following formulas:

for Sample 1:   U₁ = n₁n₂ + n₁(n₁ + 1)/2 − R₁
for Sample 2:   U₂ = n₁n₂ + n₂(n₂ + 1)/2 − R₂

Select the smaller of the two as the test statistic U.

where
n₁ = number of observations from the first sample
n₂ = number of observations from the second sample
R₁ = sum of the ranks from the first sample
R₂ = sum of the ranks from the second sample

Example 9.7

The test scores obtained by male and female students are given in the table. At the 0.05 significance level, can we conclude that there is no difference between the scores for males and females?

Male   Female
10     36
22     53
42     54
59     56
61     63
63     84
65     88
83
85
90
93
H₀: The sample means are equal (samples are taken from identical populations)
H₁: The sample means are not equal (samples are not taken from identical populations)

The significance level is α = 0.05 and the test is two-tailed.

Decision rule:
If the calculated U value is less than or equal to the critical U value → Reject H₀

From the Critical Values of Mann-Whitney U Test Table using α = 0.05, n₁ = 11 and n₂ = 7, the critical U value is 16.

(Table: Critical Values for the Mann-Whitney U Test, two-tailed testing.)
Next, we assign ranks and find the sum of the ranks. (This step is similar to the previous example.)

Male              Female
Score    Rank     Score    Rank
10       1        36       3
22       2        53       5
42       4        54       6
59       8        56       7
61       9        63       10.5
63       10.5     84       14
65       12       88       16
83       13
85       15
90       17
93       18
Sum      109.5    Sum      61.5

The Mann-Whitney U statistic is then determined:

for Males:    U₁ = n₁n₂ + n₁(n₁ + 1)/2 − R₁ = 11(7) + 11(11 + 1)/2 − 109.5 = 33.5
for Females:  U₂ = n₁n₂ + n₂(n₂ + 1)/2 − R₂ = 11(7) + 7(7 + 1)/2 − 61.5 = 43.5

Select the smaller of the two: U = 33.5
Since the calculated U (33.5) is greater than the critical U (16), we do not reject H₀, which states that the sample means are equal.
We can therefore conclude that there is no difference between the scores for males and females.
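The rank sums and the U statistic can be reproduced numerically; a minimal sketch (not part of the slides), assuming scipy is available:

```python
# Sketch: Example 9.7 with ranks computed via scipy.
from scipy.stats import rankdata

male   = [10, 22, 42, 59, 61, 63, 65, 83, 85, 90, 93]
female = [36, 53, 54, 56, 63, 84, 88]
n1, n2 = len(male), len(female)

ranks = rankdata(male + female)          # the tied 63s get the average rank 10.5
R1, R2 = ranks[:n1].sum(), ranks[n1:].sum()

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1    # 33.5
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2    # 43.5
U = min(U1, U2)
print(R1, R2, U)                         # 109.5, 61.5, 33.5 -> U (33.5) > 16, do not reject H0
```

Note that scipy.stats.mannwhitneyu reports U using the complementary convention U = R₁ − n₁(n₁ + 1)/2 for its first argument, so taking min(U, n₁n₂ − U) recovers the value used here.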
