Week11 Non Parametric

Lecture 16
Nonparametric Statistics
Review and Preview
Sign Test
Wilcoxon Signed-Ranks Test for Matched Pairs
Wilcoxon Rank-Sum Test for Two Independent Samples
Kruskal-Wallis Test
Rank Correlation
Runs Test for Randomness
SIS 1037Y(1) 2020 -2021 2

In the preceding lectures, we have seen a
variety of different methods of inferential
statistics.
Many of those methods require normally
distributed populations and are based on
sampling from a population with specific
parameters, such as the mean μ, standard
deviation σ, or population proportion p.

Definitions
Parametric tests have requirements about the nature
or shape of the populations involved.
Nonparametric tests do not require that samples
come from populations with normal distributions or
have any other particular distributions.
Consequently, nonparametric tests are called
distribution-free tests.

Advantages of Nonparametric Methods
1. Nonparametric methods can be applied to a wide
variety of situations because they do not have
the more rigid requirements of the corresponding
parametric methods. In particular, nonparametric
methods do not require normally distributed
populations.
2. Unlike parametric methods, nonparametric
methods can often be applied to categorical
data, such as the genders of survey
respondents.

Disadvantages of Nonparametric Methods
1. Nonparametric methods tend to waste
information because exact numerical data
are often reduced to a qualitative form.
2. Nonparametric tests are not as efficient as
parametric tests, so with a nonparametric
test we generally need stronger evidence
(such as a larger sample or greater
differences) in order to reject a null
hypothesis.

Example: All things being equal, nonparametric rank correlation
requires 100 sample observations to achieve the same results as
91 sample observations analyzed through parametric linear
correlation, assuming the stricter requirements for using the
parametric test are met.

Data are sorted when they are arranged
according to some criterion, such as smallest to
largest or best to worst.
A rank is a number assigned to an individual
sample item according to its order in the sorted
list. The first item is assigned a rank of 1, the
second is assigned a rank of 2, and so on.

Handling ties in ranks: When two or more values are
tied, find the mean of the ranks involved and assign
this mean rank to each of the tied items.
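
The ranking rule, including the treatment of ties, can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` and the sample values are assumptions made for the example, not part of the lecture.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j so that positions i..j cover one group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Illustrative data: the three 5s share ranks 2, 3, 4 and the two 12s share ranks 7, 8.
print(average_ranks([4, 5, 5, 5, 10, 11, 12, 12]))
# [1.0, 3.0, 3.0, 3.0, 5.0, 6.0, 7.5, 7.5]
```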

Sign Test

Key Concept
The main objective of this section is to understand
the sign test procedure, which involves converting
data values to plus and minus signs, then testing for
disproportionately more of either sign.

The sign test is a nonparametric (distribution-free)
test that uses plus and minus signs to test
different claims, including:
1. Claims involving matched pairs of sample data
2. Claims involving nominal data
3. Claims about the median of a single population

The basic idea underlying the sign test is to
analyze the frequencies of the plus and
minus signs to determine whether they are
significantly different.
For consistency, we will use a test statistic
based on the number of times the less
frequent sign occurs.

Requirements
The sample data have been randomly selected.
Note: There is no requirement that the sample
data come from a population with a particular
distribution, such as a normal distribution.

Notation
x = the number of times the less frequent
sign occurs
n = the total number of positive and
negative signs combined

Test Statistic
For n ≤ 25: Test statistic is x = the number of
times the less frequent sign occurs
For n > 25: Test statistic is

$$z = \frac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}}$$
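
As a rough sketch of how the two cases of the sign test statistic fit together, the following Python function returns x for n ≤ 25 and the z approximation for n > 25. The function name `sign_test_statistic` and its return convention are assumptions chosen for illustration; the sign counts in the usage lines are the ones quoted in this lecture's examples.

```python
import math


def sign_test_statistic(n_plus, n_minus):
    """Return ("x", x) for n <= 25, otherwise ("z", z) using the normal approximation."""
    n = n_plus + n_minus          # ties (differences of 0) are assumed already excluded
    x = min(n_plus, n_minus)      # number of times the less frequent sign occurs
    if n <= 25:
        return ("x", x)
    z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
    return ("z", z)


print(sign_test_statistic(8, 3))     # small sample: the statistic is x itself
print(sign_test_statistic(879, 66))  # large sample: the statistic is z
```

The +0.5 term is the continuity correction already present in the lecture's formula; the sketch does not check the direction of a one-tailed claim, which still has to be done separately.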

P-values: P-values are often provided by technology,
or can often be found using the z test statistic.
Critical values:
For n ≤ 25, critical x values are in the table of
critical values for the sign test.
For n > 25, critical z values are in the normal
distribution table.
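
When a table is not at hand, an exact P-value for a small sample can be sketched from the binomial distribution, on the assumption that under the null hypothesis each nonzero difference is equally likely to be plus or minus. This is a cross-check rather than the lecture's table-based procedure, and the function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(x, n, two_tailed=True):
    """Exact P-value for x occurrences of the less frequent sign among n signs."""
    # P(X <= x) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return min(1.0, 2 * tail) if two_tailed else tail


# With n = 11 and x = 3 the two-tailed P-value is well above 0.05.
print(sign_test_p_value(3, 11))
```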

When applying the sign test in a one-tailed
test, we need to be very careful to avoid
making the wrong conclusion when one sign
occurs significantly more often than the other,
but the sample data contradict the alternative
hypothesis. See the following example.

Among 945 couples who used the XSORT method
of gender selection, 66 had boys, so the sample
proportion of boys is 66 / 945 or 0.0698.
Consider the claim that the XSORT method
increases the likelihood of baby boys so that the
probability of a boy is p > 0.5. This claim
becomes the alternative hypothesis.
Using common sense, we see that with a sample
proportion of boys of 0.0698, we can never
support a claim that p > 0.5.
The sample proportion contradicts the alternative
hypothesis because it is not greater than 0.5.

We must be careful to avoid making the
fundamental mistake of thinking that a claim
is supported because the sample results are
significant.
The sample results must be significant in the
same direction as the alternative hypothesis.

Sign Test Procedure
[Flowchart not reproduced. Its surviving labels are "Sign Test" and "Normal".]

When using the sign test with data that are
matched pairs, we convert the raw data to plus
and minus signs as follows:
1. Subtract each value of the second variable
from the corresponding value of the first
variable.
2. Record only the sign of the difference found in
step 1.
Exclude ties: that is, any matched pairs in
which both values are equal.

If the two sets of data have equal medians, the
number of positive signs should be approximately
equal to the number of negative signs.

Example
The following table includes taxi-out times and
taxi-in times for a sample of American Airlines
Flight 21.
Use the sign test to test the claim that there is
no difference between taxi-out and taxi-in
times. Use a 0.05 significance level.

Example - Continued
Requirement Check: We only need the sample
data to be a simple random sample, and that
requirement is satisfied.
We have 8 positive signs, 3 negative signs, and
1 difference of 0.
The sign test tells us whether or not the
numbers of positive and negative signs are
approximately equal.

Example - Continued
The hypotheses are:
H0: There is no difference. (The median of the differences is equal to 0.)
H1: There is a difference. (The median of the differences is not equal to 0.)
Discarding the one difference of 0, we let n = 11
and x = 3. We then find the critical value from
the sign test table.
From the table, the critical value is 1
for n = 11 and α = 0.05 in two tails.

Example - Continued
Since n ≤ 25, the test statistic is x = 3. Because x = 3
is not less than or equal to the critical value of 1, we
fail to reject the null hypothesis of no difference.
We conclude that there is not sufficient evidence to
warrant rejection of the claim of no difference between
taxi-out times and taxi-in times.
There does not appear to be a difference.

Claims Involving Nominal Data
The nature of nominal data limits the
calculations that are possible, but we can
identify the proportion of the sample data that
belong to a particular category.
Then we can test claims about the
corresponding population proportion p.

Example
879 of 945 babies born to parents using
the XSORT method of gender selection
were girls.
Use the sign test and a 0.05 level of
significance to test the claim that this
method of gender selection is effective
in increasing the likelihood of a baby
girl.

Example - Continued
Requirement Check: The only requirement is that the
sample be a simple random sample, and based on the
design of the experiment, we can assume so.
Let p denote the population proportion of baby girls.
The claim is that the XSORT method increases the
likelihood of having a girl, so the hypotheses are:
H0: p = 0.5
H1: p > 0.5

Example - Continued
Denoting girls by (+) and boys by (–), we have 879
positive signs and 66 negative signs.
The test statistic is

$$z = \frac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}} = \frac{(66 + 0.5) - \dfrac{945}{2}}{\dfrac{\sqrt{945}}{2}} = -26.41$$

Example - Continued
With a test statistic of z = –26.41 and a critical value
of z = –1.645, we can reject the null hypothesis.

Example - Continued
The XSORT method of gender selection does appear to
be associated with an increase in the likelihood of a
girl.

Claims About the Median
of a Single Population
The negative and positive signs are
based on the claimed value of the
median. See the next example.

Example
Data Set 3 includes measured body temperatures
of adults.
Use the 106 temperatures listed for 12 A.M. on Day 2
with the sign test to test the claim that the median is
less than 98.6ºF.
Of the 106 subjects, 68 had temperatures below
98.6ºF, 23 had temperatures above 98.6ºF, and 15
had temperatures equal to 98.6ºF.

Example - Continued
We test the claim:
H0: Median is equal to 98.6ºF.
H1: Median is less than 98.6ºF.
We discard the 15 data values of 98.6 since they result
in differences of zero.
We have 68 negative signs and 23 positive signs, so
n = 91.
The value of n exceeds 25, so we obtain the test
statistic:

Example - Continued

$$z = \frac{(x + 0.5) - \dfrac{n}{2}}{\dfrac{\sqrt{n}}{2}} = \frac{(23 + 0.5) - \dfrac{91}{2}}{\dfrac{\sqrt{91}}{2}} = -4.61$$

Example - Continued
In this one-tailed test with α = 0.05, the critical value
is z = –1.645. The test statistic z = –4.61 falls in the
critical region, so we reject the null hypothesis.
There is sufficient evidence to support the claim that
the median body temperature of healthy adults is less
than 98.6ºF.

Wilcoxon Signed-Ranks Test for Matched Pairs

Key Concept
The Wilcoxon signed-ranks test involves the
conversion of the sample data to ranks.
This test can be used for two different applications:
1. Testing a claim that a population of matched pairs
has the property that the matched pairs have
differences with a median equal to zero.
2. Testing a claim that a single population of
individual values has a median equal to some
claimed value.

Requirements
1. The data are a simple random sample.
2. The population of differences has a
distribution that is approximately
symmetric, meaning that the left half of
its histogram is roughly a mirror image
of its right half.
(There is no requirement that the data
have a normal distribution.)

Notation
T = the smaller of the following two sums:
1. The sum of the positive ranks of the nonzero
differences d
2. The absolute value of the sum of the negative
ranks of the nonzero differences d

Test Statistic
For n ≤ 30, the test statistic is T.
For n > 30, the test statistic is:

$$z = \frac{T - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}}$$

P-values: P-values are often provided by
technology, or can be found using the z test
statistic and a table.
Critical Values:
1. For n ≤ 30, the critical T value is found in a
Wilcoxon Table.
2. For n > 30, the critical z values are found in
a Normal Distribution Table.

Step 1: For each pair of data, find the difference d
by subtracting the second value from the
first. Keep the signs, but discard any pairs
for which d = 0.
Step 2: Ignore the signs of the differences, then
sort the differences from lowest to highest
and replace the differences by the
corresponding rank value. When
differences have the same numerical value,
assign to them the mean of the ranks
involved in the tie.

Step 3: Attach to each rank the sign of the difference
from which it came. That is, insert those signs
that were ignored in step 2.
Step 4: Find the sum of the absolute values of the
negative ranks. Also find the sum of the
positive ranks.
Step 5: Let T be the smaller of the two sums found in
Step 4. Either sum could be used, but for a
simplified procedure we arbitrarily select the
smaller of the two sums.

Step 6: Let n be the number of pairs of data for
which the difference d is not 0.
Step 7: Determine the test statistic and critical
values based on the sample size, as
shown above.
Step 8: When forming the conclusion, reject the
null hypothesis if the sample data lead
to a test statistic that is in the critical
region; that is, the test statistic is less
than or equal to the critical value(s).
Otherwise, fail to reject the null
hypothesis.

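Steps 1 through 6 of the procedure, together with the large-sample z formula, can be sketched in Python as follows. The function names are hypothetical. The paired values are illustrative placeholders, since the lecture's data table is not reproduced in this text; they were picked to give 8 positive and 3 negative differences, one tie, and T = 11, the figures quoted in the lecture's taxi-time example.

```python
import math


def wilcoxon_signed_ranks(first, second):
    """Return (T, n) following steps 1-6 of the signed-ranks procedure."""
    # Step 1: differences (first minus second), discarding pairs with d = 0.
    d = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank the absolute differences; ties get the mean of their ranks.
    abs_sorted = sorted(abs(v) for v in d)

    def mean_rank(v):
        lowest = abs_sorted.index(v) + 1             # first rank held by this value
        highest = lowest + abs_sorted.count(v) - 1   # last rank held by this value
        return (lowest + highest) / 2

    # Steps 3-4: reattach the signs and sum positive and negative ranks separately.
    positive_sum = sum(mean_rank(abs(v)) for v in d if v > 0)
    negative_sum = sum(mean_rank(abs(v)) for v in d if v < 0)
    # Steps 5-6: T is the smaller sum, n the number of nonzero differences.
    return min(positive_sum, negative_sum), len(d)


def wilcoxon_z(T, n):
    """Large-sample (n > 30) test statistic."""
    return (T - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Illustrative placeholder pairs (assumed values, see the note above).
out_times = [13, 20, 12, 17, 35, 19, 22, 43, 49, 45, 13, 23]
in_times = [13, 4, 6, 21, 29, 5, 27, 9, 12, 7, 36, 12]
print(wilcoxon_signed_ranks(out_times, in_times))  # (11.0, 11)
```

For n ≤ 30 the returned T is compared with the table value directly; `wilcoxon_z` only applies to larger samples.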
Example
The first two rows of the following table include taxi
times for a sample of American Airlines Flight 21. Use
the sample data to test the claim that there is no
difference between taxi-out and taxi-in times. Use
the Wilcoxon signed-ranks test and a 0.05 level of
significance.

Example - Continued
Requirement Check: The data are from a simple
random sample. The differences should be symmetric,
and though the histogram does not support this, we
have only 11 differences and the issue is not too
extreme.

Example - Continued
The claim is of no difference between
taxi-in and taxi-out times, so the
hypotheses are:
H0: There is no difference. (The median of the differences is 0.)
H1: There is a difference. (The median of the differences is not equal to 0.)
Using the 8-step procedure described earlier,
the test statistic is T = 11.

Example - Continued
The sample size is n = 11, so the critical value is
found from our table. Using a 0.05 level of
significance, the critical value is found to be 11.
We should reject the null hypothesis if the test statistic T is
less than or equal to the critical value.
Because the test statistic of T = 11 equals the critical value,
we reject the null hypothesis.
We conclude that the taxi-out and taxi-in times do not
appear to be about the same.

Claims about the Median of a Single
Population
Make one simple adjustment:
When testing a claim about the median of a
single population, create matched pairs by
pairing each sample value with the claimed
value of the median. The preceding
procedure can then be used.

Wilcoxon Rank-Sum Test for Two Independent Samples

Key Concept
The Wilcoxon rank-sum test uses ranks of values
from two independent samples to test the null
hypothesis that the two populations have equal
medians.
The basic idea underlying the Wilcoxon rank-sum
test is this: If two samples are drawn from
identical populations and the individual values are
all ranked as one combined collection of values,
then the high and low ranks should fall evenly
between the two samples.
If the low ranks are found predominantly in one
sample and the high ranks are found
predominantly in the other sample, we suspect
that the two populations have different medians.

Caution
Do not confuse the Wilcoxon rank-sum test for
two independent samples with the Wilcoxon
signed-ranks test for matched pairs.

Definition
The Wilcoxon rank-sum test is a nonparametric
test that uses ranks of sample data from two
independent populations.
It is used to test the null hypothesis that the
two independent samples come from
populations with equal medians.

Notation
n1 = size of Sample 1
n2 = size of Sample 2
R1 = sum of ranks for Sample 1
R = same as R1 (sum of ranks for Sample 1)
μR = mean of the sample R values that is
expected when the two populations have
equal medians
σR = standard deviation of the sample R
values that is expected when the two
populations have equal medians

Requirements
1. There are two independent simple random
samples.
2. Each of the two samples has more than 10
values.
Note: There is no requirement that the two
populations have a normal distribution or any
other particular distribution.

Test Statistic

$$z = \frac{R - \mu_R}{\sigma_R}$$

where

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

n1 = size of the sample from which the rank sum R is found
n2 = size of the other sample
R = sum of ranks of the sample with size n1

P-values can be found using the z test statistic
and a normal distribution table.
Critical values can also be found in the normal
distribution table, because the test statistic is
based on the normal distribution.

1. Temporarily combine the two samples into one
big sample, then replace each sample value
with its rank.
2. Find the sum of the ranks for either one of the
two samples.
3. Calculate the value of the z test statistic, where
either sample can be used as “Sample 1.”
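
The three steps can be sketched in Python as below. The function names `rank_sum` and `rank_sum_z` are assumptions made for illustration. The tiny samples in the first usage line are made up purely to show the ranking step and are far below the required sample size of more than 10; the second usage line plugs in the rank sum and sample sizes quoted in the lecture's pulse-rate example.

```python
import math


def rank_sum(sample1, sample2):
    """Steps 1-2: rank the combined data (mean ranks for ties) and sum Sample 1's ranks."""
    combined = sorted(list(sample1) + list(sample2))

    def mean_rank(v):
        lowest = combined.index(v) + 1
        highest = lowest + combined.count(v) - 1
        return (lowest + highest) / 2

    return sum(mean_rank(v) for v in sample1)


def rank_sum_z(R, n1, n2):
    """Step 3: return (z, mu_R, sigma_R) for rank sum R of the sample with size n1."""
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (R - mu_R) / sigma_R, mu_R, sigma_R


print(rank_sum([1, 3, 5], [2, 4, 6]))   # made-up toy data: ranks 1 + 3 + 5
print(rank_sum_z(123.5, 12, 11))        # values quoted in the pulse-rate example
```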

The following table lists pulse rates of samples of
males and females (from Data Set 1).
Use a 0.05 significance level to test the claim that
males and females have the same median pulse rate.

Requirement Check: The sample data are two
independent random samples, and the sample sizes
are 12 and 11, which both exceed 10.
The hypotheses are:
H0: The median pulse rate of males is equal to the median pulse rate of females.
H1: The median pulse rates are different.

Rank the combined 23 pulse rates (refer to the
table on one of the previous slides).
If we choose the pulse rates of males as Sample 1,
we get:
R = 4.5 + 11 + 19 + ... + 6 = 123.5
Also, n1 = 12, n2 = 11, and we can find the values of
μR, σR, and the test statistic z.

Example - Continued

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2} = \frac{12(12 + 11 + 1)}{2} = 144$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(12)(11)(12 + 11 + 1)}{12}} = 16.248$$

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{123.5 - 144}{16.248} = -1.26$$

Since we have a two-tailed test with α = 0.05, the
critical values are ±1.96.
The test statistic of z = –1.26 does not fall in the
critical region, so we fail to reject the null
hypothesis.
There is not sufficient evidence to warrant the
rejection of the claim that males and females have
the same median pulse rate.
Based on the available sample data, it appears males
and females have pulse rates with the same median.

Kruskal-Wallis Test

Key Concept
This section introduces the Kruskal-Wallis
test, which uses ranks of data
from three or more independent
samples to test the null hypothesis that
the samples come from populations
with equal medians.
This test is the complement to ANOVA,
but it does not require normal
distributions.

We compute the test statistic H, which has a
distribution that can be approximated by the
chi-square distribution as long as each sample
has at least 5 observations.
When we use the chi-square distribution in this
context, the number of degrees of freedom is
k – 1, where k is the number of samples.
The H test statistic is basically a measure of the
variance of the rank sums R1, R2, ..., Rk.

Notation
N = total number of observations in all samples
combined
k = number of samples
R1 = sum of ranks for Sample 1
n1 = number of observations in Sample 1
For Sample 2, the sum of ranks is R2 and the number
of observations is n2, and similar notation is used for
the other samples.

Requirements
1. We have at least three independent
random samples.
2. Each sample has at least 5 observations.
Note: There is no requirement that the
populations have a normal distribution or
any other particular distribution.

Kruskal-Wallis Test
Test Statistic

$$H = \frac{12}{N(N + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k} \right) - 3(N + 1)$$

Critical Values
1. Test is right-tailed.
2. df = k – 1 (Because the test statistic H can be
approximated by the chi-square distribution, use
the chi-square table.)
P-values are often found using technology.

1. Temporarily combine all samples into one big
sample and assign a rank to each sample value.
2. For each sample, find the sum of the ranks and
find the sample size.
3. Calculate H by using the results of Step 2 and
the notation and test statistic.
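
Step 3 can be sketched as a short Python function that takes the rank sums and sample sizes produced in Step 2. The name `kruskal_wallis_h` is hypothetical. The usage line uses the rank sums (86, 50.5, 53.5) and sample sizes (8, 6, 5) that appear in this lecture's lead-exposure example.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H test statistic from per-sample rank sums R_i and sample sizes n_i."""
    N = sum(sizes)  # total number of observations in all samples combined
    weighted = sum(R * R / n for R, n in zip(rank_sums, sizes))
    return 12 / (N * (N + 1)) * weighted - 3 * (N + 1)


H = kruskal_wallis_h([86, 50.5, 53.5], [8, 6, 5])
print(round(H, 3))  # 0.694
```

A quick sanity check on any set of rank sums is that they add up to N(N + 1)/2, here 19(20)/2 = 190.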

The given table lists IQ scores from a sample of
subjects with low, medium, and high lead exposure.
Use a 0.05 level of significance to test the claim
that the three samples come from populations with
medians that are all equal.

Requirement Check: Each of the three
samples is a simple random sample
and each sample size is at least 5.
The hypotheses are:
H0: The median IQ score is the same for each of the three populations.
H1: The three populations have median IQ scores that are not all the same.

We first rank the data, as noted in the given table.
The test statistic is:

$$H = \frac{12}{N(N + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} \right) - 3(N + 1)$$

$$= \frac{12}{19(19 + 1)} \left( \frac{86^2}{8} + \frac{50.5^2}{6} + \frac{53.5^2}{5} \right) - 3(19 + 1) = 0.694$$

Because each sample has at least five
observations, the distribution of H is
approximately chi-square with k – 1
degrees of freedom (3 – 1 = 2 df).
Refer to chi-square table to find the critical
value of 5.991.
As shown on the next slide, the test statistic
of H = 0.694 does not fall in the rejection
region, so we fail to reject the null
hypothesis of equal population medians.

Example - Continued
There is not sufficient evidence to reject the claim that IQ
scores from subjects with low, medium, and high levels of
lead exposure all have the same median.

Rank Correlation

Key Concept
This section describes the nonparametric method
of the rank correlation test, which uses paired
data to test for an association between two
variables.
Earlier we used paired sample data to compute
values for the linear correlation coefficient r, but
in this section we use ranks as the basis for
computing the rank correlation coefficient rs.

The rank correlation test (or
Spearman’s rank correlation test) is a
nonparametric test that uses ranks of
sample data consisting of matched
pairs.
It is used to test for an association
between two variables.

Advantages
1. With rank correlation, we can analyze paired data
that are ranks or can be converted to ranks. This
method does not require a normal distribution for
any population.
2. Rank correlation can be used to detect some (not
all) relationships that are not linear.

Compute the rank correlation
coefficient rs and use it to test for an
association between two variables.
Then we can test the following:
H0: ρs = 0 (There is no correlation between the two variables.)
H1: ρs ≠ 0 (There is a correlation between the two variables.)

rs = rank correlation coefficient for sample
paired data (rs is a sample statistic)
ρs = rank correlation coefficient for all the
population data (ρs is a population
parameter)
n = number of pairs of sample data
d = difference between ranks for the two
values within an individual pair

Requirements
The paired data are a simple random sample and
the data are ranks or can be converted to ranks.
Note: Unlike the parametric methods seen earlier,
there is no requirement that the sample pairs of
data have a bivariate normal distribution. There is
no requirement of a normal distribution for any
population.

Test Statistic
First convert the data to ranks. Then calculate:

$$r_s = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n (\sum x^2) - (\sum x)^2} \, \sqrt{n (\sum y^2) - (\sum y)^2}}$$

No ties: After converting the data in each sample to
ranks, if there are no ties among ranks for either
variable, the exact value of the test statistic can be
calculated using this formula:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
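
Both versions of the rank correlation formula can be sketched in Python to show that they agree when there are no ties. The function names and the two rank lists are assumptions made for illustration; the ranks are made up so that n = 7 and Σd² = 32, the values that appear in this lecture's worked example, and they are not the lecture's actual data.

```python
import math


def rs_general(rx, ry):
    """Rank correlation from the general formula (valid with or without ties)."""
    n = len(rx)
    sum_x, sum_y = sum(rx), sum(ry)
    sum_xy = sum(a * b for a, b in zip(rx, ry))
    sum_x2 = sum(a * a for a in rx)
    sum_y2 = sum(b * b for b in ry)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def rs_no_ties(rx, ry):
    """Shortcut formula, exact only when neither variable has tied ranks."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Made-up ranks with no ties, n = 7 and sum of d^2 = 32.
ranks_x = [1, 2, 3, 4, 5, 6, 7]
ranks_y = [4, 1, 2, 6, 7, 3, 5]
print(round(rs_no_ties(ranks_x, ranks_y), 3))  # 0.429
print(round(rs_general(ranks_x, ranks_y), 3))  # 0.429
```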

P-Values and Critical Values
P-values: Sometimes found using technology,
provided the technology used has this function.
Critical values:
If n ≤ 30, critical values are found in the table of
Critical Values of Spearman’s Rank Correlation
Coefficient rs.
If n > 30, use this formula, where the value of z
corresponds to the significance level (for example,
with α = 0.05 in two tails, z = 1.96):

$$r_s = \frac{\pm z}{\sqrt{n - 1}}$$
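
The large-sample critical value is a one-line calculation, sketched here in Python. The function name `rs_critical` and the sample size n = 40 are assumptions chosen only to illustrate the formula; z = 1.96 is the two-tailed value for α = 0.05 given above.

```python
import math


def rs_critical(n, z=1.96):
    """Positive critical value of rs for n > 30; the critical values are +/- this number."""
    return z / math.sqrt(n - 1)


# Hypothetical sample of 40 pairs at the 0.05 level (two tails).
print(round(rs_critical(40), 3))  # 0.314
```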

A not very serious disadvantage of rank correlation is
its efficiency rating of 0.91.
This efficiency rating shows that with all other
circumstances being equal, the nonparametric
approach of rank correlation requires 100 pairs of
sample data to achieve the same results as only 91
pairs of sample observations analyzed through
parametric methods, assuming that the stricter
requirements of the parametric approach are met.

Rank Correlation Testing

Example
The following table lists quality rankings and
prices of 37-inch LCD televisions. Find the
value of the rank correlation coefficient and
use it to determine whether or not there is a
correlation between quality and price.
Use a 0.05 significance level. Based on the
result, does it appear that you can get better
quality by spending more?

Example - Continued
Requirement Check: We assume we
have a simple random sample.
The hypotheses are:
H0: ρs = 0 (There is no correlation between quality and price.)
H1: ρs ≠ 0 (There is a correlation between quality and price.)
On the following slide, we convert to
ranks.

Example - Continued
We have no ties, so the exact value of the test
statistic can be calculated as shown on the
next slide:

Example - Continued
Test Statistic:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(32)}{7(7^2 - 1)} = 1 - \frac{192}{336} = 0.429$$

We refer to the table of critical values of rs to
find the critical values of ±0.786.
Because the test statistic rs = 0.429 is
between the critical values, we fail to reject
the null hypothesis.

Example - Continued
There is not sufficient evidence to
support a claim of a correlation between
quality and price.
Based on the given sample data, it
appears you don’t necessarily get better
quality by paying more.

Example
Rank correlation methods sometimes allow us to detect
relationships that we cannot detect with the methods
seen earlier.
Below is a scatterplot that shows an S-shaped pattern:

Example - Continued
The earlier methods give r = 0.590 and critical values
of ±0.632, suggesting that there is not a linear
relationship between x and y.
If we use the methods of this section, we get rs = 1 and
critical values of ±0.648, suggesting that there is a
correlation between x and y.
With rank correlations, we can sometimes detect
relationships that are not linear.

Runs Test for Randomness

Key Concept
This section introduces the runs test for
randomness, which can be used to determine
whether the sample data in a sequence are in
a random order.
This test is based on sample data that have
two characteristics, and it analyzes runs of
those characteristics to determine whether
the runs appear to result from some random
process, or whether the runs suggest that the
order of the data is not random.

A run is a sequence of data having the same
characteristic; the sequence is preceded and
followed by data with a different characteristic
or by no data at all.
The runs test uses the number of runs in a
sequence of sample data to test for
randomness in the order of the data.

Reject randomness if the number of runs is
very low or very high.
Example: The sequence of genders
FFFFFMMMMM is not random because it has
only 2 runs, so the number of runs is very low.
Example: The sequence of genders
FMFMFMFMFM is not random because there are
10 runs, which is very high.

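Counting runs is simple enough to sketch in Python: a new run starts whenever an element differs from the one before it. The function name `count_runs` is an assumption made for illustration; the two sequences are the ones given in the examples above.

```python
def count_runs(sequence):
    """Number of runs G: maximal blocks of consecutive identical elements."""
    runs = 0
    previous = object()  # sentinel that differs from every real element
    for item in sequence:
        if item != previous:
            runs += 1
            previous = item
    return runs


print(count_runs("FFFFFMMMMM"))  # 2
print(count_runs("FMFMFMFMFM"))  # 10
```
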
The runs test for randomness is based on the
order in which the data occur; it is not based
on the frequency of the data.
For example, a sequence of 3 men and 20
women might appear to be random, but the
issue of whether 3 men and 20 women
constitute a biased sample (with
disproportionately more women) is not
addressed by the runs test.

Apply the runs test for randomness to a sequence of
sample data to test for randomness in the order of
the data.
Use the following null and alternative hypotheses:
H0: The data are in a random sequence.
H1: The data are in a sequence that is not random.

Notation
n1 = number of elements in the sequence
that have one particular characteristic
(The characteristic chosen for n1 is
arbitrary.)
n2 = number of elements in the sequence
that have the other characteristic
G = number of runs

Requirements
1. The sample data are arranged
according to some ordering scheme,
such as the order in which the
sample values were obtained.
2. Each data value can be categorized
into one of two separate categories
(such as male/female).

For Small Samples and α = 0.05:
If n1 ≤ 20 and n2 ≤ 20, the test statistic is G and
the critical values are found using stats table
(Critical Values for Number of Runs G).
Decision Criterion: Reject randomness if the
number of runs G is:
• less than or equal to the smaller critical
value found in Runs Table
• greater than or equal to the larger critical
value found in Runs Table.

For Large Samples or α ≠ 0.05:
If n1 > 20 or n2 > 20 or α ≠ 0.05:
Test Statistic:

$$z = \frac{G - \mu_G}{\sigma_G}$$

where

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

and

$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$

For Large Samples or α ≠ 0.05:
Critical values of z: Use the normal distribution table.
Decision Criterion: Reject randomness if
the test statistic z is less than or equal to
the negative critical z score or greater than
or equal to the positive critical z score.
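
The large-sample calculation and decision rule can be sketched in Python as follows. The function name `runs_test_z` and the default critical value of 1.96 (two tails, α = 0.05) are assumptions made for illustration. The usage line uses n1 = 92, n2 = 15 and G = 25, the values that appear in this lecture's later gender-sequence example.

```python
import math


def runs_test_z(n1, n2, G, z_critical=1.96):
    """Return (z, mu_G, sigma_G, reject) for the large-sample runs test."""
    mu_G = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_G = math.sqrt(
        (2 * n1 * n2) * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    z = (G - mu_G) / sigma_G
    # Reject randomness when z is at or beyond either critical z score.
    reject = z <= -z_critical or z >= z_critical
    return z, mu_G, sigma_G, reject


z, mu_G, sigma_G, reject = runs_test_z(92, 15, 25)
print(round(mu_G, 4), round(sigma_G, 5), round(z, 2), reject)
```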

Runs Test for Randomness

Listed below are some recent winners of the NBA
basketball championship game. Let W denote a
winner from the Western Conference, E for the
Eastern Conference.
Use a 0.05 significance level to test for
randomness in the sequence:
E E W W W W W E W E W E W W W

Requirement Check: The data are arranged in
order, and each data value is categorized into one
of two separate categories.
We must find the values of n1, n2, and G.
n1 = number of Eastern Conference Winners = 5
n2 = number of Western Conference Winners = 10
G = number of runs = 8

Because n1 ≤ 20 and n2 ≤ 20 and α = 0.05,
the test statistic is G = 8.
From stats table, the critical values are 3
and 12.
Because G = 8 is neither less than or equal
to 3 nor greater than or equal to 12, we do
not reject randomness.
There is not sufficient evidence to reject
randomness in the sequence of winners.

Data Set 3 lists data from 107 study
subjects. Let us consider the
sequence of listed genders indicated
below.
Test the claim that the sequence is
random using a 0.05 significance
level.

Requirement Check: The data are arranged in order,
and each data value is categorized into one of two
separate categories (male / female).
We must find the values of n1, n2, and G.
Examination of the sequence of 107 genders gives:
n1 = number of males = 92
n2 = number of females = 15
G = number of runs = 25

Since n1 > 20, we need to calculate the test statistic z.
We first find μG and σG:

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(92)(15)}{92 + 15} + 1 = 26.7944$$

$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\frac{[2(92)(15)][2(92)(15) - 92 - 15]}{(92 + 15)^2 (92 + 15 - 1)}} = 2.45633$$

The test statistic is then:

$$z = \frac{G - \mu_G}{\sigma_G} = \frac{25 - 26.7944}{2.45633} = -0.73$$

Because the significance level is 0.05, the
critical values are z = ±1.96.
The test statistic does not fall within the
critical regions, so we fail to reject the null
hypothesis of randomness.
The given sequence appears to be random.

Comments?
