
Lecture 16

Nonparametric Statistics
• Review and Preview
• Sign Test
• Wilcoxon Signed-Ranks Test for Matched Pairs
• Wilcoxon Rank-Sum Test for Two Independent Samples
• Kruskal-Wallis Test
• Rank Correlation
• Runs Test for Randomness

SIS 1037Y(1) 2020 -2021 2


In the preceding lectures, we have seen a
variety of different methods of inferential
statistics.
Many of those methods require normally
distributed populations and are based on
sampling from a population with specific
parameters, such as the mean μ, standard
deviation σ, or population proportion p.



Definitions
Parametric tests have requirements about the nature
or shape of the populations involved.
Nonparametric tests do not require that samples
come from populations with normal distributions or
have any other particular distributions.
Consequently, nonparametric tests are called
distribution-free tests.



Advantages of Nonparametric Methods
1. Nonparametric methods can be applied to a wide variety of
situations because they do not have the more rigid requirements of
the corresponding parametric methods. In particular, nonparametric
methods do not require normally distributed populations.

2. Unlike parametric methods, nonparametric methods can often be
applied to categorical data, such as the genders of survey
respondents.

Disadvantages of Nonparametric Methods
1. Nonparametric methods tend to waste information because exact
numerical data are often reduced to a qualitative form.

2. Nonparametric tests are not as efficient as parametric tests, so
with a nonparametric test we generally need stronger evidence (such
as a larger sample or greater differences) in order to reject a null
hypothesis.



Example: All things being equal, nonparametric rank correlation
requires 100 sample observations to achieve the same results as
91 sample observations analyzed through parametric linear
correlation, assuming the stricter requirements for using the
parametric test are met.



Data are sorted when they are arranged
according to some criterion, such as smallest to
the largest or best to worst.
A rank is a number assigned to an individual
sample item according to its order in the sorted
list. The first item is assigned a rank of 1, the
second is assigned a rank of 2, and so on.



Handling ties in ranks: when two or more items are tied, find the
mean of the ranks involved and assign this mean rank to each of the
tied items.
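As a minimal sketch of the tie rule, the Python function below ranks a list from smallest to largest and gives tied values the mean of the ranks they occupy. The function name `mean_ranks` and the sample values are illustrative assumptions, not part of the lecture.

```python
def mean_ranks(values):
    """Rank values from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # The tied block occupies ranks i+1 .. j+1; use their mean.
        shared_rank = (i + j + 2) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


# Hypothetical data: the three 5s occupy ranks 2, 3, 4 and the two 12s ranks 7, 8.
example_ranks = mean_ranks([4, 5, 5, 5, 10, 11, 12, 12])
print(example_ranks)
```

Here the three values of 5 each receive rank 3, and the two values of 12 each receive rank 7.5.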

Sign Test


Key Concept
The main objective of this section is to understand
the sign test procedure, which involves converting
data values to plus and minus signs, then testing for
disproportionately more of either sign.



The sign test is a nonparametric (distribution-free) test that uses
plus and minus signs to test different claims, including:
1. Claims involving matched pairs of sample data
2. Claims involving nominal data
3. Claims about the median of a single population



The basic idea underlying the sign test is to
analyze the frequencies of the plus and
minus signs to determine whether they are
significantly different.
For consistency, we will use a test statistic based on the number of
times the less frequent sign occurs.



The sample data have been randomly selected.

Note: There is no requirement that the sample data come from a
population with a particular distribution, such as a normal
distribution.



x = the number of times the less frequent sign occurs

n = the total number of positive and negative signs combined



For n ≤ 25: Test statistic is x = the number of times the less
frequent sign occurs

For n > 25: Test statistic is

$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$$
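The large-sample statistic can be sketched in a few lines of Python. The function name `sign_test_z` and the demonstration counts are illustrative assumptions; only the formula comes from the lecture.

```python
import math


def sign_test_z(x, n):
    """Normal-approximation test statistic for the sign test (intended for n > 25).

    x: number of times the less frequent sign occurs
    n: total number of plus and minus signs (ties already excluded)
    """
    return ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)


# Hypothetical counts: the less frequent sign occurs 10 times among 40 signs.
z_demo = sign_test_z(10, 40)
print(round(z_demo, 2))
```

With the counts used in the examples later in this lecture, `sign_test_z(66, 945)` is about –26.41 and `sign_test_z(23, 91)` is about –4.61.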



P-Values: P-values are often provided by technology, or can often be
found using the z test statistic.

Critical Values:

For n ≤ 25, critical x values are found in the table of critical
values for the sign test.

For n > 25, critical z values are found in the normal distribution
table.



When applying the sign test in a one-tailed
test, we need to be very careful to avoid
making the wrong conclusion when one sign
occurs significantly more often than the other,
but the sample data contradict the alternative
hypothesis. See the following example.



Among 945 couples who used the XSORT method
of gender selection, 66 had boys, so the sample
proportion of boys is 66 / 945 or 0.0698.

Consider the claim that the XSORT method increases the likelihood of
baby boys so that the probability of a boy is p > 0.5. This claim
becomes the alternative hypothesis.

Using common sense, we see that with a sample proportion of boys of
0.0698, we can never support a claim that p > 0.5.

The sample proportion contradicts the alternative hypothesis because
it is not greater than 0.5.


We must be careful to avoid making the
fundamental mistake of thinking that a claim
is supported because the sample results are
significant.

The sample results must be significant in the same direction as the
alternative hypothesis.


Sign Test Procedure
[Figure: sign test procedure]


When using the sign test with data that are matched pairs, we
convert the raw data to plus and minus signs as follows:
1. Subtract each value of the second variable from the corresponding
value of the first variable.
2. Record only the sign of the difference found in step 1. Exclude
ties: that is, any matched pairs in which both values are equal.
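The conversion of matched pairs to signs can be sketched as follows. The function name `sign_counts` and the paired values are hypothetical, chosen only to illustrate the two steps and the exclusion of ties.

```python
def sign_counts(first, second):
    """Return (positives, negatives, ties) for the differences first - second."""
    positives = negatives = ties = 0
    for a, b in zip(first, second):
        d = a - b  # Step 1: subtract the second value from the first
        if d > 0:  # Step 2: keep only the sign of the difference
            positives += 1
        elif d < 0:
            negatives += 1
        else:
            ties += 1  # equal pairs are excluded from n
    return positives, negatives, ties


# Hypothetical paired measurements, for illustration only.
before = [12, 15, 9, 20, 14, 11]
after = [10, 15, 11, 17, 13, 8]
positives, negatives, ties = sign_counts(before, after)
n = positives + negatives      # total number of signs, ties excluded
x = min(positives, negatives)  # count of the less frequent sign
print(positives, negatives, ties, n, x)
```

For these made-up pairs there are 4 positive signs, 1 negative sign, and 1 tie, so n = 5 and x = 1.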



If the two sets of data have equal medians, the
number of positive signs should be approximately
equal to the number of negative signs.



Example
The following table includes taxi-out times and
taxi-in times for a sample of American Airlines
Flight 21.

Use the sign test to test the claim that there is no difference
between taxi-out and taxi-in times. Use a 0.05 significance level.


Example - Continued
Requirement Check: We only need the sample
data to be a simple random sample, and that
requirement is satisfied.

We have 8 positive signs, 3 negative signs, and 1 difference of 0.

The sign test tells us whether or not the numbers of positive and
negative signs are approximately equal.


Example - Continued
The hypotheses are:

H0: There is no difference. (The median of the differences is equal to 0.)
H1: There is a difference. (The median of the differences is not equal to 0.)

Excluding the one difference of 0, we have n = 11 signs, and the less
frequent sign (negative) occurs x = 3 times. Because n ≤ 25, we
proceed to find the critical value from the stats table.

From the stats table, the critical value of 1 is found for n = 11 and
α = 0.05 in two tails.


Example - Continued
Since n ≤ 25, the test statistic is x = 3. We reject the null
hypothesis only if x is less than or equal to the critical value of
1; because 3 > 1, we fail to reject the null hypothesis of no
difference.

We conclude that there is not sufficient evidence to warrant
rejection of the claim of no difference between taxi-out times and
taxi-in times.

There does not appear to be a difference.
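As a cross-check on the table lookup, the exact two-tailed P-value can be computed directly from the binomial distribution with p = 0.5, which describes the sign counts when the null hypothesis is true. This is a sketch under that assumption and is not the table method used in the lecture; the function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(x, n):
    """Exact two-tailed P-value: 2 * P(X <= x) for X ~ Binomial(n, 0.5), capped at 1."""
    lower_tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Taxi example from the lecture: x = 3 less frequent signs out of n = 11.
p_taxi = sign_test_p_value(3, 11)
print(round(p_taxi, 4))
```

For x = 3 and n = 11 the P-value is about 0.227, which exceeds 0.05 and agrees with the decision not to reject the null hypothesis.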



Claims Involving Nominal Data
The nature of nominal data limits the
calculations that are possible, but we can
identify the proportion of the sample data that
belong to a particular category.

Then we can test claims about the corresponding population
proportion p.


Example
879 of 945 babies born to parents using
the XSORT method of gender selection
were girls.
Use the sign test and a 0.05 level of
significance to test the claim that this
method of gender selection is effective
in increasing the likelihood of a baby
girl.



Example - Continued
Requirement Check: The only requirement is that the
sample be a simple random sample, and based on the
design of the experiment, we can assume so.

Let p denote the population proportion of baby girls.

The claim is that the XSORT method increases the likelihood of having
a girl, so the hypotheses are:

H0: p = 0.5
H1: p > 0.5


Example - Continued
Denoting girls by (+) and boys by (–), we have 879 positive signs and
66 negative signs, so x = 66 and n = 945.

The test statistic is

$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \frac{(66 + 0.5) - \frac{945}{2}}{\frac{\sqrt{945}}{2}} = -26.41$$


Example - Continued
With a test statistic of z = –26.41 and a critical value
of z = –1.645, we can reject the null hypothesis.



Example - Continued
The XSORT method of gender selection does appear to
be associated with an increase in the likelihood of a
girl.



Claims About the Median
of a Single Population
The negative and positive signs are
based on the claimed value of the
median. See the next example.



Example
Data Set 3 includes measured body temperatures of adults.

Use the 106 temperatures listed for 12 A.M. on Day 2 with the sign
test to test the claim that the median is less than 98.6°F.

Of the 106 subjects, 68 had temperatures below 98.6°F, 23 had
temperatures above 98.6°F, and 15 had temperatures equal to 98.6°F.


Example - Continued
We test the claim:

H0: Median is equal to 98.6°F.
H1: Median is less than 98.6°F.

We discard the 15 data values of 98.6 since they result in
differences of zero.

We have 68 negative signs and 23 positive signs, so n = 91 and
x = 23.

The value of n exceeds 25, so we obtain the test statistic:


Example - Continued

$$z = \frac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \frac{(23 + 0.5) - \frac{91}{2}}{\frac{\sqrt{91}}{2}} = -4.61$$


Example - Continued
In this one-tailed test with α = 0.05, the critical value is
z = –1.645. Because z = –4.61 is less than –1.645, we can reject the
null hypothesis.

There is sufficient evidence to support the claim that the median
body temperature of healthy adults is less than 98.6°F.

Wilcoxon Signed-Ranks Test for Matched Pairs


Key Concept

The Wilcoxon signed-ranks test involves the conversion of the sample
data to ranks. This test can be used for two different applications:

1. Testing a claim that a population of matched pairs has the
property that the matched pairs have differences with a median equal
to zero.
2. Testing a claim that a single population of individual values has
a median equal to some claimed value.


1. The data are a simple random sample.
2. The population of differences has a
distribution that is approximately
symmetric, meaning that the left half of
its histogram is roughly a mirror image
of its right half.
(There is no requirement that the data
have a normal distribution.)



T = the smaller of the following two sums:
1. The sum of the positive ranks of the nonzero
differences d
2. The absolute value of the sum of the negative
ranks of the nonzero differences d



For n ≤ 30, the test statistic is T.

For n > 30, the test statistic is:

$$z = \frac{T - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}}$$



P-values: P-values are often provided by
technology, or can be found using the z test
statistic and a table.

Critical Values:
1. For n ≤ 30, the critical T value is found in a
Wilcoxon Table.
2. For n > 30, the critical z values are found in
a Normal Distribution Table.



Step 1: For each pair of data, find the difference d
by subtracting the second value from the
first. Keep the signs, but discard any pairs
for which d = 0.
Step 2: Ignore the signs of the differences, then
sort the differences from lowest to highest
and replace the differences by the
corresponding rank value. When
differences have the same numerical value,
assign to them the mean of the ranks
involved in the tie.



Step 3: Attach to each rank the sign of the difference from
which it came. That is, insert those signs that
were ignored in Step 2.
Step 4: Find the sum of the absolute values of the
negative ranks. Also find the sum of the
positive ranks.
Step 5: Let T be the smaller of the two sums found in
Step 4. Either sum could be used, but for a
simplified procedure we arbitrarily select the
smaller of the two sums.



Step 6: Let n be the number of pairs of data for
which the difference d is not 0.
Step 7: Determine the test statistic and critical
values based on the sample size, as
shown above.
Step 8: When forming the conclusion, reject the
null hypothesis if the sample data lead
to a test statistic that is in the critical
region; that is, the test statistic is less
than or equal to the critical value(s).
Otherwise, fail to reject the null
hypothesis.
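Steps 1 through 6 can be sketched in Python as shown below. The function name `signed_rank_T` and the paired values are hypothetical; the sketch only computes T and n and leaves the table lookup of Steps 7 and 8 to the reader.

```python
def signed_rank_T(first, second):
    """Return (T, n) for matched pairs, following Steps 1-6 of the procedure."""
    # Step 1: differences first - second, discarding pairs with d = 0.
    d = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank the absolute differences, using mean ranks for ties.
    abs_d = [abs(v) for v in d]
    order = sorted(range(len(d)), key=lambda i: abs_d[i])
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    # Steps 3-4: sum the ranks separately by the sign of each difference.
    positive_sum = sum(r for r, v in zip(ranks, d) if v > 0)
    negative_sum = sum(r for r, v in zip(ranks, d) if v < 0)
    # Steps 5-6: T is the smaller sum; n is the number of nonzero differences.
    return min(positive_sum, negative_sum), len(d)


# Hypothetical matched pairs, for illustration only.
first_values = [12, 15, 9, 20, 14, 11, 18]
second_values = [10, 15, 11, 17, 13, 8, 12]
T, n = signed_rank_T(first_values, second_values)
print(T, n)
```

For these made-up pairs the positive ranks sum to 18.5 and the negative ranks to 2.5, so T = 2.5 with n = 6.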

Example

The first two rows of the following table include taxi times for a
sample of American Airlines Flight 21. Use the sample data to test
the claim that there is no difference between taxi-out and taxi-in
times. Use the Wilcoxon signed-ranks test and a 0.05 level of
significance.


Example - Continued

Requirement Check: The data are from a simple random sample. The
differences should be symmetric, and though the histogram does not
support this, we have only 11 differences and the departure from
symmetry is not too extreme.


Example - Continued

The claim is of no difference between taxi-in and taxi-out times, so
the hypotheses are:
H0: There is no difference. (The median of the differences is 0.)
H1: There is a difference. (The median of the differences is not equal to 0.)

Using the 8-step procedure described earlier, the test statistic is
T = 11.


Example - Continued

The sample size is n = 11, so the critical value is found from our
table. Using a 0.05 level of significance, the critical value is
found to be 11.

We should reject the null hypothesis if the test statistic T is less
than or equal to the critical value.

Because the test statistic of T = 11 equals the critical value, we
reject the null hypothesis.

We conclude that the taxi-out and taxi-in times do not appear to be
about the same.


Claims about the Median of a Single
Population

Make one simple adjustment:

When testing a claim about the median of a single population, create
matched pairs by pairing each sample value with the claimed value of
the median. The preceding procedure can then be used.

Wilcoxon Rank-Sum Test for Two Independent Samples


Key Concept
The Wilcoxon rank-sum test uses ranks of values
from two independent samples to test the null
hypothesis that the two populations have equal
medians.
The basic idea underlying the Wilcoxon rank-sum
test is this: If two samples are drawn from
identical populations and the individual values are
all ranked as one combined collection of values,
then the high and low ranks should fall evenly
between the two samples.
If the low ranks are found predominantly in one sample and the high
ranks are found predominantly in the other sample, we suspect that
the two populations have different medians.

Caution
Do not confuse the Wilcoxon rank-sum test for two independent samples
with the Wilcoxon signed-ranks test for matched pairs.


Definition
The Wilcoxon rank-sum test is a nonparametric
test that uses ranks of sample data from two
independent populations.
It is used to test the null hypothesis that the
two independent samples come from
populations with equal medians.



Notation
n1 = size of Sample 1
n2 = size of Sample 2
R1 = sum of ranks for Sample 1
R2 = sum of ranks for Sample 2
R = same as R1 (sum of ranks for Sample 1)
μR = mean of the sample R values that is
expected when the two populations have
equal medians
σR = standard deviation of the sample R values that is expected when
the two populations have equal medians

1. There are two independent simple random samples.

2. Each of the two samples has more than 10 values.

Note: There is no requirement that the two populations have a normal
distribution or any other particular distribution.


$$z = \frac{R - \mu_R}{\sigma_R}$$

where

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

n1 = size of the sample from which the rank sum R is found
n2 = size of the other sample
R = sum of ranks of the sample with size n1



P-values can be found using the z test statistic and a normal
distribution table.

Critical values can be found in the stats table. The test statistic
is based on the normal distribution.


1. Temporarily combine the two samples into one big sample, then
replace each sample value with its rank.

2. Find the sum of the ranks for either one of the two samples.

3. Calculate the value of the z test statistic, where either sample
can be used as “Sample 1.”
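The three steps above can be sketched in Python as follows. The function names `rank_sum` and `rank_sum_z` are illustrative assumptions; ties receive the mean of the ranks involved, as described earlier in the lecture.

```python
import math


def rank_sum(sample1, sample2):
    """Steps 1-2: rank the combined values and return R, the rank sum of sample1."""
    combined = sorted(list(sample1) + list(sample2))
    # For each distinct value, record its first and last 1-based positions.
    span = {}
    for position, value in enumerate(combined, start=1):
        first, _ = span.get(value, (position, position))
        span[value] = (first, position)
    # A value's rank is the mean of the positions it occupies.
    return sum((span[v][0] + span[v][1]) / 2 for v in sample1)


def rank_sum_z(R, n1, n2):
    """Step 3: z test statistic, where R is the rank sum of the sample of size n1."""
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (R - mu_R) / sigma_R


# Rank sum and sample sizes from the pulse-rate example in this lecture.
z_pulse = rank_sum_z(123.5, 12, 11)
print(round(z_pulse, 2))
```

With R = 123.5, n1 = 12, and n2 = 11, the function returns about –1.26, matching the worked example in this lecture.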



The following table lists pulse rates of samples of males and females
(from Data Set 1).

Use a 0.05 significance level to test the claim that males and
females have the same median pulse rate.

Requirement Check: The sample data are two independent random
samples, and the sample sizes are 12 and 11, which both exceed 10.

The hypotheses are:

H0: The median pulse rate of males is equal to the median pulse rate of females.
H1: The median pulse rates are different.


Rank the combined 23 pulse rates, referring to the table on the
earlier slide.

If we choose the pulse rates of males as Sample 1, we get:

$$R = 4.5 + 11 + 19 + \cdots + 6 = 123.5$$

Also, n1 = 12, n2 = 11, and we can find the values of μR, σR, and the
test statistic z.


Example - Continued

$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{12(12 + 11 + 1)}{2} = 144$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(12)(11)(12 + 11 + 1)}{12}} = 16.248$$

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{123.5 - 144}{16.248} = -1.26$$


Since we have a two-tailed test with α = 0.05, the
critical values are ±1.96.

The test statistic of z = –1.26 does not fall in the critical region,
so we fail to reject the null hypothesis.

There is not sufficient evidence to warrant the rejection of the
claim that males and females have the same median pulse rate.

Based on the available sample data, it appears males and females have
pulse rates with the same median.

Kruskal-Wallis Test


Key Concept
This section introduces the Kruskal-Wallis test, which uses ranks of
data from three or more independent samples to test the null
hypothesis that the samples come from populations with equal medians.
This test is a nonparametric alternative to one-way ANOVA, and it
does not require normal distributions.

• We compute the test statistic H, which has a distribution that can
be approximated by the chi-square distribution as long as each sample
has at least 5 observations.

• When we use the chi-square distribution in this context, the number
of degrees of freedom is k – 1, where k is the number of samples.

• The H test statistic is basically a measure of the variance of the
rank sums R1, R2, ..., Rk.


N = total number of observations in all samples combined
k = number of samples
R1 = sum of ranks for Sample 1
n1 = number of observations in Sample 1
For Sample 2, the sum of ranks is R2 and the number of observations
is n2, and similar notation is used for the other samples.


1. We have at least three independent
random samples.

2. Each sample has at least 5 observations.

Note: There is no requirement that the populations have a normal
distribution or any other particular distribution.


Kruskal-Wallis Test

$$H = \frac{12}{N(N + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \cdots + \frac{R_k^2}{n_k} \right) - 3(N + 1)$$

Critical Values
1. Test is right-tailed.
2. df = k – 1 (Because the test statistic H can be approximated by
the chi-square distribution, use the chi-square table.)
P-values are often found using technology.


1. Temporarily combine all samples into one big
sample and assign a rank to each sample value.
2. For each sample, find the sum of the ranks and
find the sample size.
3. Calculate H by using the results of Step 2 and
the notation and test statistic.
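Step 3 can be sketched in Python as shown below, taking the rank sums and sample sizes from Step 2 as inputs. The function name `kruskal_wallis_H` is an illustrative assumption; the formula is the H statistic given above.

```python
def kruskal_wallis_H(rank_sums, sizes):
    """Kruskal-Wallis H from each sample's rank sum R_i and sample size n_i."""
    N = sum(sizes)  # total number of observations in all samples combined
    weighted = sum(R * R / n for R, n in zip(rank_sums, sizes))
    return 12 / (N * (N + 1)) * weighted - 3 * (N + 1)


# Rank sums and sample sizes from the lead-exposure example in this lecture.
H_lead = kruskal_wallis_H([86, 50.5, 53.5], [8, 6, 5])
print(round(H_lead, 3))
```

With rank sums of 86, 50.5, and 53.5 and sample sizes of 8, 6, and 5, the function returns about 0.694.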



The given table lists IQ scores from a sample of subjects with low,
medium, and high lead exposure. Use a 0.05 level of significance to
test the claim that the three samples come from populations with
medians that are all equal.

Requirement Check: Each of the three samples is a simple random
sample and each sample size is at least 5.

The hypotheses are:

H0: The median IQ score is the same for each of the three populations.
H1: The three populations have median IQ scores that are not all the same.


We first rank the data, as noted in the given table.
The test statistic is:

$$\begin{aligned}
H &= \frac{12}{N(N + 1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} \right) - 3(N + 1) \\
  &= \frac{12}{19(19 + 1)} \left( \frac{86^2}{8} + \frac{50.5^2}{6} + \frac{53.5^2}{5} \right) - 3(19 + 1) \\
  &= 0.694
\end{aligned}$$


Because each sample has at least five
observations, the distribution of H is
approximately chi-square with k – 1
degrees of freedom (3 – 1 = 2 df).
Refer to the chi-square table to find the critical value of 5.991.
The test statistic of H = 0.694 is less than the critical value, so
it does not fall in the rejection region, and we fail to reject the
null hypothesis of equal population medians.

Example - Continued

There is not sufficient evidence to reject the claim that IQ scores
from subjects with low, medium, and high levels of lead exposure all
have the same median.

Rank Correlation


Key Concept
This section describes the nonparametric method
of the rank correlation test, which uses paired
data to test for an association between two
variables.

Earlier we used paired sample data to compute values for the linear
correlation coefficient r, but in this section we use ranks as the
basis for computing the rank correlation coefficient rs.


The rank correlation test (or Spearman’s rank correlation test) is a
nonparametric test that uses ranks of sample data consisting of
matched pairs.
It is used to test for an association between two variables.


Advantages
1. With rank correlation, we can analyze paired data
that are ranks or can be converted to ranks. This
method does not require a normal distribution for
any population.
2. Rank correlation can be used to detect some (not
all) relationships that are not linear.



Compute the rank correlation coefficient rs and use it to test for an
association between two variables. Then we can test the following:

H0: ρs = 0 (There is no correlation between the two variables.)
H1: ρs ≠ 0 (There is a correlation between the two variables.)


rs = rank correlation coefficient for sample
paired data (rs is a sample statistic)
ρs = rank correlation coefficient for all the
population data (ρs is a population
parameter)
n = number of pairs of sample data
d = difference between ranks for the two
values within an individual pair



The paired data are a simple random sample and
the data are ranks or can be converted to ranks.

Note: Unlike the parametric methods seen earlier, there is no
requirement that the sample pairs of data have a bivariate normal
distribution. There is no requirement of a normal distribution for
any population.


Test Statistic
First convert the data to ranks. Then calculate:

$$r_s = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$

No ties: After converting the data in each sample to ranks, if there
are no ties among ranks for either variable, the exact value of the
test statistic can be calculated using this formula:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
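Both forms of the test statistic can be sketched in Python and compared on ranks without ties. The function names and the two rank lists are hypothetical and serve only to illustrate the formulas.

```python
import math


def spearman_rs(rank_x, rank_y):
    """General formula: the linear correlation coefficient applied to the ranks."""
    n = len(rank_x)
    sum_x, sum_y = sum(rank_x), sum(rank_y)
    sum_xy = sum(a * b for a, b in zip(rank_x, rank_y))
    sum_x2 = sum(a * a for a in rank_x)
    sum_y2 = sum(b * b for b in rank_y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def spearman_rs_no_ties(rank_x, rank_y):
    """Shortcut formula, valid only when neither variable has tied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks for seven pairs, with no ties in either variable.
x_ranks = [1, 2, 3, 4, 5, 6, 7]
y_ranks = [2, 1, 4, 3, 7, 5, 6]
rs_general = spearman_rs(x_ranks, y_ranks)
rs_shortcut = spearman_rs_no_ties(x_ranks, y_ranks)
print(round(rs_general, 3), round(rs_shortcut, 3))
```

For these made-up ranks the sum of d² is 10, and both functions give about 0.821, which illustrates that the two formulas agree when there are no ties.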



P-Values and Critical Values

P-values: Sometimes found using technology, provided the technology
used has this function.
Critical values:
• If n ≤ 30, critical values are found in the stats table (Critical
Values of Spearman’s Rank Correlation Coefficient rs).
• If n > 30, use this formula, where the value of z corresponds to
the significance level (for example, if α = 0.05, z = 1.96):

$$r_s = \frac{\pm z}{\sqrt{n - 1}}$$


A not very serious disadvantage of rank correlation is
its efficiency rating of 0.91.

This efficiency rating shows that with all other circumstances being
equal, the nonparametric approach of rank correlation requires 100
pairs of sample data to achieve the same results as only 91 pairs of
sample observations analyzed through parametric methods, assuming
that the stricter requirements of the parametric approach are met.


Rank Correlation Testing
[Figure: rank correlation testing procedure]


Example
The following table lists quality rankings and
prices of 37-inch LCD televisions. Find the
value of the rank correlation coefficient and
use it to determine whether or not there is a
correlation between quality and price.

Use a 0.05 significance level. Based on the result, does it appear
that you can get better quality by spending more?


Example - Continued
Requirement Check: We assume we
have a simple random sample.

The hypotheses are:

H0: ρs = 0 (There is no correlation between quality and price.)
H1: ρs ≠ 0 (There is a correlation between quality and price.)

On the following slide, we convert to ranks.


Example

We have no ties, so the exact value of the test statistic can be
calculated as shown on the next slide:


Example - Continued
Test Statistic:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(32)}{7(7^2 - 1)} = 1 - \frac{192}{336} = 0.429$$

We refer to the stats table to find the critical values of ±0.786.

Because the test statistic rs = 0.429 is between the critical values,
we fail to reject the null hypothesis.

Example - Continued
There is not sufficient evidence to support a claim of a correlation
between quality and price.

Based on the given sample data, it appears you don’t necessarily get
better quality by paying more.


Example
Rank correlation methods sometimes allow us to detect
relationships that we cannot detect with the methods
seen earlier.

Below is a scatterplot that shows an S-shaped pattern:



Example - Continued
The earlier methods give r = 0.590 and critical values
of ±0.632, suggesting that there is not a linear
relationship between x and y.

If we use the methods of this section, we get rs = 1 and critical
values of ±0.648, suggesting that there is a correlation between x
and y.

With rank correlations, we can sometimes detect relationships that
are not linear.

Runs Test for Randomness


Key Concept
This section introduces the runs test for
randomness, which can be used to determine
whether the sample data in a sequence are in
a random order.
This test is based on sample data that have
two characteristics, and it analyzes runs of
those characteristics to determine whether
the runs appear to result from some random
process, or whether the runs suggest that the
order of the data is not random.



A run is a sequence of data having the same
characteristic; the sequence is preceded and
followed by data with a different characteristic
or by no data at all.

The runs test uses the number of runs in a sequence of sample data to
test for randomness in the order of the data.


Reject randomness if the number of runs is
very low or very high.

Example: The sequence of genders FFFFFMMMMM is not random because it
has only 2 runs, so the number of runs is very low.

Example: The sequence of genders FMFMFMFMFM is not random because
there are 10 runs, which is very high.
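Counting runs is simple enough to sketch in a few lines of Python. The function name `count_runs` is an illustrative assumption; a new run starts at the first item and whenever an item differs from the one before it.

```python
def count_runs(sequence):
    """Count the runs: maximal blocks of consecutive items with the same characteristic."""
    return sum(
        1
        for i, item in enumerate(sequence)
        if i == 0 or item != sequence[i - 1]
    )


low = count_runs("FFFFFMMMMM")   # two long blocks
high = count_runs("FMFMFMFMFM")  # strict alternation
print(low, high)
```

The two sequences above give 2 runs and 10 runs respectively, the extremes that lead to rejecting randomness.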

The runs test for randomness is based on the order in which the data
occur; it is not based on the frequency of the data.

For example, a sequence of 3 men and 20 women might appear to be
random, but the issue of whether 3 men and 20 women constitute a
biased sample (with disproportionately more women) is not addressed
by the runs test.


Apply the runs test for randomness to a sequence of
sample data to test for randomness in the order of
the data.
Use the following null and alternative hypotheses:

H0: The data are in a random sequence.
H1: The data are in a sequence that is not random.


n1 = number of elements in the sequence
that have one particular characteristic
(The characteristic chosen for n1 is
arbitrary.)

n2 = number of elements in the sequence that have the other
characteristic

G = number of runs


1. The sample data are arranged
according to some ordering scheme,
such as the order in which the
sample values were obtained.

2. Each data value can be categorized into one of two separate
categories (such as male/female).


For Small Samples and α = 0.05:
If n1 ≤ 20 and n2 ≤ 20, the test statistic is G and
the critical values are found using stats table
(Critical Values for Number of Runs G).
Decision Criterion: Reject randomness if the
number of runs G is:
• less than or equal to the smaller critical
value found in Runs Table
• greater than or equal to the larger critical
value found in Runs Table.



For Large Samples or α ≠ 0.05:
If n1 > 20 or n2 > 20 or α ≠ 0.05:
Test Statistic:

$$z = \frac{G - \mu_G}{\sigma_G}$$

where

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

and

$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$
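The large-sample statistic can be sketched in Python as shown below. The function name `runs_test_z` and its return layout are illustrative assumptions; the formulas for μG and σG are the ones given above.

```python
import math


def runs_test_z(n1, n2, G):
    """Return (z, mu_G, sigma_G) for the large-sample runs test."""
    mu_G = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_G = math.sqrt(
        (2 * n1 * n2) * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    return (G - mu_G) / sigma_G, mu_G, sigma_G


# Counts from the gender-sequence example in this lecture.
z, mu_G, sigma_G = runs_test_z(92, 15, 25)
print(round(mu_G, 4), round(sigma_G, 5), round(z, 2))
```

With n1 = 92, n2 = 15, and G = 25, this gives μG of about 26.7944, σG of about 2.45633, and z of about –0.73.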



For Large Samples or α ≠ 0.05:
If n1 > 20 or n2 > 20 or α ≠ 0.05:

Critical values of z: Use Normal table.

Decision Criterion: Reject randomness if the test statistic z is less
than or equal to the negative critical z score or greater than or
equal to the positive critical z score.


Runs Test for Randomness
[Figure: runs test for randomness procedure]


Listed below are some recent winners of the NBA
basketball championship game. Let W denote a
winner from the Western Conference, E for the
Eastern Conference.

Use a 0.05 significance level to test for randomness in the sequence:

E E W W W W W E W E W E W W W


Requirement Check: The data are arranged in
order, and each data value is categorized into one
of two separate categories.
We must find the values of n1, n2, and G.

n1 = number of Eastern Conference Winners = 5
n2 = number of Western Conference Winners = 10
G = number of runs = 8

Because n1 ≤ 20 and n2 ≤ 20 and α = 0.05, the test statistic is
G = 8.
From the stats table, the critical values are 3 and 12.
Because G = 8 is neither less than or equal to 3 nor greater than or
equal to 12, we do not reject randomness.
There is not sufficient evidence to reject randomness in the sequence
of winners.



Data Set 3 lists data from 107 study
subjects. Let us consider the
sequence of listed genders indicated
below.

Test the claim that the sequence is random using a 0.05 significance
level.


Requirement Check: The data are arranged in order,
and each data value is categorized into one of two
separate categories (male / female).
We must find the values of n1, n2, and G.

Examination of the sequence of 107 genders gives:

n1 = number of males = 92
n2 = number of females = 15
G = number of runs = 25


Since n1 > 20, we need to calculate the test statistic z. We first
find μG and σG:

$$\mu_G = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(92)(15)}{92 + 15} + 1 = 26.7944$$

$$\sigma_G = \sqrt{\frac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\frac{[2(92)(15)][2(92)(15) - 92 - 15]}{(92 + 15)^2 (92 + 15 - 1)}} = 2.45633$$


The test statistic is:

$$z = \frac{G - \mu_G}{\sigma_G} = \frac{25 - 26.7944}{2.45633} = -0.73$$

Because the significance level is 0.05, the critical values are
z = ±1.96.
The test statistic does not fall within the critical regions, so we
fail to reject the null hypothesis of randomness.
The given sequence appears to be random.