
# Nonparametric Statistical Inference, Fourth Edition


07/01/2013


In this chapter we have studied in detail the nonparametric coefficients
that were proposed by Kendall and Spearman to measure
association. Both coefficients can be computed for a sample from a
bivariate distribution, a sample of pairs, when the data are numerical
measurements or ranks indicating relative magnitudes. The absolute
values of both coefficients range between zero and one, with increasing
values indicating increasing degrees of association. The sign of the
coefficient indicates the direction of the association, direct or inverse.
The values of the coefficients are not directly comparable, however. We
know that $|R| \ge |T|$ for any set of data, and in fact $|R|$ can be as much
as 50 percent greater than $|T|$.

Both coefficients can be used to test the null hypothesis of
independence between the variables. Even though the magnitudes of R
and T are not directly comparable, the magnitudes of the P values
based on them should be about the same, allowing for the fact that
they are measuring association in different ways. The interpretation
of T is easier than that of R. T is the proportion of concordant pairs in the
sample minus the proportion of discordant pairs. T can also be
interpreted as a coefficient of disarray. The easiest interpretation of R is as
the sample value of the Pearson product-moment correlation
coefficient calculated using the ranks of the sample data.
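
The two interpretations just given can be checked numerically. What follows is a minimal Python sketch, not taken from the text: it computes T as the proportion of concordant pairs minus the proportion of discordant pairs, and R as the Pearson correlation of the ranks, assuming no ties. The function names, and the pairing of the arrangement 1 4 5 2 3 7 8 9 6 with the natural order 1, ..., 9, are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def kendall_t(x, y):
    """Kendall's T for untied data: (concordant - discordant) / C(n, 2)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (x[j] - x[i]) * (y[j] - y[i])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def ranks(values):
    """Ranks 1..n of the values, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_r(x, y):
    """Spearman's R: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean = (n + 1) / 2  # mean of the ranks 1..n
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var_x = sum((a - mean) ** 2 for a in rx)
    var_y = sum((b - mean) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical pairing: X in natural order, Y a rearrangement of 1..9.
x = list(range(1, 10))
y = [1, 4, 5, 2, 3, 7, 8, 9, 6]
t = kendall_t(x, y)
r = spearman_r(x, y)
print(f"T = {t:.4f}, R = {r:.4f}")
```

For this arrangement 7 of the 36 pairs are discordant, so T = 22/36 (about 0.611), while the sum of squared rank differences is 28, giving R = 1 - 168/720 (about 0.767).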

| Y |
|---|
| 1 4 5 2 3 7 8 9 6 |
| 1 4 2 5 3 7 8 9 6 |
| 1 2 4 5 3 7 8 9 6 |
| 1 2 4 3 5 7 8 9 6 |
| 1 2 3 4 5 7 8 9 6 |
| 1 2 3 4 5 7 8 9 6 |
| 1 2 3 4 5 7 8 6 9 |
| 1 2 3 4 5 6 7 8 9 |

MEASURES OF ASSOCIATION FOR BIVARIATE SAMPLES


An exact test of the null hypothesis of independence can be
carried out using either T or R for small sample sizes. Generation of
tables for exact P values was difficult initially, but now computers
have the capacity for doing this for even moderate n. For
intermediate and large sample sizes, the tests can be performed using
large sample approximations. The distribution of T approaches the
normal distribution much more rapidly than the distribution of R
and hence approximate P values based on R are less reliable than
those based on T.
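
As an illustration of the exact approach, here is a hedged Python sketch that enumerates all n! equally likely rankings to obtain the exact null distribution of T, and compares an exact right-tail P value with the large sample normal approximation. It assumes no ties and uses the standard null variance Var(T) = 2(2n + 5) / (9n(n - 1)); the function names and the observed value 15/21 for n = 7 are hypothetical choices for demonstration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import erfc, sqrt


def kendall_s(perm):
    """Concordant minus discordant pairs when X is in natural order."""
    return sum(1 if perm[j] > perm[i] else -1
               for i, j in combinations(range(len(perm)), 2))


def exact_null_distribution(n):
    """Exact null distribution of T: all n! rankings are equally likely."""
    pairs = n * (n - 1) // 2
    counts = Counter(kendall_s(p) for p in permutations(range(n)))
    total = sum(counts.values())
    return {Fraction(s, pairs): Fraction(c, total)
            for s, c in sorted(counts.items())}


def upper_tail_exact(n, t_obs):
    """Exact right-tail P value P(T >= t_obs) under independence."""
    return sum(p for t, p in exact_null_distribution(n).items() if t >= t_obs)


def upper_tail_normal(n, t_obs):
    """Normal approximation using Var(T) = 2(2n + 5) / (9n(n - 1))."""
    z = t_obs / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    return 0.5 * erfc(z / sqrt(2))


dist3 = exact_null_distribution(3)
for t_value, prob in dist3.items():
    print(f"T = {t_value}: probability {prob}")

# Hypothetical observed value for n = 7: S = 15 out of 21 pairs.
print(float(upper_tail_exact(7, Fraction(15, 21))), upper_tail_normal(7, 15 / 21))
```

Full enumeration is practical only for small n, since the number of rankings grows as n!; the same enumeration applied to Spearman's R would give its exact null distribution as well.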

Both T and R can be used when ties are present in either or both
samples, and both have a correction for ties that improves the normal
approximation. The correction with T always increases the value of T
while the R correction always decreases the value of R, making the
coefficients closer in magnitude.
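
The effect of the tie correction on T can be seen in a small sketch. The code below assumes the usual tie-corrected form, in which the number of pairs C(n, 2) in the denominator is replaced by the geometric mean of C(n, 2) minus the pairs tied in X and C(n, 2) minus the pairs tied in Y; this is assumed here to be the correction the text refers to. The data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_with_ties(x, y):
    """Return (uncorrected T, tie-corrected T) for paired data."""
    n = len(x)
    s = sum(sign(x[j] - x[i]) * sign(y[j] - y[i])
            for i, j in combinations(range(n), 2))
    pairs = n * (n - 1) // 2
    # Pairs tied within each sample: sum of C(t, 2) over groups of tied values.
    tied_x = sum(t * (t - 1) // 2 for t in Counter(x).values())
    tied_y = sum(u * (u - 1) // 2 for u in Counter(y).values())
    uncorrected = s / pairs
    corrected = s / sqrt((pairs - tied_x) * (pairs - tied_y))
    return uncorrected, corrected


# Hypothetical data with one tied pair in each sample.
x = [1, 2, 2, 3, 4]
y = [1, 3, 2, 4, 4]
t_plain, t_corrected = kendall_with_ties(x, y)
print(t_plain, t_corrected)
```

Here 8 of the 10 pairs are concordant and 2 are tied, so the uncorrected value is 8/10 and the corrected value is 8/9. Because ties can only shrink the denominator, the corrected coefficient is never smaller in absolute value than the uncorrected one.
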

If we reject the null hypothesis of independence by either T or R,
we can conclude that there is some kind of dependence or "association"
between the variables. But the kind of relationship or association that
exists defies any verbal description in general. The existence of a
relationship or significant association does not mean that the
relationship is causal. The relationship may be due to several other factors, or
to no factor at all. Care should always be taken in stating the results of
an experiment that no causality is implied, either directly or indirectly.

Kendall’s T is an unbiased estimator of a parameter $\tau$ in the
bivariate population; $\tau$ represents the probability of concordance minus
the probability of discordance. Concordance is not the same as
correlation, although both represent a kind of association. Spearman’s R is
not an unbiased estimator of the population correlation $\rho$. It is an
unbiased estimator of a parameter which is a function of $\tau$ and the

The tests of independence based on T and R can be considered
nonparametric counterparts of the test that the Pearson product-moment
correlation coefficient $\rho$ is equal to zero in the bivariate normal
distribution or that the regression coefficient $\beta$ equals zero. The
asymptotic relative efficiency of these tests relative to the parametric
test based on the sample Pearson product-moment correlation coefficient
is $9/\pi^2 = 0.912$ for normal distributions and one for the continuous
uniform distribution.

Both T and R can be used to test for the existence of trend in a set
of time-ordered observations. The test based on T is called the Mann
test, and the test based on R is called Daniels’ test. Both of these
tests are alternatives to the tests for randomness presented in
Chapter 3.


CHAPTER 11

PROBLEMS

11.1. A beauty contest has eight contestants. The two judges are each asked to rank the contestants in a preferential order of pulchritude. The results are shown in the table. Answer parts (a) and (b) using (i) the Kendall tau-coefficient procedures and (ii) the Spearman rank-correlation-coefficient procedures:

(a) Calculate the measure of association.
(b) Test the null hypothesis that the judges ranked the contestants independently (use tables).
(c) Find a 95 percent confidence-interval estimate of $\tau$.

11.2. Verify the result given in (4.9).

11.3. Two independent random samples of sizes m and n contain no ties. A set of m + n paired observations can be derived from these data by arranging the combined samples in ascending order of magnitude and (a) assigning ranks, (b) assigning sample indicators. Show that Kendall’s tau, calculated for these pairs without a correction for ties, is linearly related to the Mann-Whitney U statistic for these data, and find the relation if the sample indicators are (i) sample numbers 1 and 2, (ii) 1 for the first sample and 0 for the second sample as in the Z vector of Chapter 7.

11.4. Show that for the standardized bivariate normal distribution

$$\Phi(0, 0) = \frac{1}{4} + \frac{1}{2\pi} \arcsin \rho$$
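
The identity in Problem 11.4 links Kendall's parameter to the correlation in the normal case. The short worked step below is a sketch under stated assumptions rather than part of the problem: $\Phi(0,0)$ denotes the probability that both standardized variables are at most zero, $p_c$ and $p_d$ are labels introduced here for the probabilities of concordance and discordance, and it is assumed (a standard fact, not shown in this excerpt) that the standardized differences between two independent pairs again follow a standardized bivariate normal distribution with the same $\rho$, so that $p_c = 2\Phi(0,0)$.

$$
\tau = p_c - p_d = 2p_c - 1 = 4\,\Phi(0,0) - 1
     = 4\left(\frac{1}{4} + \frac{1}{2\pi}\arcsin\rho\right) - 1
     = \frac{2}{\pi}\arcsin\rho .
$$

For example, $\rho = 0.5$ gives $\tau = (2/\pi)(\pi/6) = 1/3$, which illustrates that concordance and correlation are measured on different scales.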

11.5. The Census Bureau reported that Hispanics are expected to overtake blacks as the largest minority in the United States by the year 2030. Use two different tests to see whether there is a direct relationship between number of Hispanics and percent of state population for the nine states below.

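Rankings of contestants A to H by each judge (Problem 11.1):

| Judge | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 3 | 5 | 4 | 8 | 7 | 6 |
| 2 | 1 | 2 | 4 | 5 | 7 | 6 | 8 | 3 |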

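Data for Problem 11.5:

| State | Hispanics (millions) | Percent of state population |
|---|---|---|
| California | 6.6 | 23 |
| Texas | 4.1 | 24 |
| New York | 2.1 | 12 |
| Florida | 1.5 | 12 |
| Illinois | 0.8 | 7 |
| Arizona | 0.6 | 18 |
| New Jersey | 0.6 | 8 |
| New Mexico | 0.5 | 35 |
| | 0.4 | 11 |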


11.6. Company-financed expenditures in manufacturing on research and development (R&D) are currently about 2.7 percent of sales in Japan and 2.8 percent of sales in the United States. However, when these figures are looked at separately according to industry, the following data from Mansfield (1989) show some large differences.

(a) Use the signed-rank test to determine whether Japan spends a larger percentage than the United States on R&D.
(b) Determine whether there is a significant positive relationship between percentages spent by Japan and the United States (two different methods).

11.7. The World Almanac and Book of Facts published the following divorce rates per 1000 population in the United States. Determine whether these data show a positive trend using four different methods.

11.8. For the time series data in Example 4.1 of Chapter 3, use Daniels’ test based on Spearman’s rank correlation coefficient to see if the data show a positive trend.

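R&D expenditures as a percentage of sales, by industry (Problem 11.6):

| Industry | Japan | United States |
|---|---|---|
| Food | 0.8 | 0.4 |
| Textiles | 1.2 | 0.5 |
| Paper | 0.7 | 1.3 |
| Chemicals | 3.8 | 4.7 |
| Petroleum | 0.4 | 0.7 |
| Rubber | 2.9 | 2.2 |
| Ferrous metals | 1.9 | 0.5 |
| Nonferrous metals | 1.9 | 1.4 |
| Metal products | 1.6 | 1.3 |
| Machinery | 2.7 | 5.8 |
| Electrical equipment | 5.1 | 4.8 |
| Motor vehicles | 3.0 | 3.2 |
| Other transport equipment | 2.6 | 1.2 |
| Instruments | 4.5 | 9.0 |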

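Divorce rates per 1000 population (Problem 11.7):

| Year | Divorce rate |
|---|---|
| 1945 | 3.5 |
| 1950 | 2.6 |
| 1955 | 2.3 |
| 1960 | 2.2 |
| 1965 | 2.5 |
| 1970 | 3.5 |
| 1975 | 4.8 |
| 1980 | 5.2 |
| 1985 | 5.0 |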
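
Because the divorce rates above are time ordered, a trend test based on Kendall's T (the Mann test named in the chapter summary) only needs to compare each observation with every later one. The Python sketch below is one possible illustration, not a worked solution from the text: it uses the large sample normal approximation with a continuity correction of 1, and it ignores the single tie (3.5 occurs twice) in the variance. Both choices are assumptions, and with only nine observations an exact test would normally be preferred.

```python
from itertools import combinations
from math import erfc, sqrt

# Divorce rates per 1000 population for 1945, 1950, ..., 1985, in time order.
rates = [3.5, 2.6, 2.3, 2.2, 2.5, 3.5, 4.8, 5.2, 5.0]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_s(series):
    """Sum over pairs of later-minus-earlier signs: +1 rise, -1 fall, 0 tie."""
    return sum(sign(series[j] - series[i])
               for i, j in combinations(range(len(series)), 2))


n = len(rates)
s = kendall_s(rates)
t = s / (n * (n - 1) / 2)               # T without a correction for the tie
var_s = n * (n - 1) * (2 * n + 5) / 18  # null variance of S, ignoring ties
z = (s - 1) / sqrt(var_s)               # continuity-corrected, upper tail
p_upper = 0.5 * erfc(z / sqrt(2))       # approximate one-sided P value
print(f"S = {s}, T = {t:.3f}, z = {z:.3f}, approximate P = {p_upper:.3f}")
```

For these data there are 26 rises, 9 falls, and 1 tie among the 36 pairs, so S = 17 and T = 17/36 (about 0.472).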


11.9. Do Problem 11.8 using the Mann test based on Kendall’s tau.

11.10. The rainfall measured by each of 12 gauges was recorded for 20 successive days. The average results for each day are as follows:

Use an appropriate test to determine whether these data exhibit some sort of pattern. Find the P value:

(a) Using tests based on runs with both the exact distribution and the normal approximation.
(b) Using other tests that you may think are appropriate.
(c) Compare and interpret the results of (a) and (b).

11.11. A company has administered a screening aptitude test to 20 new employees over a two-year period. The record of scores and date on which the person was hired are shown below.

Assuming that these test scores are the primary criterion for hiring, do you think that over this time period the screening procedure has changed, or the personnel agent has changed, or supply has changed, or what? Base your answer on an appropriate nonparametric procedure (there are several appropriate methods).

11.12. Ten randomly chosen male college students are used in an experiment to investigate the claim that physical strength is decreased by fatigue. Describe the relationship for the data below, using several methods of analysis.

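Average daily rainfall (Problem 11.10):

| Day | Rainfall | Day | Rainfall |
|---|---|---|---|
| April 1 | 0.00 | April 11 | 2.10 |
| April 2 | 0.03 | April 12 | 2.25 |
| April 3 | 0.05 | April 13 | 2.50 |
| April 4 | 1.11 | April 14 | 2.50 |
| April 5 | 0.00 | April 15 | 2.51 |
| April 6 | 0.00 | April 16 | 2.60 |
| April 7 | 0.02 | April 17 | 2.50 |
| April 8 | 0.06 | April 18 | 2.45 |
| April 9 | 1.15 | April 19 | 0.02 |
| April 10 | 2.00 | April 20 | 0.00 |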

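Hiring dates and aptitude test scores (Problem 11.11):

| Date hired | Score | Date hired | Score | Date hired | Score | Date hired | Score |
|---|---|---|---|---|---|---|---|
| 1/4/01 | 75 | 9/21/01 | 72 | 12/9/01 | 81 | 5/10/02 | 91 |
| 3/9/01 | 74 | 10/4/01 | 77 | 1/22/02 | 93 | 7/17/02 | 95 |
| 6/3/01 | 71 | 10/9/01 | 76 | 1/26/02 | 82 | 9/12/02 | 90 |
| 6/15/01 | 76 | 11/1/01 | 78 | 3/21/02 | 84 | 10/4/02 | 92 |
| 8/4/01 | 98 | 12/5/01 | 80 | 4/6/02 | 89 | 12/6/02 | 93 |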


11.13. Given a single series of time-ordered ordinal observations over several years, name some nonparametric procedures that could be used to detect a long-term positive trend, and explain how each would be applied. Name as many as you can think of.

11.14. Six randomly selected mice are studied over time and scored on an ordinal basis for intelligence and social dominance. The data are as follows:

(a) Find the coefficient of rank correlation.
(b) Find the appropriate one-tailed P value for your result in (a).
(c) Find the Kendall tau coefficient.
(d) Find the appropriate one-tailed P value for your result in (c).

11.15. A board of marketing executives ranked 10 similar products, and an "independent" group of male consumers also ranked the products. Use two different nonparametric procedures to describe the correlation between rankings and find a one-tailed P value in each case. State the hypothesis and alternative and all assumptions. Compare and contrast the procedures.

11.16. Derive the null distribution of both Kendall’s tau statistic and Spearman’s rho for n = 3 assuming no ties.

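Data for Problem 11.12:

| Minutes between rest periods | Pounds lifted per minute |
|---|---|
| 5.5 | 350 |
| 9.6 | 230 |
| 2.4 | 540 |
| 4.4 | 390 |
| 0.5 | 910 |
| 7.9 | 220 |
| 2.0 | 680 |
| 3.3 | 590 |
| 13.1 | 90 |
| 4.2 | 520 |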

Data for Problem 11.14:

| Mouse | Intelligence | Social dominance |
|---|---|---|
| 1 | 45 | 63 |
| 2 | 26 | 0 |
| 3 | 20 | 16 |
| 4 | 40 | 91 |
| 5 | 36 | 25 |
| 6 | 23 | 2 |

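Data for Problem 11.15:

| Product | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Executive ranks | 9 | 4 | 3 | 7 | 2 | 1 | 5 | 8 | 10 | 6 |
| Independent male ranks | 7 | 6 | 5 | 9 | 2 | 3 | 8 | 5 | 10 | 1 |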


11.17. A scout for a professional baseball team ranks nine players separately in terms of speed and power hitting, as shown below.

(a) Find the rank correlation coefficient and the appropriate one-tailed P value.
(b) Find the Kendall tau coefficient and the appropriate one-tailed P value.

11.18. Twenty-three subjects are asked to give their attitude toward elementary school integration and their number of years of schooling completed. The data are shown below. As a measure of the association between attitude and number of years of schooling completed:

(a) Compute Kendall’s tau with correction for ties.
(b) Compute Spearman’s R with correction for ties.

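Data for Problem 11.17:

| Player | Speed ranking | Power-hitting ranking |
|---|---|---|
| A | 3 | 1 |
| B | 1 | 3 |
| C | 5 | 4 |
| D | 6 | 2 |
| E | 2 | 6 |
| F | 7 | 8 |
| G | 8 | 9 |
| H | 4 | 5 |
| I | 9 | 7 |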

Number of years of

school completed at

the time

Attitude toward elementary school integration

Strongly

disagree

Moderately

disagree

Moderately

agree

Strongly

agree

0–6

5

9

12

16

7–9

4

10

13

18

10–12 or G.E.D.

10

7

9

12

Some college

12

12

12

19

College degree (4 yr)

3

12

16

14

10

15