3S Nonparametric Statistics
(have the same shape when plotted) under the null hypothesis. In the Mann-
Whitney test and the two-sample t test, the actual probability of rejecting the
null hypothesis when it is true depends on the ratio of the variances of the two
groups (Pratt, 1964).
Example 3S.1  The Mann-Whitney rank-sum test
The Mann-Whitney (Wilcoxon) rank-sum test is a nonparametric analog of the
two-sample t test for independent samples. The Mann-Whitney statistic, which
is also reported by 3D, is computed whenever the Kruskal-Wallis test is
requested for data with two groups. The Kruskal-Wallis test for more than two
groups is discussed in Example 3S.2. These statistics are explained in
Appendix B.18.
In Example 3S.1 we analyze the Exercise data described in Chapter 2 and
Example 3D.2. The data are stored on disk in a file named EXERCISE.DAT. We
test whether PULSE_2, pulse rate after exercise, differs significantly between
smokers and nonsmokers. The data file has a case whose PULSE_2 is erroneous
(265 instead of 165). The impact of this outlier is lessened when we use this
method, which is based on ranks and not on exact values.
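The following Python sketch illustrates why a rank-based statistic is less sensitive to such an outlier. It is not 3S code or output; the pulse values and the function names midranks and rank_sum are hypothetical and chosen only for illustration. Replacing 165 by 265 changes the group mean by 25 but leaves the rank sum unchanged, because the erroneous value is still simply the largest observation.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum(group_a, group_b):
    """Sum of the pooled-sample ranks that belong to group_a."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)])


# Hypothetical post-exercise pulse rates (not the EXERCISE.DAT values).
smokers = [150, 142, 165, 138]
smokers_typo = [150, 142, 265, 138]   # 165 mistyped as 265
nonsmokers = [128, 135, 140, 131, 146]

w_clean = rank_sum(smokers, nonsmokers)
w_typo = rank_sum(smokers_typo, nonsmokers)
mean_clean = sum(smokers) / len(smokers)
mean_typo = sum(smokers_typo) / len(smokers_typo)

print("rank sums:", w_clean, w_typo)
print("means:    ", mean_clean, mean_typo)
```

An outlier that moves an observation past other values would still change the ranks, so the protection is partial rather than complete.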
The INPUT, VARIABLE, and GROUP paragraphs in Input 3S.1 are common to all
BMDP programs (see Chapters 3, 4, and 5). The FILE command tells the program
where to find the data and is used for systems like IBM PC and VAX. (For IBM
mainframes, see UNIT, Chapter 3.)
A GROUPING variable must be specified for the Mann-Whitney test. In this
example we specify our grouping variable (SMOKE) in the GROUP paragraph,
and then use CODES and NAMES to identify the values of the variable. The TEST
paragraph is required in 3S; here we use it to request the Kruskal-Wallis test
(KRUSKAL). As noted above, we will also get results for the Mann-Whitney test.
Output 3S.1
[1] 3S reads 40 cases. Only complete cases are used in the computations; i.e.,
cases that have no values missing or out of range. All variables are checked
for acceptable values unless you specify a USE list in the VARIABLE
paragraph, in which case only the variables in the USE list are checked. See
Example 3S.2 for how to use all available data for each test request.
[2] 3S prints descriptive statistics for each variable except the designated
LABEL variable:
mean
standard deviation
minimum observed value (not out of range)
maximum observed value (not out of range)
[3] For each variable specified, 3S reports the sample size (frequency) and sum
of ranks by subgroup, the Kruskal-Wallis test statistic, and the level of sig-
nificance. If VARIABLE is not specified in the TEST paragraph, then the
results are shown for all variables other than the LABEL and GROUPING
variables. Here there is no significant difference in post-exercise pulse val-
ues between smokers and nonsmokers.
[4] When there are two groups, 3S computes the Mann-Whitney (Wilcoxon)
rank-sum test statistic and its level of significance, which coincides with
that of the Kruskal-Wallis test statistic. See Appendix B.18 for more about
significance levels.
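The agreement between the two significance levels can be checked numerically. The Python sketch below is an illustration only: it assumes untied data and the large-sample approximations without a continuity correction (the text above does not say which approximation 3S applies), and the data values and function names are hypothetical. Under those assumptions the Kruskal-Wallis statistic H for two groups equals the square of the standardized rank sum z, so the chi-square (1 df) and two-sided normal significance levels are identical.

```python
import math


def ranks_no_ties(values):
    """1-based ranks for a list of distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for untied data: 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = ranks_no_ties(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def rank_sum_z(a, b):
    """Standardized rank sum of group a (normal approximation, no correction)."""
    ranks = ranks_no_ties(list(a) + list(b))
    w = sum(ranks[:len(a)])
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)


# Hypothetical pulse rates for two groups.
smokers = [150, 142, 165, 138]
nonsmokers = [128, 135, 140, 131, 146]

h = kruskal_wallis_h([smokers, nonsmokers])
z = rank_sum_z(smokers, nonsmokers)
p_h = math.erfc(math.sqrt(h / 2))        # upper tail of chi-square with 1 df
p_z = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail

print(h, z * z)
print(p_h, p_z)
```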
Example 3S.2  The Kruskal-Wallis test and multiple comparisons
We use the Werner blood chemistry data (Appendix D) to illustrate the
Kruskal-Wallis statistic for more than two groups. We are testing whether
cholesterol values for women in four different age groups come from identical
populations. To classify the data into four AGE groups, we use the GROUP
paragraph with CODES and CUTPOINTS specified.
Output 3S.2
[1] 3S reads 188 cases. In this example, 3S eliminates cases only when data
needed for the particular test (i.e., AGE and CHOLSTRL) are missing or out
of range. If NO DELCASE were omitted, seven cases would be eliminated
Example 3S.3  The sign test and Wilcoxon signed-rank test
We use 3S to repeat the sign test and Wilcoxon signed-rank test performed in
Example 3D.5 for the Exercise data. The hypothesis for both tests is that there
is no difference between matched variables or paired observations. In Input
3S.3, the matched variables are PULSE_1 and PULSE_2, pulse rate before and
after exercise. We test whether PULSE_1 differs significantly from PULSE_2. 3S
automatically converts differences between PULSE_1 and PULSE_2 into ranks for
the Wilcoxon test.
Output 3S.3
[1] 3S computes the sign test for each pair of variables in the TEST paragraph
VARIABLES list. We look at the flagged values. For each pair, the total number
of nonzero differences between paired PULSE_1 and PULSE_2 values (40
here) is printed in the first panel of results for the sign test. The number of
positive differences appears in the second panel. Here all 40 cases showed
an increase in pulse rate after exercise, so there are no positive differences
(each difference, PULSE_1 minus PULSE_2, is negative).
The third panel reports the level of significance of the sign test
corresponding to a two-sided test of the hypothesis that the + and - signs of
the differences are equally probable (each sign has probability 0.5). See
Appendix B.18 for more information on significance levels.
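As an illustration of the hypothesis being tested, the Python sketch below computes an exact two-sided binomial significance level for the counts quoted above (40 nonzero differences, none positive). It is a minimal sketch, not a description of how 3S computes its value; the function name sign_test_two_sided is hypothetical, and 3S may use a different method (see Appendix B.18).

```python
from math import comb


def sign_test_two_sided(n_nonzero, n_positive):
    """Exact two-sided sign test: each sign has probability 0.5 under H0."""
    k = min(n_positive, n_nonzero - n_positive)
    # Probability of k or fewer signs of the rarer kind, then doubled.
    tail = sum(comb(n_nonzero, i) for i in range(k + 1)) / 2 ** n_nonzero
    return min(1.0, 2 * tail)


p_value = sign_test_two_sided(40, 0)
print(p_value)
```

With no positive differences among 40, the only outcome in the lower tail is "zero positive signs", whose probability is 0.5 raised to the 40th power; doubling it gives the two-sided level.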
[2] 3S computes the Wilcoxon signed-rank test for each pair of variables. The
first panel of results lists the number of nonzero differences. The second
panel gives the value of the smaller of the sum of ranks for positive
differences and the sum of ranks for negative differences. In the third panel
3S reports the level of significance of the Wilcoxon signed-rank test for a
two-sided test of the hypothesis that the populations have the same location
parameter. See Appendix B.18.
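The quantities in the first two panels can be sketched in Python as follows. This is an illustration under stated assumptions, not 3S code: zero differences are dropped, tied absolute differences receive average ranks, and the before/after values and function names are hypothetical.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_statistic(first, second):
    """Return (number of nonzero differences, smaller signed rank sum)."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = midranks([abs(d) for d in diffs])   # rank the absolute differences
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return len(diffs), min(positive, negative)


# Hypothetical before/after pulse rates for six cases.
before = [72, 80, 68, 90, 76, 84]
after = [110, 118, 66, 131, 76, 120]

n_nonzero, t_stat = signed_rank_statistic(before, after)
print(n_nonzero, t_stat)
```

In this hypothetical sample one pair has a zero difference and is excluded, and only one difference is positive, so the smaller rank sum is the rank of that single positive difference.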
Example 3S.4  Friedman's two-way analysis of variance and Kendall's
coefficient of concordance
We analyze corrected data from Siegel (1956, p. 233; see Data Set 3S.1), using
Friedman's two-way analysis of variance and the Kendall coefficient of
concordance. The Friedman test is an extension of the sign test to more than
two matched or paired variables. This arrangement of data is known as a
randomized block design. The rows are the blocks, and the columns are the
treatments. Blocks are formed using matched samples or repeated measures (as
here). The null hypothesis is that of no treatment differences (the alternative
hypotheses relate to differences in location).
In this example, the data in each row are the relative ranks (from 1 to 20)
assigned by staff psychologists and speech therapists to 20 mothers based on
effectiveness of child rearing. If the data were scores, 3S would convert them to
ranks. We test whether there is no difference among the ranks of the mothers.
Output 3S.4
[1] For each case 3S ranks the observations or scores for each variable (moth-
er). A case corresponds to a judge or test. For each variable 3S prints the
sum of the ranks. Since variable names were not included in the input, the
variables are labeled X(1) through X(20).
[2] 3S next reports the value of the Friedman test statistic and its level of signif-
icance. Here the Friedman statistic is significant, suggesting consistent dif-
ferences in child rearing effectiveness between mothers. We could use
COMPARE to determine which pairs of mothers differ significantly. See
Appendix B.18.
[3] The Kendall coefficient of concordance is a normalization of the Friedman
statistic and has the same level of significance. The Kendall coefficient can
range from 0 to 1.
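The relationship between the two statistics can be sketched in Python. The example assumes each row already holds untied ranks 1 to k assigned by one judge, uses the standard large-sample Friedman formula, and works on a small hypothetical table (3 judges, 4 mothers) rather than the Siegel data; the function name is also hypothetical.

```python
def friedman_and_kendall_w(rank_rows):
    """Friedman chi-square and Kendall W from rows of untied within-row ranks.

    rank_rows: n rows (judges/blocks), each holding the ranks 1..k of k items.
    """
    n = len(rank_rows)
    k = len(rank_rows[0])
    column_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    chi_square = (12.0 / (n * k * (k + 1)) * sum(r * r for r in column_sums)
                  - 3.0 * n * (k + 1))
    # Kendall's W rescales the Friedman statistic to the interval [0, 1].
    w = chi_square / (n * (k - 1))
    return column_sums, chi_square, w


# Hypothetical ranks: 3 judges (rows) ranking 4 mothers (columns).
ranks = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]

column_sums, chi_square, w = friedman_and_kendall_w(ranks)
print(column_sums, chi_square, w)
```

Because W is the Friedman statistic divided by the constant n(k - 1), the two statistics order any set of tables identically, which is why they share one significance level.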
Example 3S.5  Kendall and Spearman rank correlations
The Kendall and Spearman correlations estimate the association between two
variables based on the ranks of the observations. They are appropriate for data
whose observations can be ranked, whether or not an exact numerical value
can be assigned. The two correlations are equally powerful, but are scaled
differently. The Spearman correlation coefficient and level of significance for
matched data are also provided by 3D (see Example 3D.5). When the variables
are categorical you may use 4F, which also computes standard errors for the
correlations.
We use the Werner blood chemistry data (Appendix D) to illustrate the Kendall
and Spearman rank correlations.
Output 3S.5
[1] 3S reads 188 cases, of which 180 are complete. All correlations are based on
those 180 pairs. To use all data for each pair, state NO DELCASE. If there is a
considerable amount of missing data, see program AM, which has several
options for analyzing incomplete data.
[2] 3S calculates the Kendall rank correlation coefficients for all possible pairs
of variables and reports the coefficients in matrix format.
[3] The Spearman rank correlation coefficients are printed in the same format
as the Kendall statistics.
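To show how the two coefficients are scaled differently, the Python sketch below computes both for one small hypothetical pair of variables. It is not 3S code: Kendall's coefficient is computed in its tau-b form by counting concordant and discordant pairs, Spearman's coefficient is computed as the Pearson correlation of the ranks, and all names and values are assumptions made for illustration.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendall_tau_b(x, y):
    """Kendall tau-b: (concordant - discordant) pairs, adjusted for ties."""
    concordant = discordant = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt(
        (pairs - tied_x) * (pairs - tied_y))


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the (mid)ranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

tau = kendall_tau_b(x, y)
rho = spearman_rho(x, y)
print(tau, rho)
```

For these values the same degree of agreement yields a smaller Kendall coefficient than Spearman coefficient, so the two should not be compared with each other directly.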
Special Features
Using all available data: NO DELCASE
When NO DELCASE is in effect, each test eliminates only cases with missing or
out of range values for variables needed for that test. When you are performing
tests on a number of variables, NO DELCASE maximizes the number of cases
available for each test, but this means that all tests may not be based on the
same cases. This differs from a VARIABLE USE list: a USE list includes all
variables being tested in a problem, and only uses cases with acceptable values
for all these variables. If NO DELCASE is used with KENDALL and SPEARMAN
correlations, the correlations may be based on varying numbers and combinations
of pairs of variables, reducing your ability to compare levels of correlation.
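The difference between the two case-selection rules can be sketched in Python. The records below are hypothetical (the variable ALBUMIN and all values are assumptions, not taken from the Werner file), None stands for a missing or out of range value, and the function name is illustrative. Requiring complete cases keeps only records that are acceptable for every variable; the NO DELCASE rule keeps, for each pair, every record acceptable for that pair.

```python
from itertools import combinations

# Hypothetical records; None marks a missing or out of range value.
cases = [
    {"AGE": 22, "CHOLSTRL": 200, "ALBUMIN": 4.3},
    {"AGE": 25, "CHOLSTRL": None, "ALBUMIN": 4.1},
    {"AGE": 31, "CHOLSTRL": 243, "ALBUMIN": None},
    {"AGE": None, "CHOLSTRL": 158, "ALBUMIN": 3.8},
    {"AGE": 40, "CHOLSTRL": 255, "ALBUMIN": 4.0},
]
variables = ["AGE", "CHOLSTRL", "ALBUMIN"]


def usable_cases(records, needed):
    """Keep records with acceptable values for every variable in `needed`."""
    return [r for r in records if all(r[v] is not None for v in needed)]


# Complete-case rule: one common set of cases for every test.
complete_n = len(usable_cases(cases, variables))

# All-available-data rule: a separate set of cases for each pair.
pairwise_n = {
    pair: len(usable_cases(cases, pair))
    for pair in combinations(variables, 2)
}

print("complete cases:", complete_n)
for pair, n in pairwise_n.items():
    print(pair, n)
```

Each pair here uses three cases, but a different three each time, which is the situation the paragraph above warns about when correlations are compared.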
3S Commands
/ INPut
The INPUT paragraph is required for the first problem in each run. It is
described in detail in Chapter 3. An additional command for 3S is DELCASE.
DELCASE.                        NO DELCASE.
State NO DELCASE if you want to use all available data for each variable tested.
By default, 3S uses only cases complete for all variables (no values are missing
or outside any specified range limits).
/ GROUP
See Chapter 5 for a description of GROUP commands.
New Syntax
VARiable = variable.            VAR = BRTHPILL.
Required when KRUSKAL is specified in the TEST paragraph. State the name or
number of a variable used to classify the cases into groups. If you prefer, you
can still specify a grouping variable with the GROUPING command in the
VARIABLE paragraph as described in the 1990 BMDP Manual. If the grouping
variable takes on more than ten distinct values or codes, CODES or CUTPOINTS
must be specified in the GROUP paragraph (see Chapter 5).
/ TEST
The TEST paragraph is required to specify the statistics to compute. It may be
repeated after END for additional analyses of the same data.
WILcoxon.  - Wilcoxon signed-rank test
KRUskal.   - Kruskal-Wallis one-way analysis of variance and Mann-Whitney
             rank-sum test
SIGN.      - Sign test
FRIEDman.  - Friedman's two-way analysis of variance and Kendall's
             coefficient of concordance
KENDall.   - Kendall rank correlation, τb
default, 3S uses all variables except the GROUP and LABEL variables.
COMPare.                        COMP.
Use COMPARE with KRUSKAL or FRIEDMAN to obtain multiple comparisons
for the Kruskal-Wallis and Friedman tests. 3S will compare every possible pair
of groups.
TITLE = 'text'.                 TITLE = 'PRE VERSUS POST'.
Specify a title to print at the top of each output page. By default, no title is
printed.
/ INPut
NO DELCASE.             DELCASE.                3S.1
/ GROUP
VARiable = variable.    no grouping variable;   3S.1
/ TEST
VARiables = list.
required for KRUSKAL