
Fifth Edition

Handbook of Parametric
and Nonparametric
Statistical Procedures

David J. Sheskin

CRC Press
Taylor & Francis Group
Boca Raton London New York

CRC Press is an imprint of the
Taylor & Francis Group, an Informa business

A CHAPMAN & HALL BOOK
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor and Francis Group, LLC


Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper


10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4398-5801-1 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of
users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has
been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com
To Vicki and Emily

&

Topspin, Buffy, and Belly Button


my loyal writing companions over the years
Preface
Like the first four editions, the fifth edition of the Handbook of Parametric and
Nonparametric Statistical Procedures is designed to provide researchers, teachers, and
students with a comprehensive reference in the areas of parametric and nonparametric statistics.
The addition of material not included in the fourth edition (most notably chapters on path
analysis, structural equation modeling, a separate chapter with new material on the subject of
meta-analysis, and new material on time series analysis and statistical quality control) makes the
Handbook unparalleled in terms of its coverage of the field of statistics. Rather than being
directed at a limited audience, the Handbook is intended for individuals who are involved in a
broad spectrum of academic disciplines encompassing the fields of mathematics/statistics, the
social, biological, and environmental sciences, economics/business, and education. My initial
motivation for writing the Handbook of Parametric and Nonparametric Statistical Procedures was to
create a reference book on parametric and nonparametric statistical procedures that I (as well as
colleagues and students I had spoken with over the years) had always wanted, yet could never
find. To be more specific, my primary goal was to provide consumers of statistics with a
comprehensive reference book on univariate and bivariate statistical procedures (and
subsequently multivariate procedures, which were introduced in the fourth edition) that covered
a scope of material that extended far beyond what was covered in any single source. It was
essential that the book be applications-oriented, yet at the same time address relevant
theoretical and practical issues that are of concern to the sophisticated researcher. In addition,
I wanted to write a book that is accessible to people who have little or no knowledge of
statistics, as well as those who are well versed in the subject. Based on the feedback I have
received on the first four editions (from readers and reviewers), I believe I have achieved these
goals, and on the basis of this I believe that the fifth edition of the Handbook of Parametric
and Nonparametric Statistical Procedures will continue to serve as an invaluable resource for
people in multiple academic disciplines who conduct research, are involved in teaching, or are
presently in the process of learning statistics.
I am not aware of any other applications-oriented book that provides in-depth coverage of as
many statistical procedures (almost 200) as are covered in the Handbook of Parametric and
Nonparametric Statistical Procedures. Inspection of the Table of Contents and Index should
confirm the scope of material covered in the book. A unique feature of the
Handbook, which distinguishes it from other reference books on statistics, is that it provides the
reader with a practical guide that emphasizes application over theory. Although the book will
be of practical value to statistically sophisticated individuals who are involved in research, it is
also accessible to those who lack the theoretical and/or mathematical background required for
understanding the material documented in more conventional statistics reference books. Since
a major goal of the book is to serve as a practical guide, emphasis is placed on decision making
with respect to which test is most appropriate to employ in evaluating a specific design. Within
the framework of being user-friendly, clear computational guidelines, accompanied by easy-to-
understand examples, are provided for all procedures.
One should not, however, get the impression that the Handbook of Parametric and
Nonparametric Statistical Procedures is little more than a cookbook. In point of fact, the design
of the Handbook is such that within the framework of each of the statistical procedures covered,
in addition to the basic guidelines for decision making and computation, substantial in-depth
discussion is devoted to a broad spectrum of practical and theoretical issues, many of which are
not discussed in conventional statistics books. Inclusion of the latter material ensures that the
Handbook will serve as an invaluable resource for those who are sophisticated as well as
unsophisticated in statistics. The Handbook of Parametric and Nonparametric Statistical
Procedures can be used as a reference book or it can be employed as a textbook in undergraduate
and graduate courses that are designed to cover a broad spectrum of parametric and/or
nonparametric statistical procedures.
It should be noted that although a major goal of this book is to provide the reader with clear,
easy to follow guidelines for conducting statistical analyses, it is essential to keep in mind that
the statistical procedures contained within it are essentially little more than algorithms that have
been derived for evaluating a set of data under certain conditions. A statistical procedure, in and
of itself, is incapable of making judgments with respect to the adequacy of the methodology
underlying an experiment and/or the reliability of data being evaluated. If either of the latter is
compromised (as a result of a faulty experimental design, sloppy methodology, use of
inappropriate and/or unreliable measuring instruments, and/or salient violation of assumptions
underlying a specific statistical procedure), for all practical purposes, the result of an analysis
will be worthless. Consequently, it cannot be emphasized too strongly that a prerequisite for the
intelligent and responsible use of statistics is that one has a reasonable understanding of the
conceptual basis behind an analysis, as well as a realization that the use of a statistical procedure
merely represents the final stage in a sequential process involved in conducting research. The
stages that precede the statistical analysis consist of whatever a researcher does
within the framework of designing and executing a study. Ignorance or sloppiness on the part of
a researcher with respect to the latter can render any subsequent statistical analysis meaningless.
Although a number of different hypothesis testing models are discussed in this Handbook,
the primary model that is emphasized is the classical hypothesis testing model (commonly
referred to as the null hypothesis significance testing model). Although the latter model (which
has been subjected to criticism by proponents of alternative approaches to hypothesis testing) has
its limitations, the author believes the present scope of scientific knowledge would not be greater
than it is now if up to this point in time any of the alternative hypothesis testing models had been
used in its place. Throughout the book, in employing the classical hypothesis testing model, the
author emphasizes its judicious use, as well as the importance of conducting replication studies
whenever research evaluating a hypothesis is equivocal.
In order to facilitate its usage, most of the procedures contained in the Handbook are
organized within a standardized format. Specifically, for most of the procedures the following
information is provided:
I. Hypothesis Evaluated with Test and Relevant Background Information The first
part of this section provides a general statement of the hypothesis evaluated with the test. This
is followed by relevant background information on the test such as the following: a) Information
regarding the experimental design for which the test is appropriate; b) Any assumptions
underlying the test which, if violated, would compromise its reliability; and c) General information on
other statistical procedures that are related to the test.
II. Example This section presents a description of an experiment, with an accompanying
data set (or in some instances two experiments utilizing the same data set), for which the test will
be employed. All examples (with the exception of those employed for multivariate analyses)
employ small sample sizes, as well as integer data consisting of small numbers, in order to
facilitate the reader’s ability to follow the computational procedures to be described in Section
IV.
III. Null versus Alternative Hypotheses This section contains both a symbolic and
verbal description of the statistical hypotheses evaluated with the test (i.e., the null hypothesis
versus the alternative hypothesis). It also states the form the data will assume when the null
hypothesis is supported, as opposed to when one or more of the possible alternative hypotheses
are supported.
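As a brief illustration of the symbolic form such a statement typically takes (a generic sketch offered here for orientation, not a passage reproduced from any particular chapter), the two-tailed hypotheses for evaluating whether a population mean differs from a hypothesized value can be written as shown below, where H0 denotes the null hypothesis, H1 the alternative hypothesis, μ the population mean, and μ0 the hypothesized value (notation introduced solely for this illustration):

% Generic two-tailed hypotheses for a single-sample test of a population mean (illustrative LaTeX).
% \mu denotes the population mean and \mu_0 the value specified by the null hypothesis.
\[
H_0\colon \mu = \mu_0
\qquad \text{versus} \qquad
H_1\colon \mu \neq \mu_0
\]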

IV. Test Computations This section contains a step-by-step description of the procedure
for computing the test statistic(s). The computational guidelines are clearly outlined in reference
to the data for the example(s) presented in Section II. In the case of multivariate analyses, the
statistical software package SPSS is employed to evaluate the data.
V. Interpretation of the Test Results This section describes the protocol for evaluating
the computed test statistic(s). Specifically: a) It provides clear guidelines for employing the
appropriate table(s) of critical values to analyze the test statistic(s); b) Guidelines are provided
delineating the relationship between the tabled critical values and when a researcher should retain
the null hypothesis, as opposed to when the researcher can conclude that one or more of the
possible alternative hypotheses are supported; c) The computed test statistic(s) is (are) interpreted
in reference to the example(s) presented in Section II; d) In instances where a parametric and
nonparametric test can be used to evaluate the same set of data, the results obtained using both
procedures are compared with one another, and the relative power of both tests is discussed in
this section and/or in Section VI; and e) In the case of multivariate analyses, detailed guidelines
are provided for interpreting the SPSS output displayed for an analysis.
VI. Additional Analytical Procedures for the Test and/or Related Tests Since many
of the tests described in the Handbook have additional analytical procedures associated with
them, such procedures are described in this section. Many of these procedures are commonly
employed, while others are used and/or discussed less frequently. Many of the analytical
procedures covered in Section VI are not discussed (or, if so, only briefly) in other books. Some
representative topics covered in Section VI are planned versus unplanned comparison
procedures, measures of association for inferential statistical tests, computation of confidence
intervals, and computation of power. In addition to the aforementioned material, for many of the
tests there is additional discussion of other statistical procedures directly related to the test under
discussion. In instances where two or more tests produce equivalent results, examples are
provided that clearly demonstrate the equivalency of the procedures.
VII. Additional Discussion of the Test Section VII discusses theoretical concepts and
issues, as well as practical and procedural issues relevant to a specific test. In some instances
where a subject is accorded brief coverage in the initial material presented on the test, the reader
is alerted to the fact that the subject is discussed in greater depth in Section VII. Many of the
topics covered in this section are accorded little or no discussion in other books. Among the
topics covered in Section VII is additional discussion of the relationship between a specific test
and other tests that are related to it. Section VII also provides bibliographic information on less
commonly employed alternative procedures that can be used to evaluate the same design for
which the test under discussion is used.
VIII. Additional Examples Illustrating the Use of the Test This section provides
descriptions of one or more additional experiments for which a specific test is applicable. For the
most part, these examples employ the same data set as that in the original example(s) presented
in Section II for the test. By virtue of using standardized data for most of the examples, the
information for a test contained in Section IV (Test computations) and Section V (Interpretation
of the test results) will be applicable to most of the additional examples. Because of this,
the reader is able to focus on common design elements in various experiments which indicate that
a given test is appropriate for use with a specific type of design.
IX. Addendum At the conclusion of the discussion of a number of tests an Addendum
has been included that describes one or more related tests or topics that are not discussed in
Section VI. As an example, the Addendum of the between-subjects factorial analysis of
variance contains an overview and computational guidelines for the factorial analysis of
variance for a mixed design, the analysis of variance for a Latin-square design, and the
within-subjects factorial analysis of variance.
References This section provides the reader with a comprehensive listing of primary and
secondary source material for each test.

Endnotes At the conclusion of most tests, a detailed endnotes section contains additional
useful information that further clarifies or expands upon material discussed in the main text.

In addition to the Introduction and a chapter on Matrix algebra, the Handbook of
Parametric and Nonparametric Statistical Procedures contains 43 chapters (labeled as
Tests), each of which documents a specific descriptive or inferential statistical procedure/test.
The Introduction provides the reader with a comprehensive overview of descriptive statistics
and experimental design, since prior familiarity with the latter material facilitates one’s ability
to use the book efficiently. Following the Introduction, the reader is provided with guidelines
and decision tables for selecting the appropriate statistical test for evaluating a specific
experimental design. Readers should take note of the fact that the term test is employed generically for
all procedures described in the book (i.e., both inferential tests and measures of correlation/
association).
Approximately 150 pages of new material, including three new chapters, have been added
to the fifth edition. This edition of the Handbook contains almost 200 statistical tests/procedures
that are nested in the 43 chapters noted below (the book also includes an Introduction and a
chapter on matrix algebra and multivariate analysis). Chapters in which a substantial amount of
new material has been added are indicated with an asterisk, with the subject matter of the most
notable new material added to this edition noted in parentheses.

Introduction* (New material has been added on the average deviation, skew and kurtosis, and
index numbers employed in econometrics and business statistics.); Test 1: The Single-Sample
z Test; Test 2: The Single-Sample t Test* (An Addendum on statistical quality control has been
added to the material on this test.); Test 3: The Single-Sample Chi-Square Test for a Population
Variance; Test 4: The Single-Sample Test for Evaluating Population Skewness; Test 5: The
Single-Sample Test for Evaluating Population Kurtosis; Test 6: The Wilcoxon Signed-Ranks
Test; Test 7: The Kolmogorov-Smirnov Goodness-of-Fit Test for a Single Sample; Test 8: The
Chi-Square Goodness-of-Fit Test; Test 9: The Binomial Sign Test for a Single Sample; Test 10:
The Single-Sample Runs Test (and Other Tests of Randomness); Test 11: The t Test for Two
Independent Samples* (New material has been added on the d statistic for measuring effect size,
and the material on clinical trials and tests of equivalence has been edited.); Test 12: The
Mann-Whitney U Test; Test 13: The Kolmogorov-Smirnov Test for Two Independent Samples;
Test 14: The Siegel-Tukey Test for Equal Variability; Test 15: The Moses Test for Equal
Variability; Test 16: The Chi-Square Test for r × c Tables (Test 16a: The Chi-Square Test for
Homogeneity; Test 16b: The Chi-Square Test of Independence (employed with a single
sample)); Test 17: The t Test for Two Dependent Samples* (The discussion in this test of tests
of equivalence has been modified.); Test 18: The Wilcoxon Matched-Pairs Signed-Ranks Test;
Test 19: The Binomial Sign Test for Two Dependent Samples; Test 20: The McNemar Test;
Test 21: The Single-Factor Between-Subjects Analysis of Variance; Test 22: The
Kruskal-Wallis One-Way Analysis of Variance by Ranks; Test 23: The van der Waerden
Normal-Scores Test for k Independent Samples; Test 24: The Single-Factor Within-Subjects
Analysis of Variance; Test 25: The Friedman Two-Way Analysis of Variance by Ranks; Test
26: The Cochran Q Test; Test 27: The Between-Subjects Factorial Analysis of Variance* (New
material on interaction is presented, and the analysis of random- versus fixed-effects models is
discussed in detail.); Test 28: The Pearson Product-Moment Correlation Coefficient* (New
material has been added to the Addendum on data mining and time series analysis.); Test 29:
Spearman’s Rank-Order Correlation Coefficient; Test 30: Kendall’s Tau; Test 31: Kendall’s
Coefficient of Concordance; Test 32: Goodman and Kruskal’s Gamma; Matrix Algebra and
Multivariate Analysis; Test 33: Multiple Regression; Test 34: Hotelling’s T²; Test 35:
Multivariate Analysis of Variance; Test 36: Multivariate Analysis of Covariance; Test 37:
Discriminant Function Analysis; Test 38: Canonical Correlation; Test 39: Logistic Regression;
Test 40: Principal Components Analysis and Factor Analysis; Test 41: Path Analysis (This is a
new chapter/test added to the fifth edition.); Test 42: Structural Equation Modeling (This is a
new chapter/test added to the fifth edition.); Test 43: Meta-Analysis (This is a new chapter/test
added to the fifth edition, comprised of material that was previously in the Addendum of Test
28, as well as new material which includes analysis of effect size through use of inverse variance
weights.)
The author would like to express his gratitude to a number of people who helped make this
book a reality. First, I would like to thank Tim Pletscher of CRC Press for his confidence in and
support of the first edition of the Handbook. Special thanks are due to Bob Stern, who, in his role
as editor at Chapman and Hall/CRC, was responsible for the subsequent three editions. Most
recently thanks are in order to David Grubbs for soliciting and overseeing the publication of the
fifth edition. Thanks also to Mimi Williams at Taylor & Francis for overseeing production of the
manuscript. I am also indebted to Glena Ames, who did an excellent job preparing the camera-
ready manuscript for the first two editions of the book. Finally, I must express my appreciation
to my wife Vicki, who over the years has both endured and tolerated the difficulties associated
with a project of this magnitude.

David Sheskin
Table of Contents
with Summary of Topics

Introduction................................................................................................ 1
Descriptive versus inferential statistics................................................................................. 1
Statistic versus parameter ...................................................................................................... 2
Levels of measurement ............................................................................................................ 2
Continuous versus discrete variables ....................................................................................... 4
Measures of central tendency (mode, median, mean, weighted mean, geometric
mean, and the harmonic mean) ................................................................................................ 4
Measures of variability (range; quantiles, percentiles, quartiles, and deciles;
variance and standard deviation; the coefficient of variation) ............................................. 10
Measures of skewness and kurtosis ....................................................................................... 16
Visual methods for displaying data (tables and graphs, exploratory data
analysis (stem-and-leaf displays and boxplots)) ................................................................... 30
The normal distribution ........................................................................................................... 45
Hypothesis testing .................................................................................................................... 57
A history and critique of the classical hypothesis testing model ......................................... 68
Estimation in inferential statistics ........................................................................................... 74
Relevant concepts, issues, and terminology in conducting research (the observational
method; the experimental method; the correlational method) ............................................. 76
Experimental design (pre-experimental designs; quasi-experimental designs;
true experimental designs; single-subject designs) ............................................................... 83
Sampling methodologies ......................................................................................................... 99
Basic principles of probability .............................................................................................. 101
Parametric versus nonparametric inferential statistical tests .............................................. 109
Univariate versus bivariate versus multivariate statistical procedures ............................... 110
Selection of the appropriate statistical procedure ................................................................ 111
References .............................................................................................................................. 111
Endnotes ...............................................................................................................................115

Outline of Inferential Statistical Tests and Measures of
Correlation/Association ........................................................................ 131

Guidelines and Decision Tables for Selecting the Appropriate
Statistical Procedure .............................................................................. 139

Inferential Statistical Tests Employed with a Single Sample ............... 147

Test 1: The Single-Sample z Test ............................................................................................ 149
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 149
II. Example ............................................................................................................................ 149
III. Null versus Alternative Hypotheses ............................................................................... 149
IV. Test Computations .......................................................................................................... 150
V. Interpretation of the Test Results .................................................................................... 151
VI. Additional Analytical Procedures for the Single-Sample z Test and/or
Related Tests ................................................................................................................... 152
VII. Additional Discussion of the Single-Sample z Test ...................................................... 152
1. The interpretation of a negative z value .................................................................... 152
2. The standard error of the population mean and graphical
representation of the results of the single-sample z test ........................................... 153
3. Additional examples illustrating the interpretation of a computed z
value .............................................................................................................................. 158
4. The z test for a population proportion ....................................................................... 158
VIII. Additional Examples Illustrating the Use of the Single-Sample z Test ........................ 159
References ................................................................................................................................ 160
Endnotes .................................................................................................................................... 160

Test 2: The Single-Sample t Test ............................................................................................. 163
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 163
II. Example ............................................................................................................................ 164
III. Null versus Alternative Hypotheses ............................................................................... 164
IV. Test Computations .......................................................................................................... 164
V. Interpretation of the Test Results .................................................................................... 166
VI. Additional Analytical Procedures for the Single-Sample t Test and/or
Related Tests ................................................................................................................... 168
1. Determination of the power of the single-sample t test and the single-
sample z test, and the application of Test 2a: Cohen’s d index ............................... 168
2. Computation of a confidence interval for the mean of the population
represented by a sample .............................................................................................. 179
VII. Additional Discussion of the Single-Sample t Test ....................................................... 189
Degrees of freedom ......................................................................................................... 189
VIII. Additional Examples Illustrating the Use of the Single-Sample t Test ......................... 190
IX. Addendum ........................................................................................................................ 191
Statistical quality control ................................................................................................ 191
Process control ......................................................................................................... 192
Acceptance sampling ............................................................................................... 202
References ................................................................................................................................ 205
Endnotes .................................................................................................................................... 206

Test 3: The Single-Sample Chi-Square Test for a Population Variance ............................ 211
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 211
II. Example ............................................................................................................................ 212
III. Null versus Alternative Hypotheses ............................................................................... 212
IV. Test Computations .......................................................................................................... 213
V. Interpretation of the Test Results .................................................................................... 214
VI. Additional Analytical Procedures for the Single-Sample Chi-Square
Test for a Population Variance and/or Related Tests ................................................... 216
1. Large sample normal approximation of the chi-square distribution ....................... 216
2. Computation of a confidence interval for the variance of a population
represented by a sample .............................................................................................. 217
3. Sources for computing the power of the single-sample chi-square test
for a population variance ............................................................................................ 220
VII. Additional Discussion of the Single-Sample Chi-Square Test for a
Population Variance ........................................................................................................ 220
VIII. Additional Examples Illustrating the Use of the Single-Sample
Chi-Square Test for a Population Variance ................................................................... 220
References ................................................................................................................................ 222
Endnotes .................................................................................................................................... 222

Test 4: The Single-Sample Test for Evaluating Population Skewness ................................ 225
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 225
II. Example ............................................................................................................................ 226
III. Null versus Alternative Hypotheses ............................................................................... 226
IV. Test Computations .......................................................................................................... 227
V. Interpretation of the Test Results .................................................................................... 229
VI. Additional Analytical Procedures for the Single-Sample Test for
Evaluating Population Skewness and/or Related Tests ................................................ 230
VII. Additional Discussion of the Single-Sample Test for Evaluating
Population Skewness ...................................................................................................... 230
1. Exact tables for the single-sample test for evaluating population
skewness ....................................................................................................................... 230
2. Note on a nonparametric test for evaluating skewness ............................................ 230
VIII. Additional Examples Illustrating the Use of the Single-Sample Test for
Evaluating Population Skewness ................................................................................... 231
References ................................................................................................................................ 231
Endnotes .................................................................................................................................... 231

Test 5: The Single-Sample Test for Evaluating Population Kurtosis ................................. 233
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 233
II. Example ............................................................................................................................ 234
III. Null versus Alternative Hypotheses ............................................................................... 234
IV. Test Computations .......................................................................................................... 235
V. Interpretation of the Test Results .................................................................................... 237
VI. Additional Analytical Procedures for the Single-Sample Test for
Evaluating Population Kurtosis and/or Related Tests ................................................... 238
1. Test 5a: The D’Agostino-Pearson test of normality ................................................ 238
2. Test 5b: The Jarque-Bera test of normality .............................................................. 239
VII. Additional Discussion of the Single-Sample Test for Evaluating
Population Kurtosis ......................................................................................................... 240
1. Exact tables for the single-sample test for evaluating population
kurtosis ......................................................................................................................... 240
2. Additional comments on tests of normality .............................................................. 240
VIII. Additional Examples Illustrating the Use of the Single-Sample Test for
Evaluating Population Kurtosis ...................................................................................... 241
References ................................................................................................................................ 241
Endnotes .................................................................................................................................... 242

Test 6: The Wilcoxon Signed-Ranks Test ............................................................................... 245
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 245
II. Example ............................................................................................................................ 245
III. Null versus Alternative Hypotheses ............................................................................... 246
IV. Test Computations .......................................................................................................... 246
V. Interpretation of the Test Results .................................................................................... 248
VI. Additional Analytical Procedures for the Wilcoxon Signed-Ranks Test
and/or Related Tests ....................................................................................................... 250
1. The normal approximation of the Wilcoxon T statistic for large sample
sizes ............................................................................................................................... 250
2. The correction for continuity for the normal approximation of the
Wilcoxon signed-ranks test ......................................................................................... 252
3. Tie correction for the normal approximation of the Wilcoxon test
statistic .......................................................................................................................... 253
4. Computation of a confidence interval for a population median .............................. 254
VII. Additional Discussion of the Wilcoxon Signed-Ranks Test ......................................... 255
1. Power-efficiency of the Wilcoxon signed-ranks test and the concept
of asymptotic relative efficiency ................................................................................. 255
2. Note on symmetric population concerning hypotheses regarding median
and mean ...................................................................................................................... 256
VIII. Additional Examples Illustrating the Use of the Wilcoxon Signed-
Ranks Test ........................................................................................................................ 257
References ................................................................................................................................ 258
Endnotes .................................................................................................................................... 258

Test 7: The Kolmogorov-Smirnov Goodness-of-Fit Test for a Single
Sample ........................................................................................................................... 261
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 261
II. Example ............................................................................................................................ 262
III. Null versus Alternative Hypotheses ............................................................................... 263
IV. Test Computations .......................................................................................................... 264
V. Interpretation of the Test Results .................................................................................... 268
VI. Additional Analytical Procedures for the Kolmogorov-Smirnov
Goodness-of-Fit Test for a Single Sample and/or Related Tests .................................. 269
1. Computing a confidence interval for the Kolmogorov-Smirnov
goodness-of-fit test for a single sample ..................................................................... 269
2. The power of the Kolmogorov-Smirnov goodness-of-fit test for
a single sample ............................................................................................................. 270
3. Test 7a: The Lilliefors test for normality .................................................................. 270
VII. Additional Discussion of the Kolmogorov-Smirnov Goodness-of-Fit
Test for a Single Sample ................................................................................................. 272
1. Effect of sample size on the result of a goodness-of-fit test .................................... 272
2. The Kolmogorov-Smirnov goodness-of-fit test for a single sample
versus the chi-square goodness-of-fit test and alternative
goodness-of-fit tests ..................................................................................................... 273
VIII. Additional Example Illustrating the Use of the Kolmogorov-Smirnov
Goodness-of-Fit Test for a Single Sample ..................................................................... 273
References ................................................................................................................................ 274
Endnotes .................................................................................................................................... 275

Test 8: The Chi-Square Goodness-of-Fit Test ........................................................................ 277
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 277
II. Examples .......................................................................................................................... 278
III. Null versus Alternative Hypotheses ............................................................................... 278
IV. Test Computations .......................................................................................................... 279
V. Interpretation of the Test Results .................................................................................... 281
VI. Additional Analytical Procedures for the Chi-Square Goodness-of-Fit
Test and/or Related Tests ................................................................................................ 281
1. Comparisons involving individual cells when k > 2 ................................................ 281
2. The analysis of standardized residuals ...................................................................... 284
3. The correction for continuity for the chi-square goodness-of-fit test ..................... 285
4. Computation of a confidence interval for the chi-square goodness-of-fit
test/confidence interval for a population proportion ................................................ 286
5. Brief discussion of the z test for a population proportion
(Test 9a) and the single-sample test for the median (Test 9b) ................................. 289
6. Application of the chi-square goodness-of-fit test for assessing
goodness-of-fit for a theoretical population distribution .......................................... 289
7. Sources for computing the power of the chi-square goodness-of-fit
test ................................................................................................................................. 293
8. Heterogeneity chi-square analysis ............................................................................. 293
VII. Additional Discussion of the Chi-Square Goodness-of-Fit Test ................................... 297
1. Directionality of the chi-square goodness-of-fit test ................................................ 297
2. Additional goodness-of-fit tests ................................................................................. 299
VIII. Additional Examples Illustrating the Use of the Chi-Square
Goodness-of-Fit Test ....................................................................................................... 300
References ................................................................................................................................ 302
Endnotes .................................................................................................................................... 303

Test 9: The Binomial Sign Test for a Single Sample ....................................................309


I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 309
II. Examples .......................................................................................................................... 310
III. Null versus Alternative Hypotheses ............................................................................... 310
IV. Test Computations .......................................................................................................... 311
V. Interpretation of the Test Results .................................................................................... 313
VI. Additional Analytical Procedures for the Binomial Sign Test for a
Single Sample and/or Related Tests ............................................................................... 314
1. Test 9a: The z test for a population proportion (with discussion of
correction for continuity; computation of a confidence interval;
procedure for computing sample size for test of specified power;
additional comments on computation of the power of the binomial
sign test for a single sample) ....................................................................................... 314
2. Extension of the z test for a population proportion to evaluate the
performance of m subjects on n trials on a binomially distributed
variable ......................................................................................................................... 323
3. Test 9b: The single-sample test for the median ........................................................ 325
VII. Additional Discussion of the Binomial Sign Test for a Single Sample ........................ 328
1. Evaluating goodness-of-fit for a binomial distribution ............................................ 328
VIII. Additional Example Illustrating the Use of the Binomial Sign Test for
a Single Sample ............................................................................................................... 330
IX. Addendum ........................................................................................................................ 331
1. Discussion of additional discrete probability distributions and the
exponential distribution ............................................................................................... 331
a. The multinomial distribution .............................................................. 331
b. The negative binomial distribution .................................................... 333
c. The hypergeometric distribution ........................................................ 336
d. The Poisson distribution ..................................................................... 339
Computation of a confidence interval for a Poisson parameter ........ 342
Test 9c: Test for comparing two Poisson counts ................................. 343
Evaluating goodness-of-fit for a Poisson distribution ........................ 344
e. The exponential distribution ............................................................... 346
f. The matching distribution ................................................................... 350
2. Conditional probability, Bayes’ theorem, Bayesian statistics, and
hypothesis testing ......................................................................................................... 352
Conditional probability ........................................................................... 352
Bayes’ theorem ........................................................................................ 353
Bayesian hypothesis testing .................................................................... 369
Bayesian analysis of a continuous variable ........................................... 394
References ................................................................................................................................ 398
Endnotes .................................................................................................................................... 401

Test 10: The Single-Sample Runs Test (and Other Tests of Randomness) ........................ 409
I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 409
II. Example ............................................................................................................................ 410
III. Null versus Alternative Hypotheses ............................................................................... 411
IV. Test Computations .......................................................................................................... 411
V. Interpretation of the Test Results .................................................................................... 411
VI. Additional Analytical Procedures for the Single-Sample Runs Test
and/or Related Tests ....................................................................................................... 412
1. The normal approximation of the single-sample runs test for large
sample sizes .................................................................................................................. 412
2. The correction for continuity for the normal approximation of the
single-sample runs test ................................................................................................ 413
3. Extension of the runs test to data with more than two categories ........................... 414
4. Test 10a: The runs test for serial randomness .......................................................... 415
VII. Additional Discussion of the Single-Sample Runs Test ................................................ 418
1. Additional discussion of the concept of randomness ............................................... 418
VIII. Additional Examples Illustrating the Use of the Single-Sample
Runs Test ......................................................................................................................... 419
IX. Addendum ........................................................................................................................ 422
1. The generation of pseudorandom numbers (The midsquare method;
the midproduct method; the linear congruential method) ........................................ 422
2. Alternative tests of randomness ................................................................................. 427
Test 10b: The frequency test ................................................................... 427
Test 10c: The gap test .............................................................................. 429
Test 10d: The poker test .......................................................................... 433
Test 10e: The maximum test ................................................................... 433
Test 10f: The coupon collector’s test ..................................................... 434
Test 10g: The mean square successive difference test (for
serial randomness) ................................................................................... 437
Additional tests of randomness (Autocorrelation; The serial
test; The d² square test of random numbers; Tests of trend
analysis/time series analysis) .................................................................. 439
References ................................................................................................................................ 440
Endnotes .................................................................................................................................... 442

Inferential Statistical Tests Employed with Two Independent
Samples (and Related Measures of Association/Correlation) .......... 445

Test 11: The t Test for Two Independent Samples ............................................................447


I. Hypothesis Evaluated with Test and Relevant Background Information ....................... 447
II. Example ............................................................................................................................ 447
III. Null versus Alternative Hypotheses ............................................................................... 448
IV. Test Computations .......................................................................................................... 448
V. Interpretation of the Test Results .................................................................................... 451
VI. Additional Analytical Procedures for the t Test for Two Independent
Samples and/or Related Tests ......................................................................................... 452
1. The equation for the t test for two independent samples when a value
for a difference other than zero is stated in the null hypothesis .............................. 452
2. Test 11a: Hartley’s Fmax test for homogeneity of variance/F test
for two population variances: Evaluation of the homogeneity of
variance assumption of the t test for two independent samples .............................. 454
3. Computation of the power of the t test for two independent samples
and the application of Test 11b: Cohen’s d index ..................................................... 459
4. Measures of magnitude of treatment effect for the t test for two
independent samples: Omega squared (Test 11c) and Eta squared
(Test 11d) ..................................................................................................................... 466
5. Computation of a confidence interval for the t test for two independent
samples ......................................................................................................................... 468
6. Test 11e: The z test for two independent samples .................................................... 470
VII. Additional Discussion of the t Test for Two Independent Samples ............................. 472
1. Unequal sample sizes .................................................................................................. 472
2. Robustness of the t test for two independent samples ............................................. 473
3. Outliers (Procedures for identifying outliers: Box-and-whisker
plot criteria; Standard deviation score criterion; Test 11f: Median
absolute deviation test for identifying outliers; Test 11g: Extreme
Studentized deviate test for identifying outliers; trimming data;
Winsorization) and data transformation .................................................................... 474
4. Missing data ................................................................................................................ 488
5. Clinical trials ............................................................................................................... 491
6. Tests of equivalence: Test 11h: The Westlake-Schuirmann test of
equivalence of two independent treatments (and procedure for
computing sample size in reference to the power of the test) .................................. 494
7. Hotelling’s T² .............................................................................................................. 514
VIII. Additional Examples Illustrating the Use of the t Test for Two
Independent Samples ...................................................................................................... 514
References ................................................................................................................................ 515
Endnotes .................................................................................................................................... 520

Test 12: The M ann - W hitney U T e s t ................................................................................... 531


I. Hypothesis Evaluated with Test and Relevant Background Inform ation 531
II. Example ..................................................................................................................... 532
III. Null versus Alternative H ypotheses........................................................................ 532
IV. Test Computations .................................................................................................... 533
V. Interpretation of the Test Results .............................................................................536
Handbook o f Parametric and Nonparametric Statistical Procedures

VI. Additional Analytical Procedures for the Mann - Whitney (/T est and/or
Related T e s ts ............................................................................................................... 536
1. The normal approximation of the Mann -Whitney U statistic for large
sample s iz e s .........................................................................................................536
2. The correction for continuity for the normal approximation o f the
Mann - Whitney U test ....................................................................................... 538
3. Tie correction for the normal approximation of the Mann -Whitney U
statistic................................................................................................................. 539
4. Computation of a confidence interval for a difference between the
medians of two independent populations....................................................... 539
VII. Additional Discussion of the Mann -Whitney U T e s t.............................................542
1. Power-efficiency of the Mann-Whitney U t e s t ...............................................542
2. Equivalency of the normal approximation of the Mann -Whitney
U test and the t test for two independent samples
with rank o rd e rs .................................................................................................. 542
3. Alternative nonparametric rank-order procedures for evaluating a
design involving two independent sam ples..................................................... 542
VIII. Additional Examples Illustrating the Use of the Mann-Whitney U Test ............. 543
IX. Addendum ................................................................................................................. 544
1. Computer-intensive tests (Randomization and permutation tests: Test
12a: The randomization test for two independent samples; Test
12b: The bootstrap; Test 12c: The jackknife; Final comments on
computer-intensive procedures)........................................................................ 544
2. Survival analysis (Test 12d: Kaplan-Meier estimate) ................................558
3. Procedures for evaluating censored data in a design involving two
independent samples (Permutation test based on the median,
Gehan's test for censored data (Test 12e), and the log-rank
test (Test 12f)).................................................................................................... 568
References ............................................................................................................................ 584
Endnotes................................................................................................................ 587

Test 13: The Kolmogorov-Smirnov Test for Two Independent Samples ..................595
I. Hypothesis Evaluated with Test and Relevant Background Information ............. 595
II. Example .................................................................................................................... 596
III. Null versus Alternative Hypotheses ....................................................................... 596
IV. Test Computations ...................................................................................................598
V. Interpretation of the Test Results ........................................................................... 600
VI. Additional Analytical Procedures for the Kolmogorov-Smirnov Test for
Two Independent Samples and/or Related Tests ...................................................601
1. Graphical method for computing the Kolmogorov-Smirnov test
statistic................................................................................................................. 601
2. Computing sample confidence intervals for the Kolmogorov-Smirnov
test for two independent samples...................................................................... 602
3. Large sample chi-square approximation for a one-tailed analysis of the
Kolmogorov-Smirnov test for two independent samples ............................. 602
VII. Additional Discussion of the Kolmogorov-Smirnov Test for Two
Independent Samples ................................................................................................603
1. Additional comments on the Kolmogorov-Smirnov test for two
independent samples ......................................................................................... 603
VIII. Additional Examples Illustrating the Use of the Kolmogorov-Smirnov
Test for Two Independent Samples ........................................................................ 603


References ............................................................................................................................ 604
Endnotes................................................................................................................ 605

Test 14: The Siegel-Tukey Test for Equal Variability ................................................... 607
I. Hypothesis Evaluated with Test and Relevant Background Information ............. 607
II. Example ..................................................................................................................... 608
III. Null versus Alternative Hypotheses ........................................................................ 608
IV. Test Computations .................................................................................................... 609
V. Interpretation of the Test Results .............................................................................612
VI. Additional Analytical Procedures for the Siegel-Tukey Test for Equal
Variability and/or Related Tests ...............................................................................612
1. The normal approximation of the Siegel-Tukey test statistic for large
sample sizes ........................................................................................................ 612
2. The correction for continuity for the normal approximation of the
Siegel-Tukey test for equal variability ............................................................613
3. Tie correction for the normal approximation of the Siegel-Tukey test
statistic................................................................................................................. 614
4. Adjustment of scores for the Siegel-Tukey test for equal variability
when θ1 ≠ θ2 .................................................................................................... 614
VII. Additional Discussion of the Siegel-Tukey Test for Equal Variability ................ 616
1. Analysis of the homogeneity of variance hypothesis for the same set
of data with both a parametric and nonparametric test, and the power-
efficiency of the Siegel-Tukey Test for Equal Variability ........................... 616
2. Alternative nonparametric tests of d ispersion .................................................617
VIII. Additional Examples Illustrating the Use of the Siegel-Tukey Test for
Equal Variability........................................................................................................ 618
References ............................................................................................................................619
Endnotes................................................................................................................ 620

Test 15: The Moses Test for Equal Variability.................................................................. 623


I. Hypothesis Evaluated with Test and Relevant Background Information ............. 623
II. Example ..................................................................................................................... 624
III. Null versus Alternative Hypotheses ........................................................................ 624
IV. Test Computations .................................................................................................... 626
V. Interpretation of the Test Results .............................................................................628
VI. Additional Analytical Procedures for the Moses Test for Equal
Variability and/or Related Tests ...............................................................................629
1. The normal approximation of the Moses test statistic for large
sample sizes .........................................................................................................629
VII. Additional Discussion of the Moses Test for Equal Variability........................... 630
1. Power-efficiency of the Moses test for equal variability..............................630
2. Issue of repetitive resampling ......................................................................... 631
3. Alternative nonparametric tests of dispersion.................................................631
VIII. Additional Examples Illustrating the Use of the Moses Test for Equal
Variability................................................................................................................... 631
References ............................................................................................................................635
Endnotes................................................................................................................................ 635
xxii Handbook o f Parametric and Nonparametric Statistical Procedures

Test 16: The Chi-Square Test for r x c Tables (Test 16a: The Chi-Square
Test for Homogeneity; Test 16b: The Chi-Square Test of Indepen-
dence (employed with a single sample))..............................................................637
I. Hypothesis Evaluated with Test and Relevant Background Information ............. 637
II. Examples..................................................................................................................... 639
III. Null versus Alternative Hypotheses........................................................................ 640
IV. Test Computations ...................................................................................................643
V. Interpretation of the Test Results .............................................................................644
VI. Additional Analytical Procedures for the Chi-Square Test for r x c
Tables and/or Related Tests ..................................................................................... 646
1. Yates’ correction for continuity........................................................................ 646
2. Quick computational equation for a 2 x 2 ta b le ...............................................647
3. Evaluation of a directional alternative hypothesis in the case of a 2 x
2 contingency table ............................................................................................648
4. Test 16c: The Fisher exact test ...................................................................... 649
5. Test 16d: The z test for two independent proportions
(and computation of sample size in reference to power)................................655
6. Computation of a confidence interval for a difference between two
proportions...........................................................................................................661
7. Test 16e: The median test for independent samples..................................663
8. Extension of the chi-square test for r x c tables to contingency tables
involving more than two rows and/or columns, and associated
comparison procedures ..................................................................................... 665
9. The analysis of standardized residuals..............................................................671
10. Sources for computing the power of the chi-square test
for r x c tables .................................................................................................... 673
11. Measures of association for r x c contingency tables ....................................673
Test 16f: The contingency coefficient .............................................675
Test 16g: The phi coefficient..............................................................677
Test 16h: Cramér's phi coefficient...................................................678
Test 16i: Yule’s Q ...............................................................................679
Test 16j: The odds ratio (and the concept of relative risk)
(and Test 16j-a: Test of significance for an odds ratio and
computation of a confidence interval for an odds ratio) ............. 680
Test 16k: Cohen’s kappa (and computation of a confidence
interval for kappa, Test 16k -a: Test of significance for
Cohen’s kappa, and Test 16k -b: Test of significance for
two independent values of Cohen's kappa) ..................................687
12. Combining the results of multiple 2 x 2 contingency tables: ....................... 691
Heterogeneity chi-square analysis for a 2 x 2
contingency table ................................................................................... 691
Test 16l: The Mantel-Haenszel analysis/test (Test 16l-a:
Test of homogeneity of odds ratios for Mantel-Haenszel
analysis, Test 16l-b: Summary odds ratio for Mantel-
Haenszel analysis, and Test 16l-c: Mantel-Haenszel test
of association) ..................................................................................... 694
VII. Additional Discussion of the Chi-Square Test for r x c Tables ........................... 706
1. Equivalency of the chi-square test for r x c tables when c = 2 with the
t test for two independent samples (when r = 2) and the single-factor
between-subjects analysis of variance (when r > 2) ...................................... 706
2. Test of equivalence for two independent proportions: Test 16m: The
Westlake-Schuirmann test of equivalence of two independent
proportions (and procedure for computing sample size in reference
to the power of the test) ................................................................................... 706
3. Test 16n: The log-likelihood ratio ................................................................716
4. Simpson’s paradox..............................................................................................718
5. Analysis of multidimensional contingency tables through use of
a chi-square analysis ......................................................................................... 720
6. Test 16o: Analysis of multidimensional contingency tables
with log-linear analysis ................................................................................... 731
VIII. Additional Examples Illustrating the Use of the Chi-Square Test for
r x c Tables................................................................................................................. 745
References ............................................................................................................................748
Endnotes................................................................................................................ 751

Inferential Statistical Tests Employed with Two Dependent


Samples (and Related Measures of Association/Correlation) .......... 761
Test 17: The t Test for Two Dependent Samples ..............................................................763
I. Hypothesis Evaluated with Test and Relevant Background Information ............. 763
II. Example .................................................................................................................... 764
III. Null versus Alternative Hypotheses .......................................................................764
IV. Test Computations ...................................................................................................765
V. Interpretation of the Test Results ........................................................................... 767
VI. Additional Analytical Procedures for the t Test for Two Dependent
Samples and/or Related Tests................................................................................... 768
1. Alternative equation for the t test for two dependent sam ples....................... 768
2. The equation for the t test for two dependent samples when a value for
a difference other than zero is stated in the null hypothesis .......................... 772
3. Test 17a: The t test for homogeneity of variance for two depen ­
dent samples: Evaluation of the homogeneity of variance assumption
of the t test for two dependent samples............................................................772
4. Computation of the power of the t test for two dependent samples and
the application of Test 17b: Cohen’s d index ...............................................775
5. Measure of magnitude of treatment effect for the t test for two
dependent samples: Omega squared (Test 17c) ..................... 781
6. Computation of a confidence interval for the t test for two dependent
samples .................................................................. 783
7. Test 17d: Sandler's A test ...............................................................................784
8. Test 17e: The z test for two dependent samples........................................ 785
VII. Additional Discussion of the t Test for Two Dependent Samples ....................... 788
1. The use of matched subjects in a dependent samples design......................... 788
2. Relative power of the t test for two dependent samples and the t test
for two independent samples.............................................................................791
3. Counterbalancing and order effects..................................................................792
4. Analysis of a one-group pretest-posttest design with the t test for two
dependent samples..............................................................................................794
5. Tests of equivalence: Test 17f: The Westlake-Schuirmann test of
equivalence of two dependent treatments (and procedure for
computing sample size in reference to the power of the test) ....................... 796

VIII. Additional Example Illustrating the Use of the t Test for Two
Dependent Samples.................................................................................................... 802
References ............................................................................................................................ 803
Endnotes................................................................................................................................ 804

Test 18: The Wilcoxon Matched-Pairs Signed-Ranks Test.............................................809


I. Hypothesis Evaluated with Test and Relevant Background Information ............. 809
II. Example ..................................................................................................................... 810
III. Null versus Alternative Hypotheses ........................................................................ 810
IV. Test Computations .................................................................................................... 811
V. Interpretation of the Test Results .............................................................................812
VI. Additional Analytical Procedures for the Wilcoxon Matched-Pairs
Signed-Ranks Test and/or Related Tests ................................................................814
1. The normal approximation of the Wilcoxon T statistic for large sample
sizes ......................................................................................................................814
2. The correction for continuity for the normal approximation of the
Wilcoxon matched-pairs signed-ranks test ..................................................... 815
3. Tie correction for the normal approximation of the Wilcoxon test
statistic................................................................................................................. 816
4. Computation of a confidence interval for a median difference between
two dependent populations ...............................................................................817
VII. Additional Discussion of the Wilcoxon Matched-Pairs Signed-Ranks
Test .............................................................................................................................. 819
1. Power-efficiency of the Wilcoxon matched-pairs signed-ranks test ............ 819
2. Probability of superiority as a measure of effect size .......................... 819
3. Alternative nonparametric procedures for evaluating a design
involving two dependent samples .................................................................... 819
VIII. Additional Examples Illustrating the Use of the Wilcoxon Matched-
Pairs Signed-Ranks Test............................................................................................820
References ............................................................................................................................ 820
Endnotes................................................................................................................................ 820

Test 19: The Binomial Sign Test for Two Dependent Samples...................................... 823
I. Hypothesis Evaluated with Test and Relevant Background Information ............. 823
II. Example ......................................................................................................................824
III. Null versus Alternative Hypotheses ........................................................................ 824
IV. Test Computations .................................................................................................... 825
V. Interpretation of the Test Results .............................................................................827
VI. Additional Analytical Procedures for the Binomial Sign Test for Two
Dependent Samples and/or Related Tests................................................................ 828
1. The normal approximation of the binomial sign test for two
dependent samples with and without a correction for continuity .................828
2. Computation of a confidence interval for the binomial sign test for
two dependent samples ....................................................................................831
3. Sources for computing the power of the binomial sign test for two
dependent samples, and comments on the asymptotic relative effi­
ciency of the test ................................................................................................ 832
VII. Additional Discussion of the Binomial Sign Test for Two Dependent
Samples........................................................................................................................833
1. The problem of an excessive number of zero difference scores .................833
2. Equivalency of the binomial sign test for two dependent samples and
the Friedman two-way analysis of variance by ranks when k = 2 .................833
VIII. Additional Examples Illustrating the Use of the Binomial Sign Test for
Two Dependent Samples ..........................................................................................833
References ............................................................................................................................ 833
Endnotes................................................................................................................................ 834

Test 20: The McNemar Test .................................................................................................. 835


I. Hypothesis Evaluated with Test and Relevant Background Information ............. 835
II. Examples......................................................................................................................836
III. Null versus Alternative Hypotheses ........................................................................ 838
IV. Test Computations .................................................................................................... 840
V. Interpretation of the Test Results .............................................................................840
VI. Additional Analytical Procedures for the McNemar Test and/or Related
Tests ............................................................................................................................ 841
1. Alternative equation for the McNemar test statistic based on the
normal distribution..............................................................................................841
2. The correction for continuity for the McNemar test ...................................... 842
3. Computation of the exact binomial probability for the McNemar test
model with a small sample size ........................................................................ 843
4. Computation of the power of the McNemar t e s t .............................................845
5. Computation of a confidence interval for the McNemar te s t......................... 846
6. Computation of an odds ratio for the McNemar t e s t ...................................... 847
7. Additional analytical procedures for the McNemar test ................................848
8. Test 20a: The Gart test for order effects ..................................................... 848
VII. Additional Discussion of the McNemar Test..........................................................856
1. Alternative format for the McNemar test summary table and modified
test equation.........................................................................................................856
2. The effect of disregarding matching ................................................................857
3. Alternative nonparametric procedures for evaluating a design with two
dependent samples involving categorical data.................................................858
4. Test of equivalence for two dependent proportions: Test 20b: The
Westlake-Schuirmann test of equivalence of two dependent
proportions ...................................................................................................... 858
VIII. Additional Examples Illustrating the Use of the McNemar Test ......................... 868
IX. Addendum ................................................................................................................. 870
Extension of the McNemar test model beyond 2 x 2 contingency
tables ................................................................................................................... 870
1. Test 20c: The Bowker test of internal symmetry ................................870
2. Test 20d: The Stuart-Maxwell test of marginal homogeneity .......... 874
References ............................................................................................................................ 876
Endnotes................................................................................................................ 878

Inferential Statistical Tests Employed with Two or More


Independent Samples (and Related Measures of
Association/Correlation).......................................................................... 883
Test 21: The Single-Factor Between-Subjects Analysis of Variance ........................... 885
I. Hypothesis Evaluated with Test and Relevant Background Information ............. 885

II. Example .....................................................................................................................886


III. Null versus Alternative Hypotheses ....................................................................... 886
IV. Test Computations ................................................................................................... 887
V. Interpretation of the Test Results ............................................................................891
VI. Additional Analytical Procedures for the Single-Factor Between-
Subjects Analysis of Variance and/or Related Tests .............................................892
1. Comparisons following computation of the omnibus F value for the
single-factor between-subjects analysis of variance (Planned versus
unplanned comparisons (including simple versus complex com ­
parisons); Linear contrasts; Orthogonal comparisons; Test 21a:
Multiple t tests/Fisher’s LSD test; Test 21b: The Bonferroni-
Dunn test; Test 21c: Tukey's HSD test; Test 21d: The Newman-
Keuls test; Test 21e: The Scheffe test; Test 21f: The Dunnett
test; Additional discussion of comparison procedures and final
recommendations; The computation of a confidence interval for a
com parison).........................................................................................................892
2. Comparing the means of three or more groups when k ≥ 4 ............................ 923
3. Evaluation of the homogeneity of variance assumption of the single-
factor between-subjects analysis of variance (Test 11a: Hartley's
Fmax test for homogeneity of variance, Test 21g: The Levene Test
for homogeneity of variance, Test 21h: The Brown-Forsythe test
for homogeneity of variance) ........................................................................ 924
4. Computation of the power of the single-factor between-subjects
analysis of variance ............................................................................................931
5. Measures of magnitude of treatment effect for the single-factor
between-subjects analysis of variance: Omega squared (Test 21i),
Eta squared (Test 21j), and Cohen's f index (Test 21k) .............................934
6. Computation of a confidence interval for the mean of a treatment
population ...........................................................................................................938
7. Trend analysis .................................................................................................... 940
VII. Additional Discussion of the Single-Factor Between-Subjects Analysis
of Variance................................................................................................................. 952
1. Theoretical rationale underlying the single-factor between-subjects
analysis of variance ............................................................................................952
2. Definitional equations for the single-factor between-subjects analysis
of variance...........................................................................................................954
3. Equivalency of the single-factor between-subjects analysis of variance
and the t test for two independent samples when k = 2 ..................................956
4. Robustness of the single-factor between-subjects analysis of
variance ............................................................................................................... 957
5. Equivalency of the single-factor between-subjects analysis of variance
and the t test for two independent samples with the chi-square test for
r x c tables when c = 2 ....................................................................................... 957
6. The general linear model................................................................................... 961
7. Fixed-effects versus random-effects models for the single-factor
between-subjects analysis of variance..............................................................962
8. Multivariate analysis of variance (MANOVA)................................................ 963
VIII. Additional Examples Illustrating the Use of the Single-Factor Between-
Subjects Analysis of Variance .................................................................................963
IX. Addendum ................................................................................................................964
1. Test 21l: The Single-Factor Between-Subjects Analysis of
Covariance .........................................................................................................964
References ............................................................................................................................ 982
Endnotes................................................................................................................................ 985

Test 22: The Kruskal-Wallis One-Way Analysis of Variance by Ranks ................. 1001
I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1001
II. Example ....................................................................................................................1002
III. Null versus Alternative Hypotheses .....................................................................1003
IV. Test Computations ...................................................................................................1003
V. Interpretation of the Test Results ........................................................................... 1005
VI. Additional Analytical Procedures for the Kruskal-Wallis One-Way
Analysis of Variance by Ranks and/or Related Tests...........................................1005
1. Tie correction for the Kruskal-Wallis one-way analysis of variance by
ranks....................................................................................................................1005
2. Pairwise comparisons following computation of the test statistic for
the Kruskal-Wallis one-way analysis of variance by ranks..........................1006
VII. Additional Discussion of the Kruskal-Wallis One-Way Analysis of
Variance by Ranks .................................................................................................. 1010
1. Exact tables of the Kruskal-Wallis distribution ...........................................1010
2. Equivalency of the Kruskal-Wallis one-way analysis of
variance by ranks and the Mann-Whitney U test when k = 2 ....................... 1010
3. Power-efficiency of the Kruskal-Wallis one-way analysis of
variance by ranks ............................................................................................ 1011
4. Alternative nonparametric rank-order procedures for evaluating a
design involving k independent samples........................................................1011
VIII. Additional Examples Illustrating the Use of the Kruskal-Wallis One-
Way Analysis of Variance by Ranks .....................................................................1012
IX. Addendum ..............................................................................................................1013
1. Test 22a: The Jonckheere-Terpstra test for ordered
alternatives.......................................................................................................1013
References .......................................................................................................................... 1020
Endnotes.............................................................................................................. 1021

Test 23: The van der Waerden Normal-Scores Test for k Independent
Samples ....................................................................................................................1027
I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1027
II. Example ....................................................................................................................1028
III. Null versus Alternative Hypotheses........................................................................1028
IV. Test Computations ...................................................................................................1029
V. Interpretation of the Test Results ........................................................................... 1031
VI. Additional Analytical Procedures for the van der Waerden Normal-
Scores Test for k Independent Samples and/or Related Tests ............................1032
1. Pairwise comparisons following computation of the test statistic for
the van der Waerden normal-scores test for k independent samples ............. 1032
VII. Additional Discussion of the van der Waerden Normal-Scores Test for
k Independent Samples............................................................................................ 1035
1. Alternative normal-scores tests .....................................................................1035
VIII. Additional Examples Illustrating the Use of the van der Waerden
Normal-Scores Test for k Independent Samples................................................... 1035
References .......................................................................................................................... 1036
Endnotes...............................................................................................................1037

Inferential Statistical Tests Employed with Two or More


Dependent Samples (and Related Measures of
Association/Correlation)...................................................................... 1041
Test 24: The Single-Factor Within-Subjects Analysis of Variance ...........................1043
I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1043
II. Example ...................................................................................................................1045
III. Null versus Alternative Hypotheses.......................................................................1045
IV. Test Computations ...................................................................................................1045
V. Interpretation of the Test Results ...........................................................................1050
VI. Additional Analytical Procedures for the Single-Factor Within-Subjects
Analysis of Variance and/or Related Tests............................................................ 1051
1. Comparisons following computation of the omnibus F value for the
single-factor within-subjects analysis of variance (Test 24a: Multiple
t tests/Fisher's LSD test; Test 24b: The Bonferroni-Dunn test;
Test 24c: Tukey's HSD test; Test 24d: The Newman-Keuls test;
Test 24e: The Scheffe test; Test 24f: The Dunnett test; The compu-
tation of a confidence interval for a comparison; Alternative method-
ology for computing MSres for a comparison)............................................... 1051
2. Comparing the means of three or more conditions when k ≥ 4 ................... 1059
3. Evaluation of the sphericity assumption underlying the single-factor
within-subjects analysis of variance .............................................................. 1061
4. Computation of the power of the single-factor within-subjects
analysis of variance ..........................................................................................1067
5. Measures of magnitude of treatment effect for the single-factor
within-subjects analysis of variance: Omega squared (Test 24g) and
Cohen's f index (Test 24h) ...........................................................................1069
6. Computation of a confidence interval for the mean of a treatment
population .........................................................................................................1071
7. Test 24i: The intraclass correlation coefficient.........................................1073
VII. Additional Discussion of the Single-Factor Within-Subjects Analysis of
Variance ....................................................................................................................1075
1. Theoretical rationale underlying the single-factor within-subjects
analysis of variance ..........................................................................................1075
2. Definitional equations for the single-factor within-subjects analysis of
variance .........................................................................................................1078
3. Relative power of the single-factor within-subjects analysis of var­
iance and the single-factor between-subjects analysis of variance .............1081
4. Equivalency of the single-factor within-subjects analysis of variance
and the t test for two dependent samples when k = 2 .................................. 1082
5. The Latin square design ................................................................................. 1083
VIII. Additional Examples Illustrating the Use of the Single-Factor Within-
Subjects Analysis of Variance ............................................................................... 1085
References .......................................................................................................................... 1088
Endnotes...............................................................................................................................1089

Test 25: The Friedman Two-Way Analysis of Variance by Ranks ........................... 1095
I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1095
II. Example ....................................................................................................................1096

III. Null versus Alternative Hypotheses.......................................................................1096


IV. Test Computations .................................................................................................. 1097
V. Interpretation of the Test Results ........................................................................... 1098
VI. Additional Analytical Procedures for the Friedman Two-Way Analysis
of Variance by Ranks and/or Related Tests ............................................................ 1099
1. Tie correction for the Friedman two-way analysis of variance
by ranks ............................................................................................................. 1099
2. Pairwise comparisons following computation of the test statistic for
the Friedman two-way analysis of variance by ranks ................................ 1100
VII. Additional Discussion of the Friedman Two-Way Analysis of Variance
by R a n k s ....................................................................................................................1105
1. Exact tables of the Friedman distribution ......................................................1105
2. Equivalency of the Friedman two-way analysis of variance by ranks
and the binomial sign test for two dependent samples when k = 2 ...........1105
3. Power-efficiency of the Friedman two-way analysis of variance
by ranks ............................................................................................................. 1106
4. Alternative nonparametric rank-order procedures for evaluating a
design involving k dependent samples.......................................................... 1106
5. Relationship between the Friedman two-way analysis of variance by
ranks and Kendall’s coefficient of concordance.........................................1107
VIII. Additional Examples Illustrating the Use of the Friedman Two-Way
Analysis of Variance by R an k s............................................................................. 1107
IX. Addendum ............................................................................................................. 1108
1. Test 25a: The Page Test for Ordered Alternatives.....................................1108
References .......................................................................................................................... 1113
Endnotes...............................................................................................................1114

Test 26: The Cochran Q Test .............................................................................................. 1119


I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1119
II. Example ....................................................................................................................1120
III. Null versus Alternative Hypotheses.....................................................................1120
IV. Test Computations .................................................................................................1120
V. Interpretation of the Test Results ...........................................................................1122
VI. Additional Analytical Procedures for the Cochran Q Test and/or
Related Tests........................................................................................................... 1122
1. Pairwise comparisons following computation of the test statistic for
the Cochran Q t e s t ............................................................................................1122
VII. Additional Discussion of the Cochran Q Test ....................................................1126
1. Issues relating to subjects who obtain the same score under all of the
experimental conditions................................................................................... 1126
2. Equivalency of the Cochran Q test and the McNemar test when
k = 2 ....................................................................................................................1127
3. Alternative nonparametric procedures with categorical data for
evaluating a design involving k dependent samples.......................................1129
VIII. Additional Examples Illustrating the Use of the Cochran Q Test......................1129
References .......................................................................................................................... 1133
Endnotes...............................................................................................................................1134

Inferential Statistical Test Employed with a Factorial Design


(and Related Measures of Association/Correlation)........................ 1137

Test 27: The Between-Subjects Factorial Analysis of V arian ce.................................. 1139


I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1139
II. Example ....................................................................................................................1140
III. Null versus Alternative Hypotheses.....................................................................1140
IV. Test Computations ................................................................................................ 1142
V. Interpretation of the Test Results ......................................................................... 1148
VI. Additional Analytical Procedures for the Between-Subjects Factorial
Analysis of Variance and/or Related Tests............................................................1155
1. Comparisons following computation of the F values for the between-
subjects factorial analysis of variance (Test 27a: Multiple t tests/
Fisher’s LSD test; Test 27b: The Bonferroni-Dunn test; Test 27c:
Tukey’s HSD test; Test 27d: The Newman-Keuls test; Test 27e:
The Scheffe test; Test 27f: The Dunnett test; Comparisons between
the marginal means; Evaluation of an omnibus hypothesis involving
more than two marginal means; Comparisons between specific groups
that are a combination of both factors; The computation of a confidence
interval for a comparison; Analysis of simple effects) ................................1155
2. Evaluation of the homogeneity of variance assumption of the between-
subjects factorial analysis of variance.......................................................... 1166
3. Computation of the power of the between-subjects factorial analysis of
variance ............................................................................................................. 1167
4. Measures of magnitude of treatment effect for the between-subjects
factorial analysis of variance: Omega squared (Test 27g) and Cohen's
f index (Test 27h) ............................................................................................1169
5. Computation of a confidence interval for the mean of a population
represented by a group....................................................................................1173
VII. Additional Discussion of the Between-Subjects Factorial Analysis of
Variance ....................................................................................................................1173
1. Theoretical rationale underlying the between-subjects factorial analysis
of variance.........................................................................................................1173
2. Definitional equations for the between-subjects factorial analysis of
variance ............................................................................................................. 1174
3. Unequal sample sizes......................................................................................1176
4. The randomized-blocks design .......................................................................1177
5. Additional comments on the between-subjects factorial analysis of
variance (Fixed-effects versus random-effects versus mixed-effects
models; Nested factors/hierarchical designs and designs involving more
than two factors; Screening designs).............................................................. 1181
VIII. Additional Examples Illustrating the Use of the Between-Subjects
Factorial Analysis of Variance............................................................................... 1190
IX. Addendum................................................................................................................1191
Discussion of and computational procedures for additional analysis of
variance procedures for factorial designs ......................................................1191
1. Test 27i: The factorial analysis of variance for a mixed
design.....................................................................................................1191
Analysis of a crossover design with a factorial analysis
of variance for a mixed design............................................... 1196

2. Test 27j: Analysis of variance for a Latin square design ........................ 1211


3. Test 27k: The within-subjects factorial analysis
of variance .............................................................................................. 1232
4. Analysis of higher-order factorial designs .........................................1237
References .......................................................................................................................... 1238
Endnotes...............................................................................................................................1239

Measures of Association/Correlation ................................................ 1247

Test 28: The Pearson Product-Moment Correlation Coefficient................................ 1249
I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1249
II. Example ....................................................................................................................1253
III. Null versus Alternative Hypotheses.......................................................................1253
IV. Test Computations .................................................................................................. 1254
V. Interpretation of the Test Results (Test 28a: Test of significance for
a Pearson product-moment correlation coefficient; The coefficient
of determination).......................................................................................................1256
VI. Additional Analytical Procedures for the Pearson Product-Moment
Correlation Coefficient and/or Related Tests........................................................1259
1. Derivation of a regression lin e .........................................................................1259
2. The standard error of estim ate.........................................................................1268
3. Computation of a confidence interval for the value of the criterion
variable ............................................................................................................. 1269
4. Computation of a confidence interval for a Pearson product-moment
correlation coefficient ......................................................................................1271
5. Test 28b: Test for evaluating the hypothesis that the true
population correlation is a specific value other than zero .................... 1272
6. Computation of power for the Pearson product-moment correlation
coefficient .........................................................................................................1273
7. Test 28c: Test for evaluating a hypothesis on whether there is a
significant difference between two independent correlations...............1275
8. Test 28d: Test for evaluating a hypothesis on whether k indepen-
dent correlations are homogeneous............................................................1277
9. Test 28e: Test for evaluating the null hypothesis H0: ρXZ = ρYZ .......... 1278
10. Tests for evaluating a hypothesis regarding one or more regression
coefficients (Test 28f: Test for evaluating the null hypothesis H0: β
= 0; Test 28g: Test for evaluating the null hypothesis H0: β1 = β2) . . . 1280
11. Additional correlational procedures................................................................ 1282
VII. Additional Discussion of the Pearson Product-Moment Correlation
Coefficient ................................................................................................................1283
1. The definitional equation for the Pearson product-moment correlation
coefficient .........................................................................................................1283
2. Covariance.........................................................................................................1284
3. The homoscedasticity assumption of the Pearson product-moment
correlation coefficient ..................................................................................... 1285
4. Residuals, analysis of variance for regression analysis, and
regression diagnostics ......................................................................................1286
5. Autocorrelation (and Test 28h: Durbin-Watson test) ..............................1297
6. The phi coefficient as a special case of the Pearson product-moment
correlation coefficient ......................................................................................1313

7. Ecological correlation ......................................................................................1314


8. Cross-lagged panel and regression-discontinuity designs............................1316
VIII. Additional Examples Illustrating the Use of the Pearson Product-
Moment Correlation Coefficient............................................................................. 1323
IX. Addendum ......................................................................................................... 1324
1. Bivariate measures of correlation that are related to the Pearson product-
moment correlation coefficient (Test 28i: The point-biserial cor-
relation coefficient (and Test 28i-a: Test of significance for a point-
biserial correlation coefficient); Test 28j: The biserial correlation
coefficient (and Test 28j-a: Test of significance for a biserial corre ­
lation coefficient); Test 28k: The tetrachoric correlation coefficient
(and Test 28k-a: Test of significance for a tetrachoric correlation
coefficient)).......................................................................................................1324
2. Data mining.......................................................................................................1334
3. Time series analysis..........................................................................................1337
References .......................................................................................................................... 1353
Endnotes.............................................................................................................................. 1358

Test 29: Spearman's Rank-Order Correlation Coefficient...........................................1365


I. Hypothesis Evaluated with Test and Relevant Background Information ........... 1365
II. Example ....................................................................................................................1367
III. Null versus Alternative Hypotheses.......................................................................1367
IV. Test Computations .................................................................................................. 1368
V. Interpretation of the Test Results (Test 29a: Test of significance for
Spearman's rank-order correlation coefficient) .............................................1369
VI. Additional Analytical Procedures for Spearman’s Rank-Order
Correlation Coefficient and/or Related Tests........................................................1371
1. Tie correction for Spearman’s rank-order correlation coefficient 1371
2. Spearman’s rank-order correlation coefficient as a special case of the
Pearson product-moment correlation coefficient...........................................1373
3. Regression analysis and Spearman’s rank-order correlation
coefficient .........................................................................................................1374
4. Partial rank correlation......................................................................................1375
5. Use of Fisher’s zr transformation with Spearman’s rank -order
correlation coefficient ......................................................................................1376
VII. Additional Discussion of Spearman’s Rank-Order Correlation
Coefficient ............................................................................................................... 1376
1. The relationship between Spearman’s rank-order correlation coefficient,
Kendall’s coefficient of concordance, and the Friedman two-way
analysis of variance by ra n k s ........................................................................... 1376
2. Power efficiency of Spearman’s rank-order correlation co e fficien t 1379
3. Brief discussion of Kendall’s tau: An alternative measure of association
for two sets of ran k s.......................................................................................... 1379
4. Weighted rank/top-down correlation.............................................................. 1379
VIII. Additional Examples Illustrating the Use of the Spearman’s Rank -Order
Correlation C oefficient............................................................................................ 1380
References .......................................................................................................................... 1380
E ndnotes...............................................................................................................................1381
Table o f Contents xxxiii

Test 30: Kendall’s Tau ........................................................................................................1383


I. Hypothesis Evaluated with Test and Relevant Background Inform ation 1383
II. Example ...................................................................................................................1385
III. Null versus Alternative H ypotheses..................................................................... 1385
IV. Test Computations ................................................................................................. 1386
V. Interpretation of the Test Results (Test 30a: Test of significance for
Kendall’s tau) .........................................................................................................1388
VI. Additional Analytical Procedures for Kendall’s Tau and/or
Related T e sts............................................................................................................. 1391
1. Tie correction for Kendall’s t a u .......................................................................1391
2. Regression analysis and Kendall’s t a u ............................................................1394
3. Partial rank correlation......................................................................................1394
4. Sources for computing a confidence interval for Kendall’s ta u ................... 1394
VII. Additional Discussion of Kendall’s T a u ............................................................... 1394
1. Power efficiency of Kendall’s t a u .................................................................. 1394
2. Kendall’s coefficient of agreement ................................................................ 1394
VIII. Additional Examples Illustrating the Use of Kendall’s T a u ............................... 1394
References .......................................................................................................................... 1395
E ndnotes...............................................................................................................................1396

Test 31: Kendall’s Coefficient of Concordance .............................................................. 1399


I. Hypothesis Evaluated with Test and Relevant Background Information ...................... 1399
II. Example ....................................................................................................................1400
III. Null versus Alternative Hypotheses .......................................................................1400
IV. Test Computations .................................................................................................. 1401
V. Interpretation of the Test Results (Test 31a: Test of significance for
K endall’s coefficient of co n co rd an ce)................................................................ 1402
VI. Additional Analytical Procedures for Kendall’s Coefficient of
Concordance and/or Related T e s ts .........................................................................1403
1. Tie correction for Kendall’s coefficient of concordance..............................1403
VII. Additional Discussion of Kendall’s Coefficient of Concordance ..................... 1405
1. Relationship between Kendall’s coefficient of concordance and
Spearman’s rank -order correlation coefficient .............................................1405
2. Relationship between Kendall’s coefficient of concordance and the
Friedman two-way analysis of variance by ranks .........................................1406
3. Weighted rank/top-down concordance ..........................................................1408
4. Kendall’s coefficient of concordance versus the intraclass correlation
coefficient ......................................................................................................... 1408
VIII. Additional Examples Illustrating the Use of Kendall’s Coefficient of
C oncordance............................................................................................................. 1410
References .......................................................................................................................... 1411
E ndnotes...............................................................................................................................1412

Test 32: Goodman and Kruskal’s Gamma .......................................................................1415


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1415
II. Example ....................................................................................................................1416
III. Null versus Alternative H ypotheses.......................................................................1417
IV. Test Computations ...................................................................................................1418
V. Interpretation of the Test Results (Test 32a: Test of significance for
Goodman and Kruskal’s gamma) .......................................................................1421


VI. Additional Analytical Procedures for Goodman and Kruskal’s Gamma
and/or Related T e s t s ................................................................................................ 1422
1. The computation of a confidence interval for the value of Goodman and
Kruskal’s gam m a.............................................................................................. 1422
2. Test 32b: Test for evaluating the null hypothesis H0: γ1 = γ2 ...............1423
3. Sources for computing a partial correlation coefficient for Goodman and
Kruskal’s gam m a.............................................................................................. 1424
VII. Additional Discussion of Goodman and Kruskal’s G am m a................................ 1424
1. Relationship between Goodman and Kruskal’s gamma and Yule’s Q . . . 1424
2. Somers’ delta as an alternative measure of association for an ordered
contingency ta b le .............................................................................................. 1424
VIII. Additional Examples Illustrating the Use of Goodman and Kruskal’s
Gamma ......................................................................................................................1425
R eferences............................................................................................................................ 1426
E ndnotes...............................................................................................................................1427

Multivariate Statistical Analysis ........................................................ 1429

Matrix Algebra and Multivariate Analysis .......................................................................... 1431


I. Introductory Comments on Multivariate Statistical A nalysis................................ 1431
II. Introduction to Matrix Algebra ............................................................................... 1432
References .......................................................................................................................... 1444
E ndnotes.............................................................................................................................. 1444

Test 33: Multiple Regression .............................................................................................. 1445


I. Hypothesis Evaluated with Test and Relevant Background Inform ation 1445
II. E xam ples....................................................................................................................1453
III. Null versus Alternative H ypotheses.......................................................................1453
IV/V. Test Computations and Interpretation of the Test Results
A. Test computations and interpretation of results for Example 33.1
(Computation of the multiple correlation coefficient; The coefficient of
multiple determination; Test of significance for a multiple correlation
coefficient; The multiple regression equation; The standard error of
multiple estimate; Computation of a confidence interval for Y'; Evaluation
of the relative importance of the predictor variables; Evaluating
the significance of a regression coefficient; Computation of a confidence
interval for a regression coefficient; Analysis of variance for multiple
regression; Semipartial and partial correlation (Test 33a: Computation of
a semipartial correlation coefficient; Test of significance for a semipartial
correlation coefficient; Test 33b: Computation of a partial correlation
coefficient; Test of significance for a partial correlation coefficient) .............1454
B. Test computations and interpretation of results for Example
33.2 with SPSS ................................................................................................ 1472
VI. Additional Analytical Procedures for Multiple Regression
and/or Related T e s t s ................................................................................................ 1482
1. Cross-validation of sample d a ta ..................................................................... 1482
VII. Additional Discussion of Multiple Regression....................................................1483
1. Final comments on multiple regression an a ly sis......................................... 1483
2. Causal modeling: Path analysis and structural
equation modeling .......................................................................................... 1484
3. Brief note on logistic regression ...................................................................1484
VIII. Additional Examples Illustrating the Use of Multiple R egression................. 1485
References ........................................................................................................................ 1485
Endnotes.............................................................................................................................1487

Test 34: Hotelling’s T 2 ......................................................................................................... 1495


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1495
II. Example ..................................................................................................................1495
III. Null versus Alternative Hypotheses .....................................................................1496
IV. Test Computations .................................................................................................1497
V. Interpretation of the Test Results ......................................................................... 1498
VI. Additional Analytical Procedures for Hotelling’s T 2and/or
Related T e s ts........................................................................................................... 1500
1. Additional analyses following the test of the omnibus null
hypothesis ....................................................................................................... 1500
2. Test 34a: The single -sample Hotelling’s T 2 ........................................... 1501
3. Test 34b: The use of the single -sample Hotelling’s T 2 to
evaluate a dependent samples design ......................................................1503
VII. Additional Discussion of Hotelling’s T 2 ............................................................ 1507
1. Hotelling’s T 2 and Mahalanobis’ D 2 statistic............................................. 1507
VIII. Additional Examples Illustrating the Use of Hotelling’s T 2 .............................. 1507
References ........................................................................................................................ 1507
E ndnotes.............................................................................................................................1508

Test 35: Multivariate Analysis of Variance ...................................................................1511


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1511
II. Example ..................................................................................................................1512
III. Null versus Alternative Hypotheses .....................................................................1513
IV. Test Computations .................................................................................................1514
V. Interpretation of the Test Results ......................................................................... 1515
VI.Additional Analytical Procedures for the Multivariate Analysis of Variance
and/or Related T e s t s ...............................................................................................1522
VII. Additional Discussion of the Multivariate Analysis of V ariance......................1522
1. Conceptualizing the hypothesis for the multivariate analysis of
variance within the context of a linear combination ...................................1522
2. Multicollinearity and the multivariate analysis of variance........................ 1522
VIII. Additional Examples Illustrating the Use of the Multivariate
Analysis of V ariance...............................................................................................1523
References ........................................................................................................................ 1523
Endnotes.............................................................................................................................1524

Test 36: Multivariate Analysis of Covariance ...............................................................1527


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1527
II. Example ..................................................................................................................1528
III. Null versus Alternative H ypotheses..................................................................... 1529
IV. Test Computations .................................................................................................1529
V. Interpretation of the Test Results ......................................................................... 1530
VI. Additional Analytical Procedures for the Multivariate Analysis of
Covariance and/or Related Tests ............................................................................. 1533
VII. Additional Discussion of the Multivariate Analysis of C ovariance.................. 1533
1. Multiple covariates ..........................................................................................1533
VIII. Additional Examples Illustrating the Use of the Multivariate
Analysis of C ovariance............................................................................................ 1534
References .......................................................................................................................... 1534
E ndnotes...............................................................................................................................1534

Test 37: Discriminant Function Analysis .........................................................................1537


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1537
II. Examples ....................................................................................................................1539
III. Null versus Alternative Hypotheses .......................................................................1540
IV. Test Computations .................................................................................................1541
V. Interpretation of the Test Results ...........................................................................1543
Analysis of Example 37.1 ............................................................................. 1543
Analysis of Example 37.2 ............................................................................. 1555
VI. Additional Analytical Procedures for Discriminant Function Analysis
and/or Related T e s t s ................................................................................................ 1564
VII. Additional Discussion of Discriminant Function A nalysis.................................. 1564
VIII. Additional Examples Illustrating the Use of Discriminant
Function A n aly sis.....................................................................................................1564
References .......................................................................................................................... 1564
E ndnotes.............................................................................................................................. 1565

Test 38: Canonical Correlation .........................................................................................1569


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1569
II. Example ....................................................................................................................1572
III. Null versus Alternative Hypotheses .................................................................... 1572
IV. Test Computations ................................................................................................. 1572
V. Interpretation of the Test Results ...........................................................................1573
VI. Additional Analytical Procedures for Canonical Correlation
and/or Related T ests................................................................................................ 1586
VII. Additional Discussion of Canonical Correlation ................................................1586
VIII. Additional Examples Illustrating the Use of Canonical C orrelation ................1587
References .......................................................................................................................... 1587
Endnotes...............................................................................................................................1588

Test 39: Logistic Regression ................................................................................................ 1593


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1593
II. Example ....................................................................................................................1596
III. Null versus Alternative Hypotheses .....................................................................1596
IV. Test Computations ................................................................................................. 1599
V. Interpretation of the Test Results ...........................................................................1602
Results for a binary logistic regression with one predictor variable .....................1602
Results for a binary logistic regression with multiple predictor variables .............1610
VI. Additional Analytical Procedures for Logistic Regression
and/or Related T ests................................................................................................ 1617
VII. Additional Discussion of Logistic R egression..................................................... 1618
VIII. Additional Examples Illustrating the Use of Logistic Regression ................... 1618
References .......................................................................................................................... 1618
E ndnotes...............................................................................................................................1619

Test 40: Principal Components Analysis and Factor Analysis .................................... 1627


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1627
II. Example ....................................................................................................................1630
III. Null versus Alternative Hypotheses .......................................................................1630
IV. Test Computations .................................................................................................. 1630
V. Interpretation of the Test Results ...........................................................................1634
VI.Additional Analytical Procedures for Principal Components Analysis
and Factor Analysis and/or Related T e s t s ............................................................ 1646
1. Principal axis factor analysis of Example 40.1 ........................................... 1646
VII. Additional Discussion of Principal Components Analysis and
Factor A n aly sis.........................................................................................................1647
1. Criticisms of factor analytic procedures ........................................................1647
2. Cluster analysis ................................................................................................ 1647
3. Multidimensional scaling................................................................................. 1648
VIII. Additional Examples Illustrating the Use of Principal Components
Analysis and Factor A n a ly sis................................................................................. 1649
References .......................................................................................................................... 1651
E ndnotes...............................................................................................................................1652

Test 41: Path Analysis ........................................................................................................... 1659


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1659
Additional discussion of basic terminology employed
in path analysis.................................................................................................. 1661
Assumptions underlying path analysis............................................................1665
II. Example ....................................................................................................................1666
III. Null versus Alternative H ypotheses.......................................................................1666
IV. Test Computations .................................................................................................. 1668
Decomposition of correlations among pairs of variables..............................1669
Model identification..........................................................................................1670
Computation of degrees of freedom for a path m o d e l.................................. 1671
Determination of the number of observations............................................... 1671
Determination of the number of parameters to be estim ated....................... 1672
Guidelines for evaluating effect v a lu e s ..........................................................1673
Decomposition of correlations among pairs of variables in
Models A and B ..........................................................................................1674
V. Interpretation o f the Test Results ......................................................................... 1678
Goodness-of-fit indices for a path analysis m odel.........................................1680
VI. Additional Analytical Procedures for Path Analysis ..........................................1682
VII. Additional Discussion of Path A n aly sis............................................................... 1682
VIII. Additional Examples Illustrating the Use of Path analysis.................................. 1683
References .......................................................................................................................... 1683
E ndnotes...............................................................................................................................1684

Test 42: Structural Equation Modeling ............................................................................ 1687


I. Hypothesis Evaluated with Test and Relevant Background Information ..................... 1687
Assumptions underlying S E M .........................................................................1689


Elements employed in a structural equation m o d e l.......................................1690
Methods for summarizing a structural model ............................................... 1691
II. Example ....................................................................................................................1699
III. Null versus Alternative H ypotheses.....................................................................1700
IV. Test Computations .................................................................................................. 1700
V. Interpretation of the Test Results ......................................................................... 1701
Guidelines for determining degrees of freedom ...........................................1701
Assessing model fit ..........................................................................................1703
Analysis of Example 42.1 ............................................................................. 1708
VI. Additional Analytical Procedures for Structural Equation M odeling............... 1713
VII. Additional Discussion of Structural Equation M o d elin g .................................... 1713
SEM software .................................................................................................. 1713
Additional sources of information on S E M ................................................... 1713
VIII. Additional Examples Illustrating the Use of Structural
Equation M o d elin g ................................................................................................ 1714
References .......................................................................................................................... 1718
Endnotes...............................................................................................................................1720

Test 43: Meta-Analysis .........................................................................................................1725


I. Hypothesis Evaluated with Test and Relevant Background Inform ation 1725
Relevant background information on meta-analysis .................................... 1725
Measures of effect size ................................................................................... 1727
II. E xam ples........................................................................... 1733
III. Null versus Alternative H ypotheses.....................................................................1735
IV/V. Test Computations and Interpretation of Test Results .........................................1735
Meta-analytic procedures employing significance level and
effect s iz e ................................................................................................ 1735
Test 43a: Procedure for comparing k studies with respect to
significance le v e l ................................................................................. 1737
Test 43b: The Stouffer procedure for obtaining a combined
significance level (p value) for k s tu d ie s ...........................................1738
The file drawer problem ...........................................................................1740
Test 43c: Homogeneity analysis for com paring k studies with
respect to effect s i z e ...........................................................................1743
Test 43d: Procedure for obtaining a combined effect size
for k studies ..........................................................................................1745
Meta-analytic procedures based on weighting effect sizes with
inverse variance weights .......................................................................1746
Test 43e: Procedure for obtaining a weighted mean effect
size for k studies ................................................................................. 1753
Test 43f: Procedure for evaluating the null hypothesis th at the
true value of the overall effect size in the underlying
population equals 0 ............................................................................. 1753
Test 43g: Procedure for computing a confidence interval
for the mean effect s iz e .......................................................................1753
Test 43h: Homogeneity analysis for com paring k studies
with respect to effect size through use of inverse
variance w e ig h ts....................................................................................1754
VI. Additional Analytical Procedures for M eta - Analysis......................................... 1768
Graphing techniques for meta-analysis ........................................................1768


Alternative meta-analytic procedures ............................................................1769
Practical implications of magnitude of effect size v a lu e ..............................1771
Test 43i: Binomial effect size d isp lay ..........................................................1773
VII. Additional Discussion of Meta-Analysis .............................................................. 1775
Meta-analysis software ..................................................................................1775
The significance test controversy.................................................................... 1775
The minimum-effect hypothesis testing model .............................................1777
VIII. Additional Examples Illustrating the Use of Meta-Analysis ..............................1778
References .......................................................................................................................... 1778
Endnotes...............................................................................................................................1781

Appendix: Tables ................................................................................ 1791

Table A1. Table of the Normal Distribution ................................................................ 1795


Table A2. Table of Student’s t Distribution ................................................................ 1800
Table A3. Power Curves for Student’s t D istribution............................................... 1801
Table A4. Table of the Chi-Square Distribution ........................................................1805
Table A5. Table of Critical T Values for Wilcoxon’s Signed-Ranks and
Matched-Pairs Signed-Ranks T e sts ............................................................ 1806
Table A6. Table of the Binomial Distribution, Individual Probabilities ..................... 1807
Table A7. Table of the Binomial Distribution, Cumulative Probabilities 1810
Table A8. Table of Critical Values for the Single-Sample Runs Test ................... 1813
Table A9. Table of the Fmax Distribution ..................................................................... 1814
Table A10. Table of the F D istribution..........................................................................1815
Table A11. Table of Critical Values for Mann-Whitney U Statistic ....................... 1823
Table A12. Table of Sandler’s A S tatistic.......................................................................1825
Table A13. Table of the Studentized Range Statistic....................................................1826
Table A14. Table of Dunnett’s Modified t Statistic for a Control Group
Comparison .....................................................................................................1828
Table A15. Graphs of the Power Function for the Analysis of Variance................. 1830
Table A16. Table of Critical Values for Pearson r ...................................................... 1834
Table A17. Table of Fisher’s zr Transformation ........................................................ 1835
Table A18. Table of Critical Values for Spearman’s R h o ......................................... 1836
Table A19. Table of Critical Values for Kendall’s T a u ..............................................1837
Table A20. Table of Critical Values for Kendall’s Coefficient of
Concordance.....................................................................................................1838
Table A21. Table of Critical Values for the Kolmogorov-Smirnov
Goodness-of-Fit Test for a Single S am p le................................................. 1839
Table A22. Table of Critical Values for the Lilliefors Test for Normality ........... 1840
Table A23. Table of Critical Values for the Kolmogorov-Smirnov Test for
Two Independent S a m p le s........................................................................... 1841
Table A24. Table of Critical Values for the Jonckheere-Terpstra
Test S tatistic.....................................................................................................1843
Table A25. Table of Critical Values for the Page Test S ta tistic ................................ 1845
Table A26. Table of Extreme Studentized Deviate Outlier Statistic ....................... 1847
Table A27. Table of Durbin-Watson Test S tatistic......................................................1848
Table A28. Constants Used for Estimation and Construction of Control
C h a r ts ................................................................................................................1850

Index ........................................................................................................ 1851
Introduction

The intent of this Introduction is to provide the reader with a general overview of basic
terminology, concepts, and methods employed within the areas of descriptive statistics and
experimental design. To be more specific, the following topics will be covered: a) Computational
procedures for measures of central tendency, variability, skewness, and kurtosis; b) Visual
methods for displaying data; c) The normal distribution; d) Hypothesis testing; e) Experimental
design; and f) Basic principles of probability. Within the context of the latter discussions, the
reader is presented with the necessary information for both understanding and using the statistical
procedures which are described in this book. Following the Introduction is an outline of all the
procedures that are covered, as well as decision tables to aid the reader in selecting the
appropriate statistical procedure.

Descriptive versus Inferential Statistics

The term statistics is derived from Latin and Italian terms which respectively mean “status” and
“state arithmetic” (i.e., the present conditions within a state or nation). In a more formal sense,
statistics is a field within mathematics that involves the summary and analysis of data. The field
of statistics can be divided into two general areas, descriptive statistics and inferential statistics.
Descriptive statistics is a branch of statistics in which data are only used for descriptive
purposes and are not employed to make predictions. Thus, descriptive statistics consists of
methods and procedures for presenting and summarizing data. The procedures most commonly
employed in descriptive statistics are the use of tables and graphs, and the computation of
measures of central tendency and variability. Measures of association or correlation, which are
covered in this book, are also categorized by most sources as descriptive statistical procedures,
insofar as they serve to describe the relationship between two or more variables. A variable is
any property of an object or an organism with respect to which there is variation - i.e., not every
object or organism is the same with respect to that property. Examples of variables are color,
weight, height, gender, intelligence, etc.
Inferential statistics employs data in order to draw inferences (i.e., derive conclusions) or
make predictions. Typically, in inferential statistics sample data are employed to draw inferences
about one or more populations from which the samples have been derived. Whereas a population
consists of the sum total of subjects or objects that share something in common with one another,
a sample is a set of subjects or objects which have been derived from a population. For a sample
to be useful in drawing inferences about the larger population from which it was drawn, it must
be representative of the population. Thus, typically (although there are exceptions), the ideal
sample to employ in research is a random sample. A random sample must adhere to the
following criteria: a) Each subject or object in the population has an equal likelihood of being
selected as a member of the sample; b) The selection of each subject/object is independent of the
selection of all other subjects/objects in the population; and c) For a specified sample size, every
possible sample that can be derived from the population has an equal likelihood of occurring.
In point of fact, it would be highly unusual to find an experiment that employed a truly
random sample. Pragmatic and/or ethical factors make it literally impossible in most instances to
obtain random samples for research. Insofar as a sample is not random, it will limit the degree
to which a researcher will be able to generalize the results. Put simply, one can only generalize
to objects or subjects that are similar to the sample employed. (A more detailed discussion of the
general subject of sampling is provided later in this chapter.)
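By way of informal illustration, the following Python sketch draws a simple random sample without replacement from a hypothetical population of 100 subject identification numbers; the population and sample sizes are arbitrary and are employed purely for demonstration.

    # Illustrative sketch: simple random sampling without replacement from a
    # hypothetical population of 100 subject identification numbers.
    import random

    population = list(range(1, 101))        # hypothetical subject IDs 1 through 100

    # random.sample() gives every subject the same likelihood of selection, and
    # every possible sample of size 10 is equally likely to be drawn.
    sample = random.sample(population, k=10)
    print(sorted(sample))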

Statistic versus Parameter

A statistic refers to a characteristic of a sample, such as the average score (also known as the
mean). A parameter, on the other hand, refers to a characteristic of a population (such as the
average of a whole population). A statistic can be employed for either descriptive or inferential
purposes. An example of using a statistic for descriptive purposes is obtaining the mean of a
group (which represents a sample) in order to summarize the average performance of the group.
On the other hand, if we use the mean of a group to estimate the mean of a larger population the
group is supposed to represent, the statistic (i.e., the group mean) is being employed for inferential
purposes. The most basic statistics that are employed for both descriptive and inferential purposes
are measures of central tendency (of which the mean is an example) and measures of
variability.
In inferential statistics the computed value of a statistic (e.g., a sample mean) is employed
to make inferences about a parameter in the population from which the sample was derived (e.g.,
the population mean). The inferential statistical procedures described in this book all employ data
derived from one or more samples in order to draw inferences or make predictions with respect
to the larger population(s) from which the sample(s) was/were drawn.
Sampling error is the discrepancy between the value of a statistic and the parameter it
estimates. Due to sampling error, the value of a statistic will usually not be identical to the
parameter it is employed to estimate. The larger the sample size the less the influence of sampling
error, and consequently the closer one can expect the value of a statistic to be to the actual value
of a parameter.
When data from a sample are employed to estimate a population parameter, any statistic
derived from the sample should be unbiased. Although sampling error will be associated with an
unbiased statistic, an unbiased statistic provides the most accurate estimate of a population
parameter. A biased statistic, on the other hand, does not provide as accurate an estimate of that
parameter as an unbiased statistic, and consequently a biased statistic will be associated with a
greater sampling error. Stated in a more formal way, an unbiased statistic (also referred to as an
unbiased estimator) is one whose expected value is equal to the parameter it is employed to
estimate. The expected value of a statistic is based on the premise that an infinite number of
samples of equal size are derived from the relevant population, and for each sample the value of
the statistic is computed. The average of all the values computed for the statistic will represent
the expected value of that statistic. The latter distribution of average values for the statistic is
more formally referred to as a sampling distribution (which is a concept discussed in greater
depth later in the book). The subject of bias in statistics will be discussed later in reference to the
mean (which is the most commonly employed measure of central tendency) and the variance
(which is the most commonly employed measure of variability).
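The notion of an expected value lends itself to a brief, informal simulation. The Python sketch below draws a large number of samples of size n = 5 from a hypothetical normal population whose variance is 100, and averages two versions of the sample variance across those samples: the version that divides the sum of squared deviations by n systematically underestimates the population variance, whereas the version that divides by n - 1 does not. The population values and the number of samples are arbitrary and serve only to illustrate the point.

    # Illustrative sketch: approximating the expected value of two variance
    # estimators by averaging them over many samples from the same population.
    import random

    random.seed(1)
    pop_mean, pop_sd, n, samples = 50, 10, 5, 20000   # hypothetical population; true variance = 100

    biased_total, unbiased_total = 0.0, 0.0
    for _ in range(samples):
        x = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
        m = sum(x) / n
        ss = sum((value - m) ** 2 for value in x)      # sum of squared deviations
        biased_total += ss / n                         # divides by n (biased)
        unbiased_total += ss / (n - 1)                 # divides by n - 1 (unbiased)

    print(biased_total / samples)     # tends toward 80, i.e., below the true variance of 100
    print(unbiased_total / samples)   # tends toward the true variance of 100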

Levels of Measurement

Typically, information which is quantified in research for purposes of analysis is categorized with
respect to the level of measurement the data represent. Different levels of measurement contain
different amounts of information with respect to whatever the data are measuring. A data
classification system developed by Stevens (1946), which is commonly employed within the
framework of many scientific disciplines, will be presented in this section.
Statisticians generally conceptualize data as fitting within one of the following four
measurement categories: nominal data (also known as categorical data), ordinal data (also
known as rank-order data), interval data, and ratio data. As one moves from the lowest level
of measurement, nominal data, to the highest level, ratio data, the amount of information provided
by the numbers increases, as well as the meaningful mathematical operations that can be performed
on those numbers. Each of the levels of measurement will now be discussed in more
detail.
a) Nominal/categorical level measurement In nominal/categorical measurement numbers
are employed merely to identify mutually exclusive categories, but cannot be manipulated in a
mathematically meaningful manner. As an example, a person's social security number represents
nominal measurement since it is used purely for purposes of identification and cannot be
meaningfully manipulated in a mathematical sense (i.e., adding, subtracting, etc. the social security
numbers of people does not yield anything of tangible value).
b) Ordinal/rank-order level measurement In an ordinal scale, the numbers represent
rank-orders, and do not give any information regarding the differences between adjacent ranks.
For example, the order of finish in a horse race represents an ordinal scale. If in a race Horse A
beats Horse B in a photo finish, and Horse B beats Horse C by twenty lengths, the respective
order of finish of the three horses reveals nothing about the fact that the distance between the first
and second place horses was minimal, while the difference between the second and third place
horses was substantial.
c) Interval level measurement An interval scale not only considers the relative order of
the measures involved (as is the case with an ordinal scale) but, in addition, is characterized by
the fact that throughout the length of the scale equal differences between measurements cor­
respond to equal differences in the amount of the attribute being measured. What this translates
into is that if IQ is conceptualized as an interval scale, the one point difference between a person
who has an IQ of 100 and someone who has an IQ of 101 should be equivalent to the one point
difference between a person who has an IQ of 140 and someone with an IQ of 141. In actuality
some psychologists might argue this point, suggesting that a greater increase in intelligence is
required to jump from an IQ of 140 to 141 than to jump from an IQ of 100 to 101. If, in fact, the
latter is true, a one point difference does not reflect the same magnitude of difference across the
full range of the IQ scale. Although in practice IQ and most other human characteristics measured
by psychological tests (such as anxiety, introversion, self esteem, etc.) are treated as interval
scales, many researchers would argue that they are more appropriately categorized as ordinal
scales. Such an argument would be based on the fact that such measures do not really meet the
requirements of an interval scale, because it cannot be demonstrated that equal numerical
differences at different points on the scale are comparable.
It should also be noted that unlike ratio scales, which will be discussed next, interval scales
do not have a true zero point. If interval scales have a zero score that can be assigned to a person
or object, it is assumed to be arbitrary. Thus, in the case of IQ we can ask the question of whether
or not there is truly an IQ which is so low that it literally represents zero IQ. In reality, you
probably can only say a person who is dead has a zero IQ! In point of fact, someone who has
obtained an IQ of zero on an IQ test has been assigned that score because his performance on the
test was extremely poor. The zero IQ designation does not necessarily mean the person could not
answer any of the test questions (or, to go further, that the individual possesses none of the
requisite skills or knowledge for intelligence). The developers of the test just decided to select a
certain minimum score on the test and designate it as the zero IQ point.
d) Ratio level measurement As is the case with interval level measurement, ratio level
measurement is also characterized by the fact that throughout the length of the scale, equal
differences between measurements correspond to equal differences in the amount of the attribute
being measured. However, ratio level measurement is also characterized by the fact that it has
a true zero point. Because of the latter, with ratio measurement one is able to make meaningful
ratio statements with regard to the attribute/variable being measured. To illustrate these points,
most physical measures such as weight, height, blood glucose level, as well as measures of certain
behaviors such as the number of times a person coughs or the number of times a child cries,
represent ratio scales. For all of the aforementioned measures there is a true zero point (i.e., zero
weight, zero height, zero blood glucose, zero coughs, zero episodes of crying), and for each of
these measures one is able to make meaningful ratio statements (such as Ann weighs twice as
much as Joan, Bill is one-half the height of Steve, Phil's blood glucose is 10 times Sam's, Mary
coughs five times as often as Pete, and Billy cries three times as much as Heather).

Continuous versus Discrete Variables

When measures are obtained on people or objects, in most instances we assume there will be
variability. Since we assume variability, not everyone or everything will have the same score on
whatever it is that is being measured. For this reason, when something is measured it is commonly
referred to as a variable. As noted above, variables can be categorized with respect to the level
of measurement they represent. In contrast to a variable, a constant is a number which never
exhibits variation. Examples of constants are the mathematical constants pi and e (which are
respectively 3.14159... and 2.71828...), the number of days in a week (which will always be 7),
the number of days in the month of April (which will always be 30), etc.
A variable can be categorized with respect to whether it is continuous or discrete. A
continuous variable can assume any value within the range of scores that define the limits of that
variable. A discrete variable, on the other hand, can only assume a limited number of values.
To illustrate, temperature (which can assume both integer and fractional/decimal values within
a given range) is a continuous variable. Theoretically, there are an infinite number of possible
temperature values, and the number of temperature values we can measure is limited only by the
precision of the instrument we are employing to obtain the measurements. On the other hand, the
face value of a die is a discrete variable, since it can only assume the integer values 1 through
6.

Measures of Central Tendency

Earlier in this chapter it was noted that the most commonly employed statistics are measures of
central tendency and measures of variability. This section will describe five measures of central
tendency: the mode, the median, the mean, the geometric mean, and the harmonic mean.
The mode The mode is the most frequently occurring score in a distribution of scores. A
mode that is derived for a sample is a statistic, whereas the mode of a population is a parameter.
In the following distribution of scores the mode is 5, since it occurs two times, whereas all other
scores occur only once: 0, 1, 2, 5, 5, 8, 10. If more than one score occurs with the highest
frequency, it is possible to have two or more modes in a distribution. Thus, in the distribution 0,
1, 2, 5, 6, 8, 10, all of the scores represent the mode, since each score occurs one time. A
distribution with more than one mode is referred to as a multimodal distribution. If it happens that
two scores both occur with the highest frequency, the distribution would be described as a
bimodal distribution, which represents one type of multimodal distribution. The distribution 0,
5, 5, 8, 9, 9, 12 is bimodal, since the scores 5 and 9 both occur two times and all other scores
appear once.
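As an informal illustration, the brief Python sketch below identifies every score that occurs with the highest frequency, and consequently reports a single mode for a unimodal distribution and two or more modes for a multimodal distribution.

    # Illustrative sketch: reporting every score that occurs with the highest frequency.
    from collections import Counter

    def modes(scores):
        counts = Counter(scores)
        top = max(counts.values())
        return sorted(score for score, f in counts.items() if f == top)

    print(modes([0, 1, 2, 5, 5, 8, 10]))    # [5]
    print(modes([0, 5, 5, 8, 9, 9, 12]))    # [5, 9] -- a bimodal distribution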
The most common situation in which the mode is employed as a descriptive measure is
within the context of a frequency distribution. A frequency distribution is a table which
summarizes a set of data in a tabular format, listing the frequency of each score adjacent to that
score. Table 1.1 is a frequency distribution for Distribution A noted below, which is comprised
of n = 20 scores. (A more detailed discussion of Distribution A can be found later in this chapter
in the discussion of visual methods for displaying data.) It should be noted that Column 1 of
Table 1.1 (i.e., the column at the left with the notation X at the top) only lists those scores in
Distribution A which fall within the range of values 22 - 96 that have a frequency of occurrence
greater than zero. Although all of the scores within the range of values 22 - 96 could have been
listed in Column 1 (i.e., including all of the scores with a frequency of zero), the latter would
increase the size of the table substantially, and in the process make it more difficult to interpret.
Consequently, it is more efficient to just list those scores which occur at least once, since it allows
for a succinct summary of the data — the latter being a major reason why a frequency distribution
is employed.

Distribution A: 22, 55, 60, 61, 61, 62, 62, 63, 63, 67, 71, 71, 72, 72, 72, 74, 74, 76, 82, 96

Table 1.1 Frequency Distribution of Distribution A

X     f

96 1
82 1
76 1
74 2
72 3
71 2
67 1
63 2
62 2
61 2
60 1
55 1
22 1

n = 20
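The tallies in Table 1.1 can be reproduced with a few lines of Python, as in the following illustrative sketch (the variable names employed are arbitrary).

    # Illustrative sketch: tallying Distribution A into a frequency distribution.
    from collections import Counter

    distribution_a = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67,
                      71, 71, 72, 72, 72, 74, 74, 76, 82, 96]

    freq = Counter(distribution_a)
    for score in sorted(freq, reverse=True):   # list scores from highest to lowest, as in Table 1.1
        print(score, freq[score])
    print("n =", sum(freq.values()))           # n = 20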

In addition to presenting data in a tabular format, a researcher can also summarize data
within the format of a graph. Indeed, it is recommended that researchers obtain a plot of their data
prior to conducting any sort of formal statistical analysis. The reason for this is that a body of
data can have certain characteristics which may be important in determining the most appropriate
method of analysis. Often such characteristics will not be apparent to a researcher purely on the
basis of cursory visual inspection — especially if there is a large amount of data and/or one is
relatively inexperienced in dealing with data. A commonly employed method for visually
presenting data is to construct a frequency polygon, which is a graph of a frequency distribution.
Figure 1.1 is a frequency polygon of Distribution A.
Figure 1.1 Frequency Polygon of Distribution A

Note that a frequency polygon is comprised of two axes, a horizontal axis and a vertical
axis. The X-axis or horizontal axis (which is referred to as the abscissa) is employed to record
the range of possible scores on a variable. (The element —//— on the left side of the X-axis of
Figure 1.1 is employed when a researcher only wants to begin recording scores on the abscissa
which fall at some point above 0, and not list any scores in between 0 and that point.) The Y-axis
or vertical axis (which is referred to as the ordinate) is employed to represent the frequency (f)
with which each of the scores noted on the X-axis occurs in the sample or population. In order
to provide some degree of standardization in graphing data, many sources recommend that the
length of the Y-axis be approximately three-quarters the length of the X-axis.
Inspection of Figure 1.1 reveals that a frequency polygon is a series of lines which connect
a set of points. One point is employed for each of the scores that comprise the range of scores in
the distribution. The point which represents any score in the distribution that occurs one or more
times will fall directly above that score at a height corresponding to the frequency for that score
recorded on the Y-axis. When the frequency polygon descends to and/or moves along the X-axis,
it indicates a frequency of zero for those scores on the X-axis. The highest point on a frequency
polygon will always fall directly above the score which corresponds to the mode of the
distribution (which in the case of Distribution A is 72). (In the case of a multimodal distribution
the frequency polygon will have multiple high points.) A more detailed discussion of the use of
tables and graphs for descriptive purposes as well as a discussion of exploratory data analysis
(which is an alternative methodology for scrutinizing data) will be presented later in this chapter
in the section on the visual display of data.
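A frequency polygon such as Figure 1.1 can be generated along the lines of the following Python sketch, which assumes that the third-party matplotlib package is available; scores in the range 22-96 with a frequency of zero fall on the X-axis, and the highest point falls above the mode of 72.

    # Illustrative sketch: plotting a frequency polygon for Distribution A (assumes matplotlib).
    from collections import Counter
    import matplotlib.pyplot as plt

    distribution_a = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67,
                      71, 71, 72, 72, 72, 74, 74, 76, 82, 96]
    freq = Counter(distribution_a)

    xs = list(range(min(distribution_a), max(distribution_a) + 1))   # scores 22 through 96
    ys = [freq.get(x, 0) for x in xs]                                # frequency (f) of each score

    plt.plot(xs, ys, marker="o", markersize=3)
    plt.xlabel("X (score)")
    plt.ylabel("f (frequency)")
    plt.title("Frequency Polygon of Distribution A")
    plt.show()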
The median The median is the middle score in a distribution. If there is an odd number
of scores in a distribution, in order to determine the median the following protocol should be
employed: Divide the total number of scores by 2 and add .5 to the result of the division. The
obtained value indicates the ordinal position of the score which represents the median of the
distribution (note, however, that this value does not represent the median). Thus, if we have a
distribution consisting of five scores (e.g., 6, 8, 9, 13, 16), we divide the number of scores in the
distribution by two, and add .5 to the result of the division. Thus, (5/2) + .5 = 3. The obtained
value of 3 indicates that if the five scores are arranged ordinally (i.e., from lowest to highest), the
median is the 3rd highest (or 3rd lowest) score in the distribution. With respect to the distribution
6, 8, 9, 13, 16, the value of the median will equal 9, since 9 is the score in the third ordinal
position.
If there is an even number of scores in a distribution, there will be two middle scores. The
median is the average of the two middle scores. To determine the ordinal positions of the two
middle scores, divide the total number of scores in the distribution by 2. The number value
obtained by that division and the number value that is one above it represent the ordinal positions
of the two middle scores. To illustrate, assume we have a distribution consisting of the following
six scores: 6, 8, 9, 12, 13, 16. To determine the median, we initially divide 6 by 2 which equals
3. Thus, if we arrange the scores ordinally, the 3rd and 4th scores (since 3 + 1 = 4) are the middle
scores. The average of these scores, which are, respectively, 9 and 12, is the median (which will
be represented by the notation M). Thus, M = (9 + 12)/2 = 10.5. Note once again that in this
example (as was the case in the previous example involving an odd number of scores) the initial
values computed (3 and 4) do not themselves represent the median, but instead represent the
ordinal positions of the scores used to compute the median. As was the case with the mode, a
median value derived for a sample is a statistic, whereas the median of a whole population is a
parameter.
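As a rough illustration, the protocol described above can be sketched in Python (an illustrative sketch only; the statistics module's median function yields the same results):

def median(scores):
    # Arrange the scores ordinally (lowest to highest).
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the median is the score at ordinal position (n/2) + .5.
        return ordered[n // 2]
    # Even n: the median is the average of the two middle scores.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([6, 8, 9, 13, 16]))      # 9
print(median([6, 8, 9, 12, 13, 16]))  # 10.5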
The mean The mean (also referred to as the arithmetic mean), which is the most
commonly employed measure of central tendency, is the average score in a distribution.
Typically, when the mean is used as a measure of central tendency, it is employed with interval
or ratio level data. Within the framework of the discussion to follow, the notation n will represent
the number of subjects or objects in a sample, and the notation N will represent the total number
of subjects or objects in the population from which the sample is derived.
Equation 1.1 is employed to compute the mean of a sample. Σ, which is the upper case
Greek letter sigma, is a summation sign. The notation ΣX indicates that the set of n scores in the
sample/distribution should be summed.

$\bar{X} = \frac{\Sigma X}{n}$    (Equation 1.1)
Sometimes Equation 1.1 is written in the following more complex but equivalent form
containing subscripts and superscripts: $\bar{X} = \sum_{i=1}^{n} X_i / n$. In the latter equation, the notation
$\sum_{i=1}^{n} X_i$ indicates that beginning with the first score, scores 1 through n (i.e., all the scores) are
to be summed. Xᵢ represents the score of the ith subject or object.
Equation 1.1 will now be applied to the following distribution of five scores: 6, 8, 9, 13,
16. Since n = 5 and ΣX = 52, X̄ = ΣX/n = 52/5 = 10.4.
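Equation 1.1 translates directly into code; the short Python sketch below (illustrative only) reproduces the value 10.4:

scores = [6, 8, 9, 13, 16]
n = len(scores)
sample_mean = sum(scores) / n   # Equation 1.1: the sum of the n scores divided by n
print(sample_mean)              # 10.4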
Whereas Equation 1.1 describes how one can compute the mean of a sample, Equation 1.2
describes how one can compute the mean of a population. The simplified version without sub­
scripts is to the right of the first = sign, and the subscripted version of the equation is to the right
of the second = sign. The mean of a population is represented by the notation μ, which is the
lower case Greek letter mu. In practice, it would be highly unusual to have occasion to compute
the mean of a population. Indeed, a great deal of analysis in inferential statistics involves
employing the mean of a sample to estimate a population mean.

$\mu = \frac{\Sigma X}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$    (Equation 1.2)

Where: N = The number of scores in the population distribution

Xᵢ = The ith score in a distribution comprised of N scores
Note that in the numerator of Equation 1.2 all N scores in the population are summed, as
opposed to just summing n scores when the value of X̄ is computed. The sample mean X̄
provides an unbiased estimate of the population mean μ, which indicates that if one has a
distribution of n scores, X̄ provides the best possible estimate of the true value of μ. Later in the
book (specifically, under the discussion of the single-sample z test (Test 1)) it will be noted that
the mean of the sampling distribution of means (which represents the expected value of the
statistic represented by the mean) will be equal to the value of the population mean. (A sampling
distribution of means is a frequency distribution of sample means derived from the same
population, in which the same number of scores is employed for each sample.) Recollect that
earlier in this chapter it was noted that an unbiased statistic is one with an expected value that
is equal to the parameter it is employed to estimate. This applies to the sample mean, since its
expected value is equal to the population mean.
The weighted mean There may be occasions when a researcher has mean values which
have been computed for two or more separate samples, yet one wants to compute an overall mean
based on all the sample means. In such a case the appropriate value computed for the overall
mean is referred to as a weighted mean. To illustrate the latter, consider the following situation:
A researcher has access to the following average IQ scores of students in a specific school
district: X̄1 = 100 for n1 = 100 students who attend School 1; X̄2 = 106 for n2 = 200 subjects
who attend School 2; and X̄3 = 115 for n3 = 300 subjects who attend School 3. Assume that the
researcher wants to determine the average IQ for the total N = n1 + n2 + n3 = 600 students in
the district. If the researcher has access to the scores of all 600 students, the latter value can
easily be computed through use of Equation 1.1 (i.e., sum all 600 scores and divide the latter sum
by 600). However, if one only has access to the mean score of each school, the latter computation
will not be possible. It would be incorrect to determine the average IQ for all of the students in
the district by computing the average of the average IQ scores computed for the three schools —
in other words, it would be incorrect to compute the overall mean as follows: (100 + 106 + 115)/3
= 107. Note that the latter computation weighs the contribution of each of the three schools
equally, and would not be justified unless there were an equal number of students enrolled in
each school. In our example, however, the enrollment for School 1 is one-half of that for School
2 and one-third of that for School 3, while the enrollment for School 2 is two-thirds of that for
School 3. Because of the unequal number of students in each school, the most accurate method
for computing an overall mean for the school district would be one which weighs each of the
three school means by the number of students in each sample. If there are k schools/groups, with
ng subjects per school/group, Equation 1.3 can be employed to compute a weighted mean
(represented by the notation X̄w). Note that when the latter equation is employed for the example
under discussion, the computed value X̄w = 109.5 is larger than the previously computed value
of 107. The larger value computed for the weighted mean derives from the fact that Equation 1.3
allocates greater weight to the mean of School 2 relative to the mean of School 1, as well as
greater weight to the mean of School 3 relative to the means of both Schools 1 and 2. Use of
Equation 1.3 insures that in combining sample means to compute an overall mean, the latter value
will be proportionally influenced by the size of the samples used to compute each of the k sample
means. Additionally, the value computed for the weighted mean will be identical to the mean

value that would be computed if Equation 1.1 had been employed to compute an overall mean
using the actual scores of the 600 students.1
$\bar{X}_w = \frac{(n_1)(\bar{X}_1) + (n_2)(\bar{X}_2) + \cdots + (n_k)(\bar{X}_k)}{n_1 + n_2 + \cdots + n_k}$    (Equation 1.3)

$\bar{X}_w = \frac{(n_1)(\bar{X}_1) + (n_2)(\bar{X}_2) + (n_3)(\bar{X}_3)}{n_1 + n_2 + n_3} = \frac{(100)(100) + (200)(106) + (300)(115)}{100 + 200 + 300} = 109.5$
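The computation of the weighted mean can be sketched in Python as follows (an illustrative sketch of Equation 1.3, not part of the procedure itself):

sample_sizes = [100, 200, 300]   # n1, n2, n3 for Schools 1, 2, 3
sample_means = [100, 106, 115]   # mean IQ for Schools 1, 2, 3

# Equation 1.3: weight each school mean by its sample size.
weighted_mean = (sum(n_g * m_g for n_g, m_g in zip(sample_sizes, sample_means))
                 / sum(sample_sizes))
print(weighted_mean)             # 109.5

# The incorrect, unweighted average of the three means.
print(sum(sample_means) / len(sample_means))   # 107.0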

The geometric mean The geometric mean is a measure of central tendency which is
primarily employed within the context of certain types of analysis in business and economics. It
is most commonly used as an average of index numbers, ratios, and percent changes over time.2
The geometric mean (GM) of a distribution is the nth root of the product of the n scores in the
distribution. Equation 1.4 is employed to compute the geometric mean of a distribution.

$GM = \sqrt[n]{(X_1)(X_2)\cdots(X_n)}$    (Equation 1.4)

To illustrate the above equation, the geometric mean of the five values 2, 5, 15, 20, and 30
is GM = ⁵√[(2)(5)(15)(20)(30)] = 9.79.
Only positive numbers should be employed in computing the geometric mean, since one or
more zero values will render GM = 0, and negative numbers will render the equation insoluble
(when there is an odd number of negative values) or meaningless (when there is an even number
of negative values). Specifically, let us assume we wish to compute the geometric mean for the
four values -2, -2, -2, -2. Employing the equation noted above, GM = ⁴√[(-2)(-2)(-2)(-2)] = 2.
Obviously the latter value doesn't make sense, since logically the geometric mean should have
a minus sign — specifically, GM should be equal to -2.
are equivalent, the geometric mean and arithmetic mean will be equal to one another. In all
other instances, the value of the geometric mean will be less than the arithmetic mean. Note that
in the above example in which GM = 9.79, the value computed for the arithmetic mean is
X̄ = 14.4, which is greater than the geometric mean.
Before the introduction of hand calculators, a computationally simpler method for
computing the geometric mean utilized logarithms (which are discussed in Endnote 15).
Specifically, the following equation can also be employed to compute the geometric mean:
log(GM) = (Σ log X)/n. The latter equation indicates that the logarithm of the geometric mean
is equivalent to the arithmetic mean of the logarithms of the values of the scores in the
distribution. The antilogarithm of log(GM) will represent the geometric mean.
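Both the root form of Equation 1.4 and the logarithmic shortcut can be sketched in Python (illustrative only; math.prod assumes Python 3.8 or later):

import math

scores = [2, 5, 15, 20, 30]
n = len(scores)

gm_root = math.prod(scores) ** (1 / n)                     # nth root of the product
gm_logs = math.exp(sum(math.log(x) for x in scores) / n)   # antilog of the mean logarithm

print(round(gm_root, 2), round(gm_logs, 2))   # both approximately 9.79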
Chou (1989, pp. 107- 110) notes that when a distribution of numbers takes the form of a
geometric series or a logarithmically distributed series (which is positively skewed), the
geometric mean is a more suitable measure of central tendency than the arithmetic mean. (The
concept of skewness (which involves a disproportionate number of high or low scores) is
discussed later in this chapter.) A geometric series is a sequence of numbers in which the ratio
of any term to the preceding term is the same value. As an example, the series 2, 4, 8, 16, 32, 64,
... represents a geometric series in which each subsequent term is twice the value of the
preceding term. A logarithmic series (also referred to as a power series) is one in which
successive terms are the result of a constant (a) multiplied by successive integer powers of a
variable (x) (e.g., ax, ax², ax³, ..., axⁿ). Thus, if x = 3, the series a(3), a(3²), a(3³), a(3⁴), ..., a(3ⁿ)
represents an example of a logarithmic series.
A major consequence of employing the geometric mean in lieu of the arithmetic mean is that
the presence of extreme values (which is often the case with a geometric series) will have less

of an impact on the value of the geometric mean. Although it would not generally be employed
as a measure of central tendency for a symmetrical distribution, the geometric mean can reduce
the impact of skewness when it is employed as a measure of central tendency for a non-
symmetrical distribution.
The harmonic mean Another measure of central tendency is the harmonic mean. The
harmonic mean is determined by computing the reciprocal of each of the n scores in a
distribution. (The reciprocal of a number is the value computed when the number is divided into
1 — i.e., the reciprocal of X is 1/X.) The mean/average value of the n reciprocals is then
computed. The reciprocal of the latter mean represents the harmonic mean, which is computed
with Equation 1.5.
$n_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$    (Equation 1.5)

Where: n = The number of scores in the distribution


Xᵢ = The ith score in a distribution comprised of n scores

To illustrate the above equation, the harmonic mean of the five values 2, 5, 15, 20 and 30
is computed below.

$n_h = \frac{5}{(1/2) + (1/5) + (1/15) + (1/20) + (1/30)} = \frac{5}{.85} = 5.88$
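The same computation can be sketched in Python (illustrative only; statistics.harmonic_mean returns the identical value):

scores = [2, 5, 15, 20, 30]
n = len(scores)

# Equation 1.5: n divided by the sum of the reciprocals of the n scores.
harmonic_mean = n / sum(1 / x for x in scores)
print(round(harmonic_mean, 2))   # 5.88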

Chou (1989, p. 111) notes that for any distribution in which there is variability among the
scores and in which no score is equal to zero, the harmonic mean will always be smaller than both
the arithmetic mean (which in the case of the above distribution is X̄ = 14.4) and the geometric
mean (which is GM = 9.79). This is the case, since the harmonic mean is least influenced by
extreme scores in a distribution. Chou (1989) provides a good discussion of the circumstances
when it is prudent to employ the harmonic mean as a measure of central tendency. He notes that
the harmonic mean is recommended when scores are expressed inversely to what is required in
the desired measure of central tendency. Examples of such circumstances are certain conditions
where a measure of central tendency is desired for time rates and/or prices. Further discussion
of the harmonic mean can be found in Section VI of the t test for two independent samples
(Test 11) and Chou (1989, pp. 110-113).

Measures of Variability

In this section a number of measures of variability will be discussed. Primary emphasis, however,
will be given to the standard deviation and the variance, which are the most commonly
employed measures of variability.
a) The range The range is the difference between the highest and lowest scores in a
distribution. Thus in the distribution 2, 3, 5, 6, 7, 12, the range is the difference between 12 (the
highest score) and 2 (the lowest score). Thus: Range = 12 - 2 = 10. Some sources add one to
the obtained value, and would thus say that the Range = 11. Although the range is employed on
occasion for descriptive purposes, it is of little use in inferential statistics.
b) Quantiles, percentiles, quartiles, and deciles A quantile is a measure that divides a
distribution into equidistant percentage points. Examples of quantiles are percentiles, quartiles,
and deciles. Percentiles divide a distribution into blocks comprised of one percentage point (or
blocks that comprise a proportion equal to .01 of the distribution).3 A specific percentile value

corresponds to the point in a distribution at which a given percentage of scores falls at or below.
Thus, if an IQ test score of 115 falls at the 84th percentile, it means 84% of the population has
an IQ of 115 or less. The term percentile rank is also employed to mean the same thing as a
percentile — in other words, we can say that an IQ score of 115 has a percentile rank of 84%.
Deciles divide a distribution into blocks comprised of ten percentage points (or blocks that
comprise a proportion equal to .10 of the distribution). A distribution can be divided into ten
deciles, the upper limits of which are defined by the 10th percentile, 20th percentile, ..., 90th
percentile, and 100th percentile. Thus, a score that corresponds to the 10th percentile falls at the
upper limit of the first decile of the distribution. A score that corresponds to the 20th percentile
falls at the upper limit of the second decile of the distribution, and so on. The interdecile range
is the difference between the scores at the 90th percentile (the upper limit of the ninth decile) and
the 10th percentile.
Quartiles divide a distribution into blocks comprised of 25 percentage points (or blocks that
comprise a proportion equal to .25 of the distribution). A distribution can be divided into four
quartiles, the upper limits of which are defined by the 25th percentile, 50th percentile (which
corresponds to the median of the distribution), 75th percentile, and 100th percentile. Thus, a score
that corresponds to the 25th percentile falls at the upper limit of the first quartile of
the distribution. A score that corresponds to the 50th percentile falls at the upper limit of the
second quartile of the distribution, and so on. The interquartile range is the difference between
the scores at the 75th percentile (which is the upper limit of the third quartile) and the 25th
percentile.
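Quartiles and the interquartile range for Distribution A can be sketched with Python's statistics module (illustrative only; several interpolation conventions for quantiles exist, so other software may return slightly different cut points):

import statistics

distribution_a = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67,
                  71, 71, 72, 72, 72, 74, 74, 76, 82, 96]

# quantiles() with n=4 returns the cut points at the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(distribution_a, n=4)
print(q1, q2, q3)            # first, second (median), and third quartile
print("IQR =", q3 - q1)      # interquartile range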
Infrequently, the interdecile or interquartile ranges may be employed to represent variability.
An example of a situation where a researcher might elect to employ either of these measures to
represent variability would be when one wishes to omit a few extreme scores in a distribution.
Such extreme scores are referred to as outliers. Specifically, an outlier is a score in a set of data
which is so extreme that, by all appearances, it is not representative of the population from which
the sample is ostensibly derived. Since the presence of outliers can dramatically affect variability
(as well as the value of the sample mean), their presence may lead a researcher to believe that the
variability of a distribution might best be expressed through use of the interdecile or interquartile
range (as well as the fact that when outliers are present, the sample median is more likely than
the mean to be a representative measure of central tendency). Further discussion of outliers can
be found later in this chapter, as well as in Section VI of the t test for two independent samples.
c) The variance and the standard deviation The most commonly employed measures
of variability in both inferential and descriptive statistics are the variance and standard devi­
ation. These two measures are directly related to one another, since the standard deviation is the
square root of the variance (and thus the variance is the square of the standard deviation). As is
the case with the mean, the standard deviation and variance are generally only employed with
interval or ratio level data.
The formal definition of the variance is that it is the mean of the squared difference
scores (which are also referred to as deviation scores). This definition implies that in order to
compute the variance of a distribution one must subtract the mean of the distribution from each
score, square each of the difference scores, sum the squared difference scores, and divide the
latter value by the number of scores in the distribution. The logic of this definition is reflected
in the definitional equations which will be presented later in this section for both the variance
and standard deviation.
A definitional equation for a statistic (or parameter) contains the specific mathematical
operations that are described in the definition of that statistic (or parameter). On the other hand,
a computational equation for the same statistic (or parameter) does not clearly reflect the
definition of that statistic (or parameter). A computational equation, however, facilitates

computation of the statistic (or parameter), since it is computationally less involved than the
definitional equation. In this book, in instances where a definitional and computational equation
are available for computing a test statistic, the computational equation will generally be employed
to facilitate calculations. It should be noted that because they do not directly reflect the meaning
of the concept they measure, some sources omit and/or discourage the use of computational
equations. Although the latter philosophy can be understood if one always has access to a
computer, in instances where one is required to conduct computations by hand or with the aid of
a calculator, the use of computational equations can make an analysis less burdensome. The
following notation will be used in the book with respect to the values of the variance and
standard deviation.
σ² (where σ is the lower case Greek letter sigma) will represent the variance of a population.
s² will represent the variance of a sample, when the variance is employed for descriptive
purposes. s² will be a biased estimate of the population variance σ², and because of this s²
will generally underestimate the true value of σ².
ŝ² will represent the variance of a sample, when the variance is employed for inferential
purposes. ŝ² will be an unbiased estimate of the population variance σ².
σ will represent the standard deviation of a population.
s will represent the standard deviation of a sample, when the standard deviation is
employed for descriptive purposes. s will be a biased estimate of the population standard
deviation σ, and because of this s will generally underestimate the true value of σ.
ŝ will represent the standard deviation of a sample, when the standard deviation is
employed for inferential purposes. ŝ will be an unbiased estimate of the population standard
deviation σ.⁴
Equations 1.6–1.11 are employed to compute the values σ², s², ŝ², σ, s, and ŝ. Note that
in each case, two equivalent methods are presented for computing the statistic or parameter in
question. The formula to the left is the definitional equation, whereas the formula to the right is
the computational equation.

$\sigma^2 = \frac{\Sigma(X - \mu)^2}{N} = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}$    (Equation 1.6)

$s^2 = \frac{\Sigma(X - \bar{X})^2}{n} = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n}$    (Equation 1.7)

$\hat{s}^2 = \frac{\Sigma(X - \bar{X})^2}{n - 1} = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1}$    (Equation 1.8)

$\sigma = \sqrt{\frac{\Sigma(X - \mu)^2}{N}} = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}}$    (Equation 1.9)

$s = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n}} = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n}}$    (Equation 1.10)

$\hat{s} = \sqrt{\frac{\Sigma(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1}}$    (Equation 1.11)

The reader should take note of the following with respect to the notation employed in
Equations 1.6–1.11: a) The notation ΣX² represents the sum of the X² scores. It indicates that
each of the n X scores (i.e., each of the n scores which comprise the distribution) is squared, and
the sum of the n X² scores is obtained; b) The notation (ΣX)² represents the sum of the X
scores squared. It indicates that a sum is obtained for the n X scores, and the latter sum is then
squared; c) The notation Σ(X - X̄)² indicates that the mean of the distribution is subtracted
from each of the n scores, each of the n difference scores is squared, and the sum of the n
squared difference scores is obtained.
When the variance or standard deviation of a sample is computed within the framework of
an inferential statistical test, one always wants an unbiased estimate of the population variance
or the population standard deviation. Thus, the computational form of Equation 1.8 will be em­
ployed throughout this book when a sample variance is used to estimate a population variance,
and the computational form of Equation 1.11 will be employed when a sample standard deviation
is used to estimate a population standard deviation.
The reader should take note of the fact that some sources employ subscripted versions of
the above equations. Thus, the computational form of Equation 1.8 is often written as:
$\hat{s}^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}$

Although the subscripted version will not be employed for computing the values of ŝ² and
ŝ, subscripted versions for some statistics may be used later in the book in order to clarify the
mathematical operations involved in computing a statistic.
As noted previously, for the same set of data the value of ŝ² will always be larger than the
value of s². This can be illustrated with a distribution consisting of the five scores: 6, 8, 9, 13,
16. The following values are substituted in Equations 1.7 and 1.8: ΣX = 52, ΣX² = 606, n = 5.

$s^2 = \frac{606 - \frac{(52)^2}{5}}{5} = 13.04$

$\hat{s}^2 = \frac{606 - \frac{(52)^2}{5}}{5 - 1} = 16.3$

Since the standard deviation is the square root of the variance, we can quickly determine
that s = √s² = √13.04 = 3.61 and ŝ = √ŝ² = √16.3 = 4.04. Note that ŝ² > s² and ŝ > s.⁵
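The four sample values just computed can be reproduced with the short Python sketch below (illustrative only; statistics.pvariance/statistics.variance and statistics.pstdev/statistics.stdev return the same biased and unbiased results, respectively):

scores = [6, 8, 9, 13, 16]
n = len(scores)
mean = sum(scores) / n

ss = sum((x - mean) ** 2 for x in scores)   # sum of squares, 65.2

biased_var = ss / n          # s-squared, Equation 1.7   -> 13.04
unbiased_var = ss / (n - 1)  # s-hat-squared, Equation 1.8 -> 16.3

print(biased_var, unbiased_var)
print(biased_var ** 0.5, unbiased_var ** 0.5)   # approximately 3.61 and 4.04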

Table 1.2 summarizes the computation of the unbiased estimate of the population variance
(ŝ²), employing both the definitional and computational equations. Note that in the two versions
for computing ŝ² listed for Equation 1.8, the numerator values ΣX² - [(ΣX)²/n] and Σ(X - X̄)²
are equivalent. Thus, in Table 1.2, the sum of the values of the last column Σ(X - X̄)² = 65.20,
equals ΣX² - [(ΣX)²/n] = 606 - [(52)²/5] = 65.20.

Table 1.2 Computation of Estimated Population Variance

X     X²     X̄      (X - X̄)            (X - X̄)²
6     36     10.4   (6 - 10.4) = -4.4   (-4.4)² = 19.36
8     64     10.4   (8 - 10.4) = -2.4   (-2.4)² = 5.76
9     81     10.4   (9 - 10.4) = -1.4   (-1.4)² = 1.96
13    169    10.4   (13 - 10.4) = 2.6   (2.6)² = 6.76
16    256    10.4   (16 - 10.4) = 5.6   (5.6)² = 31.36

ΣX = 52    ΣX² = 606    Σ(X - X̄) = 0    Σ(X - X̄)² = 65.20

$\hat{s}^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1} = \frac{606 - \frac{(52)^2}{5}}{5 - 1} = \frac{65.20}{5 - 1} = 16.3$

$\hat{s}^2 = \frac{\Sigma(X - \bar{X})^2}{n - 1} = \frac{65.20}{5 - 1} = 16.3$

The reader should take note of the following with respect to the standard deviation and
variance:6
1) The value of a standard deviation or variance can never be a negative number. If a
negative number is ever obtained for either value, it indicates a mistake has been made in the cal­
culations. The only time the value of a standard deviation or variance will not be a positive
number is when its value equals zero. The only instance in which the value of both the standard
deviation and variance of a distribution will equal zero is when all of the scores in the distribution
are identical to one another.
2) As the value of the sample size (n) increases, the difference between the values of s² and
ŝ² will decrease. In the same respect, as the value of n increases, the difference between the
values of s and ŝ will decrease. Thus, the biased estimate of the variance (or standard deviation)
will be more likely to underestimate the true value of the population variance (or standard
deviation) with small sample sizes than it will with large sample sizes.
3) The numerator of any of the equations employed to compute a variance or a standard
deviation is often referred to as the sum of squares. Thus in the example in this section, the
value of the sum of squares is 65.2, since ΣX² - [(ΣX)²/n] = 606 - [(52)²/5] = 65.2. The
denominators of both Equation 1.8 and Equation 1.11 are often referred to as the degrees of
freedom (a concept that is discussed later in the book within the framework of the single-sample
t test (Test 2)). Based on what has been said with respect to the sum of squares and the degrees
of freedom, the variance is sometimes defined as the sum of squares divided by the degrees of
freedom.
Prior to closing the discussion of the standard deviation and variance, a number of other
characteristics of statistics will be noted. Specifically, the concepts of efficiency, sufficiency,
and consistency will be discussed. A more in depth discussion of these concepts can be found
in Hays and Winkler (1971).

The concept of efficiency is closely related to the issue of whether or not a statistic is
biased. An efficient statistic is one that provides a more accurate estimate of a parameter relative
to an alternative statistic that can also be employed to estimate the same parameter. An example
of this is the relative degree to which the mean and median accurately estimate the mean of a
symmetrical population distribution. (In a symmetrical population distribution the values of the
mean and median will always be identical.) Although both the sample mean and sample median
represent unbiased estimators of the mean of a symmetrical population distribution, the sample
mean will generally be a more efficient estimator than the sample median, since in most instances
the sample mean will have a smaller standard error associated with it. The standard error
(which is discussed in detail under the discussion of the single-sample z test) is a measure of
variability. Put simply, for a given sample size, the lower the value of the standard error of the
mean relative to the standard error of the median, the less variability there will be among a set
of sample means relative to the amount of variability among an equal number of sample medians.
In other words, values computed for sample means will cluster more closely around the true
population mean than will values computed for sample medians. In the same respect, in most
instances the sample variance and sample standard deviation (when computed with Equations 1.8
and 1.11) will also be efficient statistics, since they will have lower standard errors than
alternative measures of variability.
The mean, variance, and standard deviation also represent sufficient estimators of the
parameters they estimate. A sufficient estimator is one that employs all of the information in the
sample to estimate the relevant parameter. The mode and median are not sufficient estimators,
since the mode only employs the most frequently occurring score in a sample, while the median
only employs the middle score(s). With regard to variability, the range and interquartile range
values are not sufficient estimators, since they only employ specific scores in a sample.
Finally, the mean, variance, and standard deviation also represent consistent estimators.
A consistent estimator is characterized by the fact that as the sample size employed to compute
the statistic increases, the likelihood of accurately estimating the relevant parameter also
increases. The latter is true for all three of the aforementioned statistics.
d) The coefficient of variation An alternative, although infrequently employed measure
of variability, is the coefficient of variation. Since the values of the standard deviation and
variance are a direct function of the magnitude of the scores in a sample/population, it can
sometimes be useful to express variability in reference to the size of the mean of a distribution.
By doing the latter, one can compare the values of the standard deviations and variances of
distributions that have dramatically different means and/or employ different units of measure­
ment. The coefficient of variation (represented by the notation CV) allows one to do this. The
coefficient of variation is computed with Equation 1.12.

$CV = \frac{s}{\bar{X}}$    (Equation 1.12)

The following should be noted with respect to Equation 1.12: a) When the values of σ and
μ are known, they can be employed in place of s and X̄; and b) Sometimes the value computed
for CV is multiplied by 100 in order to express it as a percentage.
Note that the coefficient of variation is nothing more than a ratio of the value of the
standard deviation relative to the value of the mean. The larger the value of CV computed for a
variable, the greater the degree of variability there is on that variable. Unlike the standard
deviation and variance, the numerical value represented by CV is not in the units that are
employed to measure the variable for which it is computed.

To illustrate the latter, let us assume that we wish to compare the variability of income
between two countries which employ dramatically different values of currency. The mean
monthly income in Country A is X̄A = 40 jaspars, with a standard deviation of sA = 10 jaspars.
The mean monthly income in Country B is X̄B = 2000 rocs, with a standard deviation of
sB = 100 rocs. Note that the mean and standard deviation for each country are expressed in the
unit of currency employed in that country. When we employ Equation 1.12, we compute that the
coefficients of variation for the two countries are CVA = 10/40 = .25 and CVB = 100/2000 = .05.
The latter CV values are just simple ratios, and are not numbers based on the scale for the unit
of currency employed in a given country. In other words, CVA = .25 is not .25 jaspars, but is
simply the ratio .25. In the same respect CVB = .05 is not .05 rocs, but is simply the ratio .05.
Consequently, by dividing the larger value CVA = .25 by the smaller value CVB = .05 we can
determine that there is five times more variability in income in Country A than there is in
Country B (i.e., CVA/CVB = .25/.05 = 5). If we express our result as a percentage, we can
say that there is 5 x 100% = 500% more variability in income in Country A than there is in
Country B. If, on the other hand, we had divided sB = 100 rocs by sA = 10 jaspars (i.e.,
sB/sA = 100/10 = 10), we would have erroneously concluded that there is ten times (or
10 x 100 = 1000%) more variability in income in Country B than in Country A. The reason
why the latter method results in a misleading conclusion is that, unlike the coefficient of
variation, it fails to take into account the different units of currency employed in the two
countries.
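The comparison can be sketched in Python (illustrative only):

mean_a, sd_a = 40, 10       # Country A, in jaspars
mean_b, sd_b = 2000, 100    # Country B, in rocs

cv_a = sd_a / mean_a        # .25
cv_b = sd_b / mean_b        # .05

# The ratio of the two unit-free coefficients shows five times more relative
# variability in Country A than in Country B.
print(cv_a, cv_b, cv_a / cv_b)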

Measures of Skewness and Kurtosis

In addition to the mean and variance, there are two other measures that can provide useful
descriptive information about a distribution. These two measures, skewness and kurtosis, rep­
resent the third and fourth moments of a distribution. Hays and Winkler (1971, p. 161) note that
the term moment is employed to represent the expected values of different powers of a random
variable. Equations 1.13 and 1.14, respectively, represent the general equation for a moment. In
Equation 1.13, νᵢ (where ν represents the lower case Greek letter nu) represents the population
parameter for the ith moment about the mean, whereas in Equation 1.14, mᵢ represents the sample
statistic for the ith moment about the mean.

$\nu_i = \frac{\Sigma(X - \mu)^i}{N}$    (Equation 1.13)

$m_i = \frac{\Sigma(X - \bar{X})^i}{n}$    (Equation 1.14)
With respect to a sample, the first moment about the mean (m1) is represented by
Equation 1.15. The second moment about the mean (m2, which is the sample variance) is repre-
sented by Equation 1.16. The third moment about the mean (m3, which as noted above
represents skewness and is also referred to as symmetry) is represented by Equation 1.17. The
fourth moment about the mean (m4, which as noted above represents kurtosis) is represented
by Equation 1.18.

$m_1 = \frac{\Sigma(X - \bar{X})}{n} = 0$    (Equation 1.15)

$m_2 = \frac{\Sigma(X - \bar{X})^2}{n}$    (Equation 1.16)

$m_3 = \frac{\Sigma(X - \bar{X})^3}{n}$    (Equation 1.17)

$m_4 = \frac{\Sigma(X - \bar{X})^4}{n}$    (Equation 1.18)
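Equations 1.15–1.18 can be sketched with a single Python function (illustrative only):

def moment_about_mean(scores, i):
    # The ith sample moment about the mean: sum the ith powers of the
    # difference scores and divide by n.
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** i for x in scores) / n

scores = [6, 8, 9, 13, 16]
for i in (1, 2, 3, 4):
    print(i, moment_about_mean(scores, i))
# m1 is 0 (within floating-point rounding) and m2 equals the biased sample
# variance 13.04; m3 and m4 underlie the skewness and kurtosis measures below.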
Although skewness and kurtosis are not employed for descriptive purposes as frequently
as the mean and variance, they can provide useful information. Skewness and kurtosis are some­
times employed within the context of determining the goodness-of-fit of data in reference to a
specific type of distribution — most commonly the normal distribution. Tests of goodness-of-fit
are discussed under the single-sample test for evaluating population skewness (Test 4), the
single-sample test for evaluating population kurtosis (Test 5), the Kolmogorov-Smirnov
goodness-of-fit test for a single sample (Test 7), and the chi-square goodness-of-fit test
(Test 8).
Skewness Wuensch (2005, p. 1855; 2011) notes that the terms “skewed” and “askew” in
everyday language refer to something that is out of line or distorted to one side. Consequently,
skewness is a measure reflecting the degree to which a distribution is asymmetrical. A
symmetrical distribution will result in two identical mirror images when it is split down the
middle. The bell shaped or normal distribution, which will be discussed later in this chapter,
is the best known example of a symmetrical distribution. When a distribution is not symmetrical,
a disproportionate number of scores will fall either to the left or right of the middle of the
distribution. Figure 1.2 visually depicts three frequency distributions (within the format of
frequency polygons), only one of which, Distribution B, is symmetrical. Distributions C and
D are asymmetrical — to be more specific, the latter two distributions are skewed.
When all the lines connecting the points in a frequency polygon are “smoothed over,” the
resulting frequency distribution assumes the appearance of the three distributions depicted in
Figure 1.2. A theoretical frequency distribution (or as it is sometimes called, a theoretical
probability distribution), which any of the distributions in Figure 1.2 could represent, is a graph
of the frequencies for a population distribution. (The distributions in Figure 1.2 can also represent
a frequency distribution which can be constructed through use of a specific mathematical
equation (such as Equation 1.36, the equation for the normal distribution) which is discussed later
in this chapter.) As noted earlier in this section, the X-axis represents the range of possible scores
on a variable in the population, while the Y-axis represents the frequency with which each of the
scores occurs (or sometimes the proportion/probability of occurrence for the scores is recorded
on the Y-axis — thus the use of the term theoretical probability distribution). It should be
noted that it is more precise to refer to the values recorded on the ordinate of a theoretical
probability distribution as density values, and because of the latter the term probability density
function is often used to describe a theoretical probability distribution. Further clarification of
the concept of density and probability density functions can be found later in this section as well
as in Section IX (the Addendum) of the binomial sign test for a single sample (Test 9) under
the discussion of Bayesian analysis of a continuous variable.7
Returning to Figure 1.2, Distribution B is a unimodal symmetrical distribution.
Although it is possible to have a symmetrical distribution that is multimodal (i.e., a distribution
that has more than one mode), within the framework of the discussion to follow it will be
assumed that all of the distributions discussed are unimodal. Note that the number of scores in
the left and right tail of Distribution B are identical. The tail of a distribution refers to the
upper/right and lower/left extremes of the distribution. When one tail is heavier than another tail
it means that a greater proportion of the scores fall in that tail. In Distribution B the two tails
are equally weighted.

Figure 1.2 Symmetrical and Asymmetrical Distributions (Distribution B: symmetrical; Distribution C: negatively skewed; Distribution D: positively skewed)

Turning to the other two distributions, we can state that Distribution C is negatively
skewed (or as it is sometimes called, skewed to the left), while Distribution D is positively
skewed (or as it is sometimes called, skewed to the right). Note that in Distribution C the bulk
of the scores fall in the right end of the distribution. This is the case, since the “hump” or upper
part of the distribution falls to the right. The tail or lower end of the distribution is on the left side
(thus the term skewed to the left). Distribution D, on the other hand, is positively skewed,
since the bulk of the scores fall in the left end of the distribution. This is the case, since the
“hump” or upper part of the distribution falls to the left. The tail or lower end of the distribution
is on the right (thus the term skewed to the right). It should be pointed out that Distributions
C and D represent extreme examples of skewed distributions. Thus, distributions can be
characterized by skewness, yet not have the imbalance between the left and right tails depicted
for Distributions C and D.
As a general rule, based on whether a distribution is symmetrical, skewed negatively, or
skewed positively, one can make a determination with respect to the relative magnitude of the
three measures of central tendency discussed earlier in this chapter. In a perfectly symmetrical
unimodal distribution the mean, median, and mode will always be the same value. In a skewed
distribution the mean, median, and mode will not be the same value. Typically (although there
are exceptions), in a negatively skewed distribution, the mean is the lowest value followed by the
median and then the mode, which is the highest value. The reverse is the case in a positively
skewed distribution, where the mean is the highest value followed by the median, with the mode
being the lowest value. The easiest way to remember the arrangement of the three measures of

central tendency in a skewed distribution is that they are arranged alphabetically moving in from
the tail of the distribution to the highest point in the distribution.
Since a measure of central tendency is supposed to reflect the most representative score for
the distribution (although the word “tendency” implies that it may not be limited to a single
value), the specific measure of central tendency that is employed for descriptive or inferential
purposes should be a function of the shape of a distribution. In the case of a unimodal
distribution that is perfectly symmetrical, the mean (which will always be the same value as the
median and mode) will be the best measure of central tendency to use, since it employs the most
information. When a distribution is skewed, it is often preferable to employ the median as a
measure of central tendency in lieu of the mean. Other circumstances where it may be more
desirable to employ the median rather than the mean as a measure of central tendency are
discussed in Section VI of the t test for two independent samples under the discussion of
outliers and data transformation.
A simple method of estimating skewness for a sample is to compute the value sk, which
represents the Pearsonian coefficient of skewness (developed in the 1890s by the English
statistician Karl Pearson). Equation 1.19 is employed to compute the value of sk, which for the
distribution summarized in Table 1.2 is computed to be sk = 1.04. The notation M in Equation
1.19 represents the median of the sample.

$sk = \frac{3(\bar{X} - M)}{\hat{s}} = \frac{3(10.4 - 9)}{4.04} = 1.04$    (Equation 1.19)

The value of sk will fall within the range -3 to +3, with a value of 0 associated with a
perfectly symmetrical distribution. Note that when X̄ > M, sk will be a positive value, and the
larger the value of sk, the greater the degree of positive skew. When X̄ < M, sk will be a
negative value, and the larger the absolute value of sk, the greater the degree of negative skew.
Note that when X̄ = M, which will be true if a distribution is symmetrical, sk = 0.⁸
To illustrate the above, consider the following three distributions E, F, and G, each of
which is comprised of 10 scores. Distribution E is symmetrical, Distribution F is negatively
skewed, and Distribution G is positively skewed. The value of sk is computed for each dis­
tribution.
1) Distribution E: 0, 0, 0, 5, 5, 5, 5, 10, 10, 10

The following sample statistics can be computed for Distribution E: X̄E = 5; ME = 5;
ŝE = 4.08; skE = [3(5 - 5)]/4.08 = 0. The value skE = 0 indicates that Distribution E is
symmetrical. Consistent with the fact that it is symmetrical is that the values of the mean and
median are equal. In addition, since the scores are distributed evenly throughout the distribution,
both tails are identical in appearance.

2) Distribution F: 0, 1, 1, 9, 9, 10, 10, 10, 10, 10

The following sample statistics can be computed for Distribution F: X̄F = 7; MF = 9.5;
ŝF = 4.40; skF = [3(7 - 9.5)]/4.40 = -1.70. The negative value skF = -1.70 indicates that
Distribution F is negatively skewed. Consistent with the fact that it is negatively skewed is that
the value of the mean is less than the value of the median. In addition, the majority of the scores
(i.e., the hump) fall in the right/upper end of the distribution. The lower end of the distribution
is the tail on the left side.

3) Distribution G: 0, 0, 0, 0, 0 , 1, 1, 9, 9, 10

The following sample statistics can be computed for Distribution G: X̄G = 3; MG = .5;
ŝG = 4.40; skG = [3(3 - .5)]/4.40 = 1.70. The positive value skG = 1.70 indicates that
Distribution G is positively skewed. Consistent with the fact that it is positively skewed is that
the value of the mean is greater than the value of the median. In addition, the majority of the
scores (i.e., the hump) fall in the left/lower end of the distribution. The upper end of the dis­
tribution is the tail on the right side.
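The three sk values can be reproduced with the Python sketch below (illustrative only; statistics.stdev supplies the unbiased standard deviation used in Equation 1.19):

import statistics

def pearson_sk(scores):
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    sd = statistics.stdev(scores)      # unbiased (n - 1) standard deviation
    return 3 * (mean - median) / sd    # Equation 1.19

dist_e = [0, 0, 0, 5, 5, 5, 5, 10, 10, 10]
dist_f = [0, 1, 1, 9, 9, 10, 10, 10, 10, 10]
dist_g = [0, 0, 0, 0, 0, 1, 1, 9, 9, 10]

for label, scores in (("E", dist_e), ("F", dist_f), ("G", dist_g)):
    print(label, round(pearson_sk(scores), 2))
# 0.0, -1.71, and 1.71; the values -1.70 and 1.70 reported above reflect the
# rounded standard deviation of 4.40.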
The most precise measure of skewness employs the exact value of the third moment about
the mean, designated earlier as m3. (Wuensch (2005, p. 1855; 2011) notes that the latter definition
of skewness is sometimes referred to as Fisher’s skewness.) Cohen (1996) and Zar (1999) note
that the unbiased estimate of the population parameter estimated by m3 can be computed with
either Equation 1.20 (which is the definitional equation) or Equation 1.21 (which is a
computational equation).

$m_3 = \frac{n\,\Sigma(X - \bar{X})^3}{(n - 1)(n - 2)}$    (Equation 1.20)

$m_3 = \frac{n\,\Sigma X^3 - 3\,\Sigma X\,\Sigma X^2 + \frac{2(\Sigma X)^3}{n}}{(n - 1)(n - 2)}$    (Equation 1.21)

Note that in Equation 1.20, the notation Σ(X - X̄)³ indicates that the mean is subtracted
from each of the n scores in the distribution, each difference score is cubed, and the n cubed
difference scores are summed. The notation ΣX³ in Equation 1.21 indicates each of the n scores
is cubed, and the n cubed scores are summed. The notation (ΣX)³ in Equation 1.21 indicates that
the n scores are summed, and the resulting value is cubed. Note that the minimum sample size
required to compute skewness is n = 3, since any lower value will result in a zero in the
denominators of Equations 1.20 and 1.21, rendering them insoluble.
Since the value computed for m3 is in cubed units, the unitless statistic g1, which is an
estimate of the population parameter γ1 (where γ represents the lower case Greek letter gamma),
is commonly employed to express skewness. The value of g1 (which is often referred to as a
skewness coefficient) is computed with Equation 1.22. Readers should take note of the fact that
most computer software (e.g., SPSS) prints out the value of g1 to represent the skewness of a
distribution.

$g_1 = \frac{m_3}{\hat{s}^3}$    (Equation 1.22)
When a distribution is symmetrical (about the mean), the value of g1 will equal 0. When
the value of g1 is significantly above 0, a distribution will be positively skewed, and when it is
significantly below 0, a distribution will be negatively skewed. Although the normal distribution
is symmetrical (with g1 = 0), as noted earlier, not all symmetrical distributions are normal. In
other words, although a normal distribution will have a skewness coefficient of g1 = 0, not all
distributions with the latter skewness coefficient are normal. Examples of nonnormal
distributions that are symmetrical are the t distribution and the binomial distribution, when
π1 = .5 (the meaning of the notation π1 = .5 is explained in Section I of the binomial sign test
for a single sample). Wuensch (2005, p. 1856; 2011) notes that high values for skewness (which
theoretically can range from -∞ to +∞) should alert a researcher to investigate for the presence
of outliers.
Zar (1999) notes that a population parameter designated √β1 (where β represents the lower
case Greek letter beta) is employed by some sources (e.g., D'Agostino (1970, 1986) and
D'Agostino et al. (1990)) to represent skewness. Equation 1.23 is used to compute √b1, which
is the sample statistic employed to estimate the value of √β1.

$\sqrt{b_1} = \frac{(n - 2)\,g_1}{\sqrt{n(n - 1)}}$    (Equation 1.23)

When a distribution is symmetrical, the value of √b1 will equal 0. When the value of √b1
is significantly above 0, a distribution will be positively skewed, and when it is significantly
below 0, a distribution will be negatively skewed. The method for determining whether a g1
and/or √b1 value deviates significantly from 0 is described under the single-sample test for
evaluating population skewness. The results of the latter test, along with the results of the
single-sample test for evaluating population kurtosis, are used in the D'Agostino-Pearson
test of normality (Test 5a) and the Jarque-Bera test of normality (Test 5b), both of which
are employed to assess goodness-of-fit for normality (i.e., whether sample data are likely to have
been derived from a normal distribution). When a distribution is normal, both g1 and √b1 will
equal 0.
Table 1.3 Computation of Skewness for Distribution E

X     X²     X³      X̄    (X - X̄)   (X - X̄)²   (X - X̄)³
0     0      0       5    -5        25          -125
0     0      0       5    -5        25          -125
0     0      0       5    -5        25          -125
5     25     125     5    0         0           0
5     25     125     5    0         0           0
5     25     125     5    0         0           0
5     25     125     5    0         0           0
10    100    1000    5    5         25          125
10    100    1000    5    5         25          125
10    100    1000    5    5         25          125

Sums: ΣX = 50, ΣX² = 400, ΣX³ = 3500, Σ(X - X̄) = 0, Σ(X - X̄)² = 150, Σ(X - X̄)³ = 0

$\bar{X}_E = \frac{\Sigma X}{n} = \frac{50}{10} = 5$    $\hat{s}_E = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1}} = \sqrt{\frac{400 - \frac{(50)^2}{10}}{10 - 1}} = 4.08$

$m_{3_E} = \frac{n\,\Sigma(X - \bar{X})^3}{(n - 1)(n - 2)} = \frac{(10)(0)}{(10 - 1)(10 - 2)} = 0$

$m_{3_E} = \frac{n\,\Sigma X^3 - 3\,\Sigma X\,\Sigma X^2 + \frac{2(\Sigma X)^3}{n}}{(n - 1)(n - 2)} = \frac{(10)(3500) - (3)(50)(400) + \frac{(2)(50)^3}{10}}{(10 - 1)(10 - 2)} = 0$

$g_{1_E} = \frac{m_{3_E}}{\hat{s}_E^3} = \frac{0}{(4.08)^3} = 0$    $\sqrt{b_{1_E}} = \frac{(n - 2)\,g_{1_E}}{\sqrt{n(n - 1)}} = \frac{(10 - 2)(0)}{\sqrt{10(10 - 1)}} = 0$

At this point, employing Equations 1.20/1.21, 1.22, and 1.23, the values of m3, g1, and √b1
will be computed for Distributions E, F, and G discussed earlier in this section. Tables 1.3–1.5
summarize the computations, with the following resulting values: for Distribution E, m3 = 0,
g1 = 0, and √b1 = 0; for Distribution F, m3 = -86.67, g1 = -1.02, and √b1 = -.86; and for
Distribution G, m3 = 86.67, g1 = 1.02, and √b1 = .86.
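These values can be reproduced with the Python sketch below (illustrative only; the function name is arbitrary and statistics.stdev supplies the unbiased standard deviation):

import math
import statistics

def skewness_measures(scores):
    n = len(scores)
    mean = statistics.mean(scores)
    s_hat = statistics.stdev(scores)   # unbiased standard deviation
    # Equation 1.20: unbiased estimate of the third moment about the mean.
    m3 = n * sum((x - mean) ** 3 for x in scores) / ((n - 1) * (n - 2))
    g1 = m3 / s_hat ** 3                              # Equation 1.22
    sqrt_b1 = (n - 2) * g1 / math.sqrt(n * (n - 1))   # Equation 1.23
    return m3, g1, sqrt_b1

for label, scores in (("E", [0, 0, 0, 5, 5, 5, 5, 10, 10, 10]),
                      ("F", [0, 1, 1, 9, 9, 10, 10, 10, 10, 10]),
                      ("G", [0, 0, 0, 0, 0, 1, 1, 9, 9, 10])):
    m3, g1, sqrt_b1 = skewness_measures(scores)
    print(label, round(m3, 2), round(g1, 2), round(sqrt_b1, 2))
# E: 0.0 0.0 0.0   F: -86.67 -1.02 -0.86   G: 86.67 1.02 0.86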

The use of √b1 as a measure of skewness will be further utilized in the chapter on the
single-sample test for evaluating population skewness, when the value of √b1 is employed to
evaluate a hypothesis about the skewness of an underlying population distribution. The measure
of skewness readers are most likely to encounter in other sources (as well as in most computer
software packages) will be the value of g1 computed with Equation 1.22, which represents a
standardized unitless/dimensionless measure of skewness (which translates into the fact that
you can add, subtract, divide, or multiply all values in the distribution by a constant and not
change the shape of the distribution).

Table 1.4 Computation of Skewness for Distribution F

X     X²     X³      X̄    (X - X̄)   (X - X̄)²   (X - X̄)³
0     0      0       7    -7        49          -343
1     1      1       7    -6        36          -216
1     1      1       7    -6        36          -216
9     81     729     7    2         4           8
9     81     729     7    2         4           8
10    100    1000    7    3         9           27
10    100    1000    7    3         9           27
10    100    1000    7    3         9           27
10    100    1000    7    3         9           27
10    100    1000    7    3         9           27

Sums: ΣX = 70, ΣX² = 664, ΣX³ = 6460, Σ(X - X̄) = 0, Σ(X - X̄)² = 174, Σ(X - X̄)³ = -624

$\bar{X}_F = \frac{\Sigma X}{n} = \frac{70}{10} = 7$    $\hat{s}_F = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1}} = \sqrt{\frac{664 - \frac{(70)^2}{10}}{10 - 1}} = 4.40$

$m_{3_F} = \frac{n\,\Sigma(X - \bar{X})^3}{(n - 1)(n - 2)} = \frac{(10)(-624)}{(10 - 1)(10 - 2)} = -86.67$

$m_{3_F} = \frac{n\,\Sigma X^3 - 3\,\Sigma X\,\Sigma X^2 + \frac{2(\Sigma X)^3}{n}}{(n - 1)(n - 2)} = \frac{(10)(6460) - (3)(70)(664) + \frac{(2)(70)^3}{10}}{(10 - 1)(10 - 2)} = -86.67$

$g_{1_F} = \frac{m_{3_F}}{\hat{s}_F^3} = \frac{-86.67}{(4.40)^3} = -1.02$    $\sqrt{b_{1_F}} = \frac{(n - 2)\,g_{1_F}}{\sqrt{n(n - 1)}} = \frac{(10 - 2)(-1.02)}{\sqrt{10(10 - 1)}} = -.86$

Readers will find that many sources state that Equation 1.24 is the equation for computing
the population skewness. The latter equation, in fact, also represents a standardized unitless/
dimensionless measure of skewness.

$\text{Skewness} = \frac{\Sigma(X - \mu)^3}{N\sigma^3}$    (Equation 1.24)

Depending upon the source, Equation 1.25, 1.26, or 1.27 may be listed as the equation for
estimating the skewness of a population from sample data. Most sources state that the estimate
provided by Equation 1.25 will be biased (it underestimates the value of the population skewness)
for small sample sizes, and thus Equation 1.26 (which, in fact, is also biased) and Equation 1.27
(cited in Cohen (2001, p. 82)) may be employed to correct for bias. Of the three equations,
Equation 1.27 is the least biased, and will result in the largest value for skewness.

The value for skewness is computed below for Distribution G with all three equations. In
the calculations conducted with respect to the latter distribution, the following values are
employed in Equations 1.25–1.27: a) Σ(X - X̄)³ = 624 (the latter value is obtained by subtracting
the mean of the distribution (which is X̄ = 3) from each of the n = 10 scores in the distribution
(yielding the 10 difference scores -3, -3, -3, -3, -3, -2, -2, 6, 6, 7), cubing each of the 10
difference scores (yielding the 10 cubed values (-3)³ = -27, (-3)³ = -27, (-3)³ = -27, (-3)³ = -27,
(-3)³ = -27, (-2)³ = -8, (-2)³ = -8, 6³ = 216, 6³ = 216, 7³ = 343), and obtaining the sum of the 10 cubed
difference scores, which is 624); b) ŝ = 4.40. Note that the value 1.017 computed with Equation
1.27 is closest to the value g1 = 1.02 computed with Equation 1.22. The value computed for
skewness by SPSS and Minitab is 1.02. Readers should take note of the fact that Equations 1.25
and 1.26 will tend to underestimate population skewness unless an extremely large sample size
is employed. Data transformations that can be employed for reducing skewness are discussed in
Section VII (under outliers) of the t test for two independent samples. Further discussion of
skewness can be found in the chapter on the single-sample test for evaluating population
skewness.

Table 1.5 Computation of Skewness for Distribution G

X     X²     X³      X̄    (X - X̄)   (X - X̄)²   (X - X̄)³
0     0      0       3    -3        9           -27
0     0      0       3    -3        9           -27
0     0      0       3    -3        9           -27
0     0      0       3    -3        9           -27
0     0      0       3    -3        9           -27
1     1      1       3    -2        4           -8
1     1      1       3    -2        4           -8
9     81     729     3    6         36          216
9     81     729     3    6         36          216
10    100    1000    3    7         49          343

Sums: ΣX = 30, ΣX² = 264, ΣX³ = 2460, Σ(X - X̄) = 0, Σ(X - X̄)² = 174, Σ(X - X̄)³ = 624

$\bar{X}_G = \frac{\Sigma X}{n} = \frac{30}{10} = 3$    $\hat{s}_G = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{n}}{n - 1}} = \sqrt{\frac{264 - \frac{(30)^2}{10}}{10 - 1}} = 4.40$

$m_{3_G} = \frac{n\,\Sigma(X - \bar{X})^3}{(n - 1)(n - 2)} = \frac{(10)(624)}{(10 - 1)(10 - 2)} = 86.67$

$m_{3_G} = \frac{n\,\Sigma X^3 - 3\,\Sigma X\,\Sigma X^2 + \frac{2(\Sigma X)^3}{n}}{(n - 1)(n - 2)} = \frac{(10)(2460) - (3)(30)(264) + \frac{(2)(30)^3}{10}}{(10 - 1)(10 - 2)} = 86.67$

$g_{1_G} = \frac{m_{3_G}}{\hat{s}_G^3} = \frac{86.67}{(4.40)^3} = 1.02$    $\sqrt{b_{1_G}} = \frac{(n - 2)\,g_{1_G}}{\sqrt{n(n - 1)}} = \frac{(10 - 2)(1.02)}{\sqrt{10(10 - 1)}} = .86$

$\text{Skewness} = \frac{\Sigma(X - \bar{X})^3}{n\hat{s}^3} = \frac{624}{(10)(4.40)^3} = .732$    (Equation 1.25)

$\text{Skewness} = \frac{\Sigma(X - \bar{X})^3}{(n - 1)\hat{s}^3} = \frac{624}{(10 - 1)(4.40)^3} = .814$    (Equation 1.26)

$\text{Skewness} = \left(\frac{n}{n - 2}\right)\left[\frac{\Sigma(X - \bar{X})^3}{(n - 1)\hat{s}^3}\right] = \left(\frac{10}{10 - 2}\right)\left[\frac{624}{(10 - 1)(4.40)^3}\right] = 1.017$    (Equation 1.27)
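The three estimates can be reproduced with the Python sketch below (illustrative only; the unrounded standard deviation is carried through, so the printed values differ slightly from the hand computations above, which use the rounded value 4.40):

import statistics

dist_g = [0, 0, 0, 0, 0, 1, 1, 9, 9, 10]
n = len(dist_g)
mean = statistics.mean(dist_g)
s_hat = statistics.stdev(dist_g)                    # approximately 4.40
sum_cubed = sum((x - mean) ** 3 for x in dist_g)    # 624

eq_125 = sum_cubed / (n * s_hat ** 3)
eq_126 = sum_cubed / ((n - 1) * s_hat ** 3)
eq_127 = (n / (n - 2)) * eq_126

print(round(eq_125, 3), round(eq_126, 3), round(eq_127, 3))
# approximately .734, .816, and 1.02 (versus .732, .814, and 1.017 above)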
Kurtosis According to D'Agostino et al. (1990), the word kurtosis means curvature.
To be more specific, kurtosis provides information regarding the height of a distribution relative
to the value of its standard deviation. Wuensch (2005, p. 1028; 2011) states that in 1905 Karl
Pearson introduced the concept of kurtosis as a measure of how flat the top of a symmetric
distribution is when compared with a normal distribution of the same variance. Wuensch (2005,
pp. 1028 -1029; 2011) also notes that: a) Although kurtosis is generally defined as a measure
reflecting the degree to which a distribution is peaked, the latter is not strictly true; b) Kurtosis
is more influenced by scores in the tails of a distribution than scores at the center of the
distribution; c) It is easy to confuse low kurtosis with high variance. However, distributions with
identical kurtosis can differ in variance, and distributions with identical variances can differ in
kurtosis (the latter author provides illustrations of this on his website).
The most common reason for measuring kurtosis is to determine whether or not data are
derived from a normally distributed population. Kurtosis is often described within the framework
of the following three general categories, all of which are depicted by representative frequency
distributions in Figure 1.3: mesokurtic, leptokurtic, and platykurtic.

Figure 1.3 Representative Types of Kurtosis (Distribution H: mesokurtic; Distribution I: leptokurtic; Distribution J: platykurtic)

A mesokurtic distribution, which has a degree of peakedness that is considered moderate,


is represented by a normal distribution (i.e., the classic bell-shaped curve), which is depicted
in Figure 1.3. All normal distributions are mesokurtic, and the weight/thickness of the tails of a
normal distribution is in between the weight/thickness of the tails of distributions that are lepto­
kurtic or platykurtic. In Figure 1.3, Distribution H best approximates a mesokurtic distribution.

A leptokurtic distribution is characterized by a high degree of peakedness. The scores


in a leptokurtic distribution tend to be clustered much more closely around the mean than they
are in either a mesokurtic or platykurtic distribution. Because of the latter, the value of the
standard deviation for a leptokurtic distribution will be smaller than the standard deviation for
the latter two distributions (if we assume the range of scores in all three distributions is
approximately the same). The tails of a leptokurtic distribution are heavier/thicker than the tails
of a mesokurtic distribution. In Figure 1.3, Distribution I best approximates a leptokurtic
distribution. Wuensch (2005, p. 1029; 2011) notes: a) Student's t distribution (discussed in the
chapter on the single-sample t test) represents an example of a leptokurtic distribution that
approaches normality (i.e., a mesokurtic distribution) as the degrees of freedom for the
distribution increase — the concept of degrees of freedom (which increase as the size of a sample
increases) is discussed in the aforementioned chapter; b) skewed distributions are always
leptokurtic.
A platykurtic distribution is characterized by a low degree of peakedness. The scores in
a platykurtic distribution tend to be spread out more from the mean than they are in either a
mesokurtic or leptokurtic distribution. Because of the latter, the value of the standard deviation
for a platykurtic distribution will be larger than the standard deviation for the latter two
distributions (if we assume the range of scores in all three distributions is approximately the
same). The tails of a platykurtic distribution are lighter/thinner than the tails of a mesokurtic
distribution. In Figure 1.3, Distribution J best approximates a platykurtic distribution.
Kurtosis could be viewed as the extent to which the scores in a distribution are dispersed
away from the shoulders of a distribution, where the shoulders are the points that demarcate the
standard deviation values z = ±1. More specifically, Moors (1986) defines kurtosis as the degree
of dispersion between the points marked off on the abscissa (X-axis) that correspond to μ ± σ.
Thus, with respect to the three types of distributions, we can make the statement that the range
of values on the abscissa that fall between the population mean (μ) and one standard deviation
above and below the mean will be greatest for a platykurtic distribution and smallest for a
leptokurtic distribution, with a mesokurtic distribution being in the middle. As will be noted later
in this chapter, in the case o f a normal distribution (which, as noted earlier, will always be
mesokurtic), approximately 68% of the scores will always fall between the mean and one
standard deviation above and below the mean.
One crude way of estimating kurtosis is that if the standard deviation of a unimodal sym­
metrical distribution is approximately one-sixth the value of the range o f the distribution, the
distribution is mesokurtic. In the case of a leptokurtic distribution, the standard deviation will be
substantially less than one-sixth of the range, while in the case of a platykurtic distribution the
standard deviation will be substantially greater than one-sixth of the range. To illustrate, let us
assume that the range o f values on an IQ test administered to a large sample is 90 points (e.g., the
IQ scores fall in the range 55 to 145). If the standard deviation for the sample equals 15, the
distribution would be mesokurtic (since 15/90 = 1/6). If the standard deviation equals 5, the
distribution would be leptokurtic (since 5/90 = 1/18, which is substantially less than 1/6). If the
standard deviation equals 30, the distribution would be platykurtic (since 30/90 = 1/3, which is
substantially greater than 1/6).
A number of alternative measures for kurtosis have been developed, including one de­
veloped by Moors (1988) and described in Zar (1999). The latter measure computes kurtosis by
employing specific quantile values in the distribution. The most precise method for estimating
the kurtosis of a population can be computed through use of Equation 1.28 (which is a
definitional equation) or Equation 1.29 (which is a computational equation) which compute a
statistic to be designated k4. (Wuensch (2005, p. 1028; 2011) notes that the latter definition of
kurtosis is sometimes referred to as Fisher’s kurtosis.) The reader should take note of the fact
that k4 computed with Equations 1.28/1.29 does not in actuality represent the value of the sample

fourth moment about the mean - i.e., it is not an estimate of m4 (which is defined by Equation
1.18). The latter is the case because although k4 can assume a negative value, m4 cannot (since
raising a number to the fourth power, which is the case in Equation 1.18, will always yield a
positive number).9
k4 = {[Σ(X - X̄)⁴(n)(n + 1)/(n - 1)] - 3[Σ(X - X̄)²]²}/[(n - 2)(n - 3)]     (Equation 1.28)

k4 = [(n³ + n²)ΣX⁴ - 4(n² + n)ΣX³ΣX - 3(n² - n)(ΣX²)² + 12nΣX²(ΣX)² - 6(ΣX)⁴]/[n(n - 1)(n - 2)(n - 3)]     (Equation 1.29)

Note that in Equation 1.28, the notation Σ(X - X̄)⁴ indicates that the mean is subtracted
from each of the n scores in the distribution, each difference score is raised to the fourth power,
and the n difference scores raised to the fourth power are summed. The notation ΣX⁴ in
Equation 1.29 indicates each of the n scores is raised to the fourth power, and the n resulting
values are summed. The notation (ΣX)⁴ in Equation 1.29 indicates that the n scores are summed,
and the resulting value is raised to the fourth power. Note that the minimum sample size required
to compute kurtosis is n = 4, since any lower value will result in a zero in the denominators of
Equations 1.28 and 1.29, rendering them insoluble.
Since the value computed for k4 is in units of the fourth power, the unitless statistic g2,
which is an estimate of the population parameter y2, is commonly employed to express kurtosis.
The value of g2 is computed with Equation 1.30. When a distribution is mesokurtic the value of g2
will equal 0. When the value of g2 is significantly above 0, a distribution will be leptokurtic, and
when it is significantly below 0, a distribution will be platykurtic. Readers should take note of the
fact that most computer software (e.g., SPSS) prints out the value of g2 to represent the kurtosis
of a distribution.
g2 = k4/s̃⁴     (Equation 1.30)

Zar (1999) notes that a population parameter designated β2 is employed by some sources
(e.g., Anscombe and Glynn (1983), D'Agostino (1986), and D'Agostino et al. (1990)) to repre-
sent kurtosis. Equation 1.31 is used to compute b2, which is the sample statistic employed to
estimate the value of β2. Some sources refer to g2 as a kurtosis coefficient, while other sources
employ b2 within the latter context.

b2 = [(n - 2)(n - 3)g2]/[(n + 1)(n - 1)] + [3(n - 1)/(n + 1)]     (Equation 1.31)

When a distribution is mesokurtic, the value of b2 will equal [3(n - 1)]/(n + 1). Inspection
of the latter equation reveals that as the value of the sample size increases, the value of b2
approaches 3. When the value computed for b2 is significantly below [3(n - 1)]/(n + 1), a
distribution will be platykurtic. When the value computed for b2 is significantly greater than
[3(n - 1)]/(n + 1), a distribution will be leptokurtic. The method for determining whether a g2
and/or b2 value is statistically significant is described under the single-sample test for evalu­
ating population kurtosis (the concept of statistical significance is discussed later in this
chapter). The results of the latter test, along with the results of the single-sample test for evalu-

Table 1.6 Computation of Kurtosis for Distribution H

X   X²   X³   X⁴   X̄   (X - X̄)   (X - X̄)²   (X - X̄)⁴
2 4 8 16 10 -8 64 4096
7 49 343 2401 10 -3 9 81
8 64 512 4096 10 -2 4 16
8 64 512 4096 10 -2 4 16
8 64 512 4096 10 -2 4 16
9 81 729 6561 10 -1 1 1
9 81 729 6561 10 -1 1 1
9 81 729 6561 10 -1 1 1
10 100 1000 10000 10 0 0 0
10 100 1000 10000 10 0 0 0
10 100 1000 10000 10 0 0 0
10 100 1000 10000 10 0 0 0
11 121 1331 14641 10 1 1 1
11 121 1331 14641 10 1 1 1
11 121 1331 14641 10 1 1 1
12 144 1728 20736 10 2 4 16
12 144 1728 20736 10 2 4 16
12 144 1728 20736 10 2 4 16
13 169 2197 28561 10 3 9 81
18 324 5832 104976 10 8 64 4096

Sums: ΣX = 200, ΣX² = 2176, ΣX³ = 25280, ΣX⁴ = 314056
Σ(X - X̄) = 0, Σ(X - X̄)² = 176, Σ(X - X̄)⁴ = 8456

X̄ = ΣX/n = 200/20 = 10          s̃ = √[(ΣX² - (ΣX)²/n)/(n - 1)] = √[(2176 - (200)²/20)/(20 - 1)] = 3.04

k4 = {[Σ(X - X̄)⁴(n)(n + 1)/(n - 1)] - 3[Σ(X - X̄)²]²}/[(n - 2)(n - 3)]
   = {[(8456)(20)(20 + 1)/(20 - 1)] - 3(176)²}/[(20 - 2)(20 - 3)] = 307.170

k4 = [(n³ + n²)ΣX⁴ - 4(n² + n)ΣX³ΣX - 3(n² - n)(ΣX²)² + 12nΣX²(ΣX)² - 6(ΣX)⁴]/[n(n - 1)(n - 2)(n - 3)]
   = [[(20)³ + (20)²](314056) - 4[(20)² + 20](25280)(200) - 3[(20)² - 20](2176)²
      + 12(20)(2176)(200)² - 6(200)⁴]/[(20)(20 - 1)(20 - 2)(20 - 3)] = 307.170

g2 = k4/s̃⁴ = 307.170/(3.04)⁴ = 3.596

b2 = [(n - 2)(n - 3)g2]/[(n + 1)(n - 1)] + 3(n - 1)/(n + 1)
   = [(20 - 2)(20 - 3)(3.596)]/[(20 + 1)(20 - 1)] + 3(20 - 1)/(20 + 1) = 5.472

Table 1.7 Computation of Kurtosis for Distribution I


X   X²   X³   X⁴   X̄   (X - X̄)   (X - X̄)²   (X - X̄)⁴
0 0 0 0 10 -10 100 10000
1 1 1 1 10 -9 81 6561
3 9 27 81 10 -7 49 2401
3 9 27 81 10 -7 49 2401
5 25 125 625 10 -5 25 625
5 25 125 625 10 -5 25 625
8 64 512 4096 10 -2 4 16
8 64 512 4096 10 -2 4 16
10 100 1000 10000 10 0 0 0
10 100 1000 10000 10 0 0 0
10 100 1000 10000 10 0 0 0
10 100 1000 10000 10 0 0 0
12 144 1728 20736 10 2 4 16
12 144 1728 20736 10 2 4 16
15 225 3375 50625 10 5 25 625
15 225 3375 50625 10 5 25 625
17 289 4913 83521 10 7 49 2401
17 289 4913 83521 10 7 49 2401
19 361 6859 130321 10 9 81 6561
20 400 8000 160000 10 10 100 10000

Sums: ΣX = 200, ΣX² = 2674, ΣX³ = 40220, ΣX⁴ = 649690
Σ(X - X̄) = 0, Σ(X - X̄)² = 674, Σ(X - X̄)⁴ = 45290

X̄ = ΣX/n = 200/20 = 10          s̃ = √[(ΣX² - (ΣX)²/n)/(n - 1)] = √[(2674 - (200)²/20)/(20 - 1)] = 5.96

k4 = {[Σ(X - X̄)⁴(n)(n + 1)/(n - 1)] - 3[Σ(X - X̄)²]²}/[(n - 2)(n - 3)]
   = {[(45290)(20)(20 + 1)/(20 - 1)] - 3(674)²}/[(20 - 2)(20 - 3)] = -1181.963

k4 = [(n³ + n²)ΣX⁴ - 4(n² + n)ΣX³ΣX - 3(n² - n)(ΣX²)² + 12nΣX²(ΣX)² - 6(ΣX)⁴]/[n(n - 1)(n - 2)(n - 3)]
   = [[(20)³ + (20)²](649690) - 4[(20)² + 20](40220)(200) - 3[(20)² - 20](2674)²
      + 12(20)(2674)(200)² - 6(200)⁴]/[(20)(20 - 1)(20 - 2)(20 - 3)] = -1181.963

g2 = k4/s̃⁴ = -1181.963/(5.96)⁴ = -.939

b2 = [(n - 2)(n - 3)g2]/[(n + 1)(n - 1)] + 3(n - 1)/(n + 1)
   = [(20 - 2)(20 - 3)(-.939)]/[(20 + 1)(20 - 1)] + 3(20 - 1)/(20 + 1) = 1.994

ating population skewness, are used in the D’Agostino-Pearson test of normality and the
Jarque-Bera test of normality, both of which are employed to assess goodness-of-fit for
normality. As noted earlier, a normal distribution will always be mesokurtic, with g2 = 0 and
b2 = 3. It should be noted, however, that although a normal distribution will have a kurtosis
coefficient of g2 = 0 or b2 = 3, not all distributions with the latter kurtosis coefficient are
normal.
At this point, employing Equations 1.28/1.29, 1.30, and 1.31, the values of k4, g2, and b2 will
be computed for two distributions to be designated H and I. The data for Distributions H and
I are designed (within the framework of a small sample size with n = 20) to approximate a
leptokurtic distribution and platykurtic distribution, respectively. Tables 1.6 and 1.7 sum-
marize the computations, with the following resulting values for Distributions H and I,
respectively: k4 = 307.170 and k4 = -1181.963, g2 = 3.596 and g2 = -.939, and b2 = 5.472 and
b2 = 1.994.
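The computations summarized in Table 1.6 can also be carried out with a few lines of code. The following Python sketch (illustrative only; the function and variable names are arbitrary) reproduces the Distribution H values through use of Equations 1.28, 1.30, and 1.31. Note that because the sketch does not round s̃ to 3.04, it yields g2 ≈ 3.58 (the value noted below as the one printed by SPSS and Minitab) and b2 ≈ 5.46, rather than the 3.596 and 5.472 obtained in the hand computation.

# Illustrative sketch: k4, g2, and b2 for Distribution H (the 20 scores of Table 1.6).
from math import sqrt

H = [2, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 18]

def kurtosis_measures(x):
    n = len(x)
    mean = sum(x) / n
    dev2 = sum((v - mean) ** 2 for v in x)      # Σ(X - X̄)²
    dev4 = sum((v - mean) ** 4 for v in x)      # Σ(X - X̄)⁴
    s_tilde = sqrt(dev2 / (n - 1))
    # Equation 1.28 (definitional form of k4)
    k4 = (dev4 * n * (n + 1) / (n - 1) - 3 * dev2 ** 2) / ((n - 2) * (n - 3))
    g2 = k4 / s_tilde ** 4                                                        # Equation 1.30
    b2 = (n - 2) * (n - 3) * g2 / ((n + 1) * (n - 1)) + 3 * (n - 1) / (n + 1)     # Equation 1.31
    return k4, g2, b2

k4, g2, b2 = kurtosis_measures(H)
print(round(k4, 3), round(g2, 3), round(b2, 3))   # 307.17 3.58 5.46 (full precision)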
The use of g2 and b2 as measures of kurtosis will be further utilized in the chapter on the
single-sample test for evaluating population kurtosis when the value of g2 is employed to
evaluate a hypothesis about the kurtosis of an underlying population distribution. The measure
of kurtosis readers are most likely to encounter in other sources (as well as in most computer
software packages) will be a value that is comparable to the value computed for g2 with Equation
1.30, which represents a standardized unitless/dimensionless measure of kurtosis (which
translates into the fact that you can add, subtract, divide, or multiply all values in the distribution
by a constant and not change the shape of the distribution).
Readers will find that many sources state that Equation 1.32 is the equation for computing
the population kurtosis. The latter equation, in fact, also represents a standardized unitless/
dimensionless measure of kurtosis.

Kurtosis = Σ(X - μ)⁴/(Nσ⁴)     (Equation 1.32)

Depending upon the source, Equation 1.33,1.34, or 1.35 may be listed as the equation for
estimating the kurtosis of a population from sample data. Most sources state that the estimate
provided by Equation 1.33 will be biased (it underestimates the value of the population kurtosis)
for small sample sizes, and thus Equation 1.34 (which, in fact, is also biased for small sample
sizes) and Equation 1.35 (cited in Cohen (2001, p. 84)) may be employed. Of the three equations,
Equation 1.35 is the least biased, and will result in the largest value for kurtosis.
The value for kurtosis is computed below for Distribution H with all three equations. In
the calculations conducted with respect to the latter distribution, the following values are
employed in Equations 1.33-1.35: a) Σ(X - X̄)⁴ = 8456 (the latter value is obtained by
subtracting the mean of the distribution (which is X̄ = 10) from each of the n = 20 scores in the
distribution, raising each of the 20 difference scores (which are -8, -3, -2, -2, -2, -1, -1, -1,
0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 8) to the fourth power (the values of which are (-8)⁴ = 4096, (-3)⁴ = 81,
(-2)⁴ = 16, (-2)⁴ = 16, (-2)⁴ = 16, (-1)⁴ = 1, (-1)⁴ = 1, (-1)⁴ = 1, 0⁴ = 0, 0⁴ = 0, 0⁴ = 0, 0⁴ = 0,
1⁴ = 1, 1⁴ = 1, 1⁴ = 1, 2⁴ = 16, 2⁴ = 16, 2⁴ = 16, 3⁴ = 81, 8⁴ = 4096), and obtaining the sum of the 20
difference scores raised to the fourth power, which is 8456; b) s̃ = 3.04. Note that the value
3.613 computed with Equation 1.35 is closest to the value g2 = 3.596 computed with Equation
1.30. The value computed for kurtosis by SPSS and Minitab is 3.58.
Tabachnick and Fidell (2001, pp. 73-74) note that most computer software packages
employ the value 0 to designate mesokurtosis. In the event Equations 1.33 or 1.34 (both of which,
as noted, underestimate the population kurtosis) are employed to compute kurtosis, one would
have to subtract 3 from the value computed for kurtosis with the latter two equations in order to
have a value analogous to (but most likely smaller than) the kurtosis value printed by computer

software. Since the value computed with Equation 1.35 has the latter subtraction built into it
(specifically, the second part of the equation involving the term preceded by the 3), the final
value computed with Equation 1.35 should be close to the kurtosis value printed by computer
software (which, in fact, is the case, since the value 3.613 computed with Equation 1.35 is quite
close to 3.58). Wuensch (2005, p. 1029; 2011) notes that: a) High values for kurtosis (which
theoretically can range from -∞ to +∞) should alert a researcher to investigate for the presence
of outliers in one or both tails of the distribution; and b) Kurtosis will usually only be of interest
to a researcher when a distribution under scrutiny is approximately symmetrical. Further
discussion of kurtosis can be found in the chapter on the single-sample test for evaluating
population kurtosis.
Kurtosis = Σ(X - X̄)⁴/(ns̃⁴) = 8456/[(20)(3.04)⁴] = 4.950     (Equation 1.33)

Kurtosis = Σ(X - X̄)⁴/[(n - 1)s̃⁴] = 8456/[(20 - 1)(3.04)⁴] = 5.211     (Equation 1.34)

Kurtosis = [n(n + 1)/((n - 2)(n - 3))]·[Σ(X - X̄)⁴/((n - 1)s̃⁴)] - 3[(n - 1)(n - 1)/((n - 2)(n - 3))]     (Equation 1.35)

= [(20)(20 + 1)/((20 - 2)(20 - 3))]·[8456/((20 - 1)(3.04)⁴)] - 3[(20 - 1)(20 - 1)/((20 - 2)(20 - 3))]

Kurtosis = 7.152 - 3.539 = 3.613
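The three estimates above can likewise be obtained programmatically. The following Python sketch (illustrative only) computes Equations 1.33-1.35 for Distribution H; carried at full precision (i.e., without rounding s̃ to 3.04) it yields approximately 4.93, 5.19, and 3.58, the last of which is the value reported above for SPSS and Minitab.

# Illustrative sketch: Equations 1.33-1.35 applied to Distribution H.
from math import sqrt

H = [2, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 18]
n = len(H)
mean = sum(H) / n
dev4 = sum((v - mean) ** 4 for v in H)                       # Σ(X - X̄)⁴ = 8456
s_tilde = sqrt(sum((v - mean) ** 2 for v in H) / (n - 1))    # ≈ 3.04

kurt_133 = dev4 / (n * s_tilde ** 4)                         # Equation 1.33
kurt_134 = dev4 / ((n - 1) * s_tilde ** 4)                   # Equation 1.34
kurt_135 = (n * (n + 1) / ((n - 2) * (n - 3))) * (dev4 / ((n - 1) * s_tilde ** 4)) \
           - 3 * (n - 1) * (n - 1) / ((n - 2) * (n - 3))     # Equation 1.35

print(round(kurt_133, 2), round(kurt_134, 2), round(kurt_135, 2))   # 4.93 5.19 3.58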

Visual Methods for Displaying Data

Tables and graphs Earlier in this chapter the importance of employing tables and/or graphs in
describing data was discussed briefly. In the latter discussion it was noted that prior to conducting
a formal analysis on a set of data with an inferential statistical test, a researcher should examine
the data through use of some sort of systematic visual analysis. To be more specific, in any sort
of statistical analysis, be it descriptive or inferential, the researcher should evaluate the data with
respect to such things as the shape of a distribution, the most prudent measure of central tendency
to employ, the most appropriate criterion to employ for assessing variability, whether or not any
outliers are present, etc. Although it is not the intent of this section to provide exhaustive
coverage of the use of tables and graphs in summarizing data, a number of different methods of
visual presentation which have been employed by statisticians for many years will be presented.
(An excellent source on the use of graphs in summarizing data is The Visual Display of Quantitative
Information by Edward Tufte (1983).) In addition, the subject of exploratory data analysis (EDA) will be
discussed. Exploratory data analysis (which was developed by the statistician John Tukey in
1977) is an alternative approach which many statisticians believe allows a researcher to
summarize data in a more meaningful way — more specifically, EDA is able to reveal
information that might not be obvious through use of the more conventional methods of graphic
and tabular display.

Commonly employed tables and graphs Distribution A below is the same set of data
presented earlier in this chapter (which is summarized by the frequency distribution in Table

1.1 and the frequency polygon depicted in Figure 1.1). It will be employed to illustrate the
procedures to be described in this section. It will be assumed that the scores in Distribution A
represent a discrete variable which represents ratio level of measurement. (Earlier in this chapter
it was noted that a discrete variable can assume only a limited number of values.) With respect
to the variable of interest, we will assume that a score can only assume an integer value. The
mode, median, and mean of Distribution A are, respectively, 72, 69, and 66.8 (with X̄ = ΣX/n =
1336/20 = 66.8). The values computed for the range, the first quartile (i.e., the 25th
percentile), the third quartile (i.e., the 75th percentile), and s̃ are, respectively, 74, 61.5, 73,
and 14.00. (The computation of the values for the 25th and 75th percentiles are demonstrated in
the latter part of this section within the context of the discussion of boxplots.)

Distribution A: 22, 55, 60, 61, 61, 62, 62, 63, 63, 67, 71, 71, 72, 72, 72, 74, 74, 76, 82, 96

Table 1.8 summarizes the data for Distribution A. The five columns which comprise Table
1.8 can be broken down into the following types of tables (each comprised of two of the five
columns) which are commonly employed in summarizing data: Columns 1 and 2 of Table 1.8
taken together constitute a grouped frequency distribution; Columns 1 and 4 taken together
constitute a relative frequency distribution; Columns 1 and 3 taken together constitute a
cumulative frequency distribution; Columns 1 and 5 taken together constitute a relative
cumulative frequency distribution.
At this point a variety of methods for summarizing data within a tabular and/or graphic
format will be described. With respect to the use of graphs in summarizing data, Tufte (1983)
notes that the following guidelines should be adhered to (which can also be applied in the
construction of tables): a) A graph should only be employed if it provides a reader with a better
understanding of the data; b) A graph should reflect the truth about a set of data in as clear and
simple a manner as possible; c) A graph should be clearly labeled, such that the reader can discern
the meaning of any relevant information displayed within it.
The grouped frequency distribution Earlier in this chapter (under the discussion of the
mode) the data for Distribution A were summarized through use of Table 1.1 and Figure 1.1.
Referring back to Table 1.1, note that it is comprised of two columns. The left column contains
each of the scores within the range of scores between 22 - 96 which occurs at least one time in
the distribution. The right column contains the frequency for each of the scores recorded in the
left column. At this point it should be emphasized that tables (as well as graphs) are employed to
summarize data as succinctly as possible. To be more specific, to make data more intelligible, a
table or graph should organize it in a way such that the relevant structural characteristics of the
data are clearly delineated. Under certain conditions (e.g., when a broad range of values charac­
terizes the scores the relevant variable may assume) it may be prudent to employ a grouped
frequency distribution.
A grouped frequency distribution is a frequency distribution in which the scores have been
grouped into class intervals. A class interval is a set of scores which contains two or more scores
that fall within the range of scores for the relevant variable. A basic principle which should be
adhered to in constructing a grouped frequency distribution is that it should summarize the
information contained in a set o f data without any loss of relevant information. As noted earlier,
Columns 1 and 2 of Table 1.8 taken together constitute a grouped frequency distribution. In
the latter distribution eight class intervals are employed, with each of the intervals containing 10
scores. The eight class intervals are listed in Column 1 of the table. The determination with
respect to the optimal number o f class intervals to employ should be based on the principle that
too few class intervals may obscure the underlying structure of the data, while too many class
intervals may defeat the purpose of grouping the data by failing to summarize it succinctly.

Table 1.8 Summary Table for Distribution A

Column 1        Column 2        Column 3        Column 4        Column 5

                Frequency       Cumulative                      Cumulative
X               (f)             frequency       Proportion      proportion
90-99 1 20 1/20 = .05 1
80-89 1 19 1/20 = .05 .95
70-79 8 18 8/20 = .40 .90
60-69 8 10 8/20 = .40 .50
50-59 1 2 1/20 = .05 .10
40-49 0 1 0/20 = .00 .05
30-39 0 1 0/20 = .00 .05
20-29 1 1 1/20 = .05 .05

n = 20 Sum = 1

Although there is no consensual rule of thumb for determining the optimal number of class
intervals, some sources recommend that the number of intervals be approximately equal to the
square root of the total number of scores in a distribution (i.e., √n). The use of eight class
intervals in Table 1.8 (which, since n = 20, is more than √20 = 4.47, which suggests the use of
four or five class intervals) illustrates that a researcher has considerable latitude in determining
how many class intervals best communicate the structure of a set o f data. Two final points to be
made with respect to a grouped frequency distribution are: a) As is the case in Table 1.8, each of
the class intervals should contain an equal number of scores; b) When a set of data is summarized
within the format of a grouped frequency distribution and all the original n scores which
comprise the distribution are not available, the mean of the distribution is computed as follows:
1) Multiply the midpoint of each class interval by the frequency for that class interval. The
midpoint of a class interval is computed by dividing the sum of the lowest and upper values
which define the class interval by 2. Thus, the respective midpoints of the eight class intervals
in Table 1.8 are 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5. When the latter values are
multiplied by their respective frequencies, the following values are obtained: (24.5)(1) = 24.5;
(34.5)(0) = 0; (44.5)(0) = 0; (54.5)(1) = 54.5; (64.5)(8) = 516; (74.5)(8) = 596; (84.5)(1) = 84.5;
(94.5)(1) = 94.5. The latter products sum to 1370. When 1370 is divided by n = 20, the value 68.5
is computed for the mean of the distribution. Note that although the value 68.5 is close to, it is
not identical to 66.8, the actual value of the mean computed earlier when the exact values of all
20 scores in the distribution are taken into account. There will generally be a slight discrepancy
between a mean computed from a grouped frequency distribution when compared with the actual
value of the mean, by virtue of the fact that the computations for the grouped frequency
distribution employ the midpoints of class intervals in lieu of the exact scores. In point of fact,
when data are only available within the format of a grouped frequency distribution, it will not be
possible to determine the exact value of any statistic/parameter of a distribution — in other
words, one will only be able to approximate values for the mean, median, mode, variance,
skewness, and kurtosis. This is the case, since in order to compute the exact value of a
statistic/parameter, it is necessary to know the value of each of the n scores which comprise the
distribution.
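The approximation procedure just described can be summarized in a brief Python sketch (illustrative only; the interval list simply restates Columns 1 and 2 of Table 1.8).

# Illustrative sketch: approximating the mean of Distribution A from its grouped
# frequency distribution (class intervals and frequencies as listed in Table 1.8).
intervals = [(20, 29, 1), (30, 39, 0), (40, 49, 0), (50, 59, 1),
             (60, 69, 8), (70, 79, 8), (80, 89, 1), (90, 99, 1)]    # (lower, upper, frequency)

n = sum(f for _, _, f in intervals)                                 # 20
weighted_sum = sum(((lo + hi) / 2) * f for lo, hi, f in intervals)  # Σ(midpoint × f) = 1370
print(weighted_sum / n)                                             # 68.5 (exact mean = 66.8)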
Earlier in this chapter it was noted that a frequency polygon is a graph of a frequency
distribution, and that Figure 1.1 represents a frequency polygon of Distribution A. A frequency
polygon can also be drawn for a grouped frequency distribution. In the latter case, a point is

Figure 1.4 Grouped Frequency Polygon of Distribution A

employed to represent the frequency of each class interval. The point representing a class interval
is placed directly above the value of the midpoint of that interval (with the latter value being
recorded on the X-axis). Figure 1.4 represents a frequency polygon for the grouped frequency
distribution of Distribution A. Note that the values 14.5 and 104.5, which are, respectively, the
midpoints of the class intervals 10-19 and 100-109 (which are the class intervals adjacent to the
lowest and highest class intervals recorded in Table 1.8), are recorded on the X-axis of Figure 1.4.
The values 14.5 and 104.5 are included in the grouped frequency polygon in order that both ends
of the polygon touch the X-axis.
At the beginning of this discussion it was noted that the variable in Distribution A is
assumed to be a discrete variable which can only assume integer values. However, it is often the
case that a researcher is required to construct a frequency distribution and/or polygon for a
continuous variable. (Earlier in this chapter it was noted that a continuous variable can
assume any value within the range of scores that define the limits of that variable.) In the latter
instance a distinction is made between the apparent (or stated) limits of a score (or interval)
versus the real (or actual) limits of a score (or interval). The apparent (or stated) limits of a
score or interval are the values employed in the first column of a frequency distribution or
grouped frequency distribution. Thus in the case of Table 1.8, the apparent limits for each of
the intervals are the values listed in Column 1. If instead of employing a grouped frequency
distribution, a simple ungrouped frequency distribution had been employed (with each of the
scores between 22 and 96 listed in Column 1 of the table), the latter values would represent the
apparent limits. In the case of a continuous variable it is assumed that the real limits of an
integer score extend one-half a unit below the lower apparent limit and one-half a unit above
the upper apparent limit. Thus, in the case of the score 22, the lower real limit is 22 - .5 = 21.5,
and the upper real limit 22 + .5 = 22.5. In the case of the class interval 20 ­ 29, the lower real limit
o f the class interval is 20 - .5 = 19.5 (i.e., .5 is subtracted from the lowest value in that class
interval), and the upper real limit of the class interval is 29 + .5 = 29.5 (i.e., .5 is added to the
highest value in that class interval). The width of the latter class interval is assumed to be the
difference between the upper real limit and the lower real limit — i.e., 29.5 - 19.5 = 10 (which
corresponds to the number of integer values between the apparent limits of that class interval).

Two additional points to take note of concerning real limits are the following: a) In the
event a score is equivalent to a value which corresponds to both the lower real limit and upper
real limit of adjacent class intervals (e.g., such as the value 29.5 in Table 1.8, which is the upper
real limit of the class interval 20-29 and the lower real limit of the class interval 30-39), one can
employ a rule such as the following to assign that score to one of the two class intervals: The
value is placed in the higher of the two class intervals if the first digit of the lower real limit of
that class interval is even, and the value is placed in the lower of the two class intervals if the first
digit of the lower real limit of that class interval is odd. Thus, the score 29.5 would be assigned
to the class interval 30-39 (the real limits of which are 29.5-39.5), since the first digit of the
lower real limit of that interval is 2 (which is an even number). Note that the real limits of the
other class interval under consideration (20-29) are 19.5-29.5, and that the first digit of the lower
real limit is 1 (which is an odd number); b) If the apparent limits of scores are expressed in
decimal values, the one - half unit rule noted above for determining real limits is applied in
reference to the relevant decimal unit of measurement. Specifically, if the apparent limits of a
class interval are 20.0 - 20.5, the real limits would be 20.00 - .05 = 19.95 and 20.50 + .05 = 20.55;
if the apparent limits o f a class interval are 20.05-20.55, the real limits would be 20.050 -.005
= 20.045 and 20.550 + .005 = 20.555, etc. The practical application of the real limits is illustrated
later in this section within the protocol for drawing a histogram of a grouped frequency
distribution for a continuous variable (Figure 1.8).
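A small helper function can make the one-half unit rule explicit. The sketch below (illustrative only; the function name is arbitrary, and the even/odd rule for assigning a borderline score to a class interval is not implemented) returns the real limits of a score or class interval given the unit of measurement.

# Illustrative helper: real limits of a score or class interval under the one-half
# unit rule ('unit' is 1 for integer scores, .1 or .01 etc. for decimal scores).
def real_limits(lower_apparent, upper_apparent, unit=1.0):
    half = unit / 2
    return lower_apparent - half, upper_apparent + half

print(real_limits(22, 22))              # (21.5, 22.5)   the integer score 22
print(real_limits(20, 29))              # (19.5, 29.5)   the class interval 20-29
lo, hi = real_limits(20.0, 20.5, 0.1)
print(round(lo, 2), round(hi, 2))       # 19.95 20.55    the decimal class interval 20.0-20.5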
The relative frequency distribution Columns 1 and 4 of Table 1.8 taken together
constitute a relative frequency distribution (more specifically, it can be referred to as a relative
grouped frequency distribution, since it is based on data grouped in class intervals). Both a
relative frequency distribution (which is a table) and a relative frequency polygon (which is
a graph of a relative frequency distribution) employ proportions or percentages instead of
frequencies. In a relative frequency distribution, the proportion or percentage of cases for each
score (or in the case of a grouped frequency distribution, the proportion or percentage of cases
in each class interval) is recorded in the column adjacent to the score (or class interval). In a
relative frequency polygon (which Figure 1.5 represents), the Y-axis is employed to represent
the proportion or percentage of subjects who obtain a given score, instead of the actual number
of subjects who obtain that score (which is the case with a frequency polygon). The use of
relative frequencies (which proportions or percentages represent) allows a researcher to
compare two or more distributions with unequal sample sizes — something the researcher could
not easily do if frequencies were represented on the Y-axis.
Note that in Figure 1.5 the relative frequencies are recorded for all scores within the range
20 - 100. Any score that occurs three times (i.e., the score of 72) has a relative frequency of 3/20
= .15. Any score that occurs two times (i.e., the scores of 61, 62, 63, 71, and 74) has a relative
frequency of 2/20 = .10. Any score that occurs one time (i.e., the scores of 22, 55, 60, 67, 76,
82, and 96) has a relative frequency of 1/20 = .05. Any score in the range of scores 20 ­ 100 that
does not occur has a relative frequency of 0/20 = 0. Although not demonstrated in this section,
it is possible to construct a graph of a relative grouped frequency distribution - i.e., a relative
grouped frequency polygon. In such a case, the relative frequency for each class interval is
recorded above the midpoint of that class interval at the appropriate height on the Y-axis.
The cumulative frequency distribution Columns 1 and 3 of Table 1.8 taken together
constitute a cumulative frequency distribution (more specifically, it can be referred to as a
cumulative grouped frequency distribution, since it is based on data grouped in class
intervals). In a cumulative frequency distribution the frequency recorded for each score
represents the frequency of that score plus the frequencies of all scores which are less than that

Figure 1.5 Relative Frequency Polygon of Distribution A
(Y-axis: relative frequency (proportion); X-axis: scores 20-100)

Figure 1.6 Cumulative Grouped Frequency Polygon of Distribution A



score. In the case of a grouped frequency distribution, the frequency recorded for each class
interval represents the frequency of that class interval plus the frequencies for all class intervals
which fall below that class interval. Scores are arranged ordinally, with the lowest score/class
interval at the bottom of the distribution, and the highest score/class interval at the top of the
distribution. The cumulative frequency for the lowest score/class interval will simply be the
frequency for that score/class interval, since there are no scores/class intervals below it. On the
other hand, the cumulative frequency for the highest score/class interval will always equal n, the
total number o f scores in the distribution.
Figure 1.6 is a cumulative frequency polygon of the grouped frequency distribution
summarized by Columns 1 and 3 of Table 1.8 (more specifically, it can be referred to as a
cumulative grouped frequency polygon, since it is based on data grouped in class intervals).
Note that in Figure 1.6 the cumulative frequency for each class interval is recorded above the
midpoint of the class interval. The S shaped curve which describes the shape of a cumulative
frequency polygon is commonly referred to as an ogive.
The relative cumulative frequency distribution Columns 1 and 5 of Table 1.8 taken
together constitute a relative cumulative frequency distribution (more specifically, it can be
referred to as a relative cumulative grouped frequency distribution, since it is based on data
grouped in class intervals). In a relative cumulative frequency distribution, cumulative
proportions or percentages are employed for each score (or class interval) in lieu of cumulative
frequencies. In the case of a relative cumulative grouped frequency distribution, the cumulative
proportion or percentage is recorded for each class interval. The cumulative proportion (or
cumulative percentage) for the lowest score/class interval will simply be the cumulative
proportion (cumulative percentage) for that score/class interval, since there are no scores/class
intervals below it. On the other hand, the cumulative proportion (cumulative percentage) for the
highest score/class interval will always equal 1 (or 100%).
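The four types of tables described above (which together constitute Table 1.8) can be generated with the following Python sketch (illustrative only); it prints the class intervals from the lowest to the highest, whereas Table 1.8 lists them from the highest to the lowest.

# Illustrative sketch: frequency, cumulative frequency, proportion, and cumulative
# proportion for each class interval of Distribution A (i.e., the columns of Table 1.8).
A = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67, 71, 71, 72, 72, 72, 74, 74, 76, 82, 96]
n = len(A)
intervals = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

cum_f = 0
for lo, hi in intervals:
    f = sum(1 for x in A if lo <= x <= hi)       # frequency for the class interval
    cum_f += f                                   # cumulative frequency
    print(f"{lo}-{hi}: f = {f}, cum f = {cum_f}, "
          f"proportion = {f / n:.2f}, cum proportion = {cum_f / n:.2f}")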

Additional graphing techniques This section will describe the following alternative graphing
techniques: a) Bar graph; b) Histogram.
Bar graph Bar graphs are employed when nominal or ordinal data/information are
recorded on the X-axis, or when interval or ratio data representing a discrete variable are
recorded on the X-axis. If we assume that the variable being measured in Distribution A is a
discrete variable (to be more specific, if we assume that in Distribution A scores can only be
integer values), we can employ a bar graph such as Figure 1.7 to summarize the frequency
distribution of the data. In point of fact, Figure 1.7 provides the same information presented in
Figure 1.1. Note that in Figure 1.7 the top of each of the vertical bars/lines corresponds to each
of the points which are connected with one another in the frequency polygon depicted in Figure
1.1. (As is the case with the frequency polygon, any score which has a frequency of zero falls on
the X-axis, and thus no bar is employed for such scores in Figure 1.7.) Note that in a bar graph,
since each of the vertical bars is assumed to represent a discrete entity (or value), the bars are not
contiguous (i.e., do not touch), although all of the bars are parallel to one another.10
Although not demonstrated in this section, it is possible to draw a bar graph for the grouped
frequency distribution. In such a case, separate bars are drawn for each class interval noted in
Table 1.8. The width of each bar will correspond to the width of the apparent limit listed in
Column 1 for the class interval. As is the case in Figure 1.7, if a discrete variable is involved the
bars will not be contiguous with one another.
Histogram A histogram is a bar graph which is employed to reflect frequencies for
continuous data. In a histogram, since each of the vertical bars represents elements which
comprise a continuous variable, the vertical bars are contiguous with one another (i.e., touch one

Figure 1.7 Bar Graph of Frequency Distribution of Distribution A
(X-axis: scores 20-100)

Figure 1.8 Histogram of Grouped Frequency Distribution of Distribution A



another). If for the moment we assume that a continuous variable is employed in Distribution
A, Figure 1.8 can be employed as a histogram (also referred to as a frequency histogram) of the
grouped frequency distribution summarized in Columns 1 and 2 of Table 1.8. Note that in Figure
1.8, the width of each of the bars corresponds to the real limits of the relevant class interval, and
the midpoint o f each bar is directly above the midpoint of the relevant class interval. As noted
above, the bars for adjacent class intervals are contiguous with one another.
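As an illustration of the distinction drawn above, the following Python sketch (which assumes availability of the matplotlib library; it is offered only as one possible implementation) draws a bar graph of the ungrouped frequencies of Distribution A alongside a histogram of the grouped distribution, with the histogram's bin edges placed at the real limits of the class intervals so that adjacent bars are contiguous.

# Illustrative sketch (assumes matplotlib is installed): bar graph vs. histogram.
from collections import Counter
import matplotlib.pyplot as plt

A = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67, 71, 71, 72, 72, 72, 74, 74, 76, 82, 96]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

freq = Counter(A)
ax1.bar(list(freq.keys()), list(freq.values()), width=0.6)    # non-contiguous bars
ax1.set_title("Bar graph of Distribution A")

edges = [19.5 + 10 * i for i in range(9)]                     # real limits 19.5, 29.5, ..., 99.5
ax2.hist(A, bins=edges, edgecolor="black")                    # contiguous bars
ax2.set_title("Histogram of grouped Distribution A")

plt.show()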

Exploratory data analysis As noted earlier, exploratory data analysis (EDA) is a general
approach for visually examining data that was introduced to the statistical community in 1977
by the statistician John Tukey. Smith and Prentice (1993) note that Tukey (1977) coined the term
exploratory data analysis to simultaneously represent a philosophy with respect to examining
data, as well as a set o f tools which were intended to be used for that purpose. The intent of this
section is to familiarize the reader with the basic principles underlying EDA, as well as to
describe the stem-and-leaf display and boxplot, two of the procedures introduced by Tukey
(1977) for visually examining data. Tukey (1977) believed that the latter two procedures were
more effective than conventional methods of visual display with respect to their ability to reveal
relevant information about a set of data.
The basic philosophy underlying EDA is that prior to initiating an analysis a researcher
should scrutinize every piece of datum. Only after doing the latter can the researcher make an
intelligent decision with respect to what the most prudent strategy will be for analyzing the data.
Smith and Prentice (1993, p. 353) note that EDA endorses the use of measures which are
relatively insensitive to data contamination (e.g., the presence of outliers), and which do not have
associated with them strong assumptions regarding the shape o f the underlying population
distribution — such measures are often referred to as resistant indicators. Thus, values such as
the median (which is commonly employed within the framework of nonparametric statistical
analysis) and the interquartile range are commonly employed in EDA, respectively, as metrics
of central tendency and variability (in lieu of the mean and standard deviation). Although the
philosophy underlying EDA is more concordant with that upon which nonparametric procedures
and robust statistical procedures are based, Smith and Prentice (1993, p. 356) note that both
nonparametric and robust statistical procedures are primarily employed within the framework of
statistical inference, whereas EDA is employed in order to explore and describe data. A robust
statistical test/procedure is one which is not overly dependent on critical assumptions regarding
an underlying population distribution. Because of the latter, under certain conditions (e.g., when
an underlying population distribution deviates substantially from normality) a robust statistical
procedure can provide a researcher with more reliable information than an inferential statistical
procedure which is not robust (i.e., a nonrobust inferential statistical procedure may yield
unreliable results when one or more critical assumptions about the underlying population
distribution are violated). Further clarification of robust statistical procedures can be found in
Section VII o f the t test for two independent samples under the discussion of outliers, as well
as in Section IX (the Addendum) of the Mann-Whitney U test. Recommended sources which
discuss resistant indicators are Grissom and Kim (2005, pp. 16 -19) and Wilcox (1987, 1996,
1997, 2001,2003).
Stem-and-leaf display A stem-and-leaf display summarizes the information contained
in a distribution (or batch, which Tukey (1977) routinely used as a synonym for a distribution).
To be more specific, a stem - and - leaf display simultaneously summarizes the information con­
tained in a frequency polygon, a frequency histogram, and a cumulative frequency distribution,
and in doing so it displays all of the original data values. Tukey (1977) believed that the stem-
and - leaf display retained all of the original scores in a format that was easier to interpret than the

2 | 2
3 |
4 |
5 | 5
6 | 01122337
7 | 11222446
8 | 2
9 | 6

Table 1.9 Stem-and-Leaf Display of Distribution A

format of more conventional tables and graphs. Among other things, a stem-and-leaf display
allows a researcher to easily determine the mode and median of a distribution.
Table 1.9 represents a stem-and-leaf display of Distribution A. Note that in Table 1.9 a
vertical line (sometimes referred to as the vertical axis) separates two sets of numbers. The
numbers to the left of the vertical line represent the stems in a stem-and-leaf display, while the
numbers to the right of the vertical line represent the leaves. In the case of a two digit integer
number, the stems represent the first digit of each score (the first digit o f a two digit integer
number is often referred to as the tens’, leading, most significant, or base digit). The second
digits of a two digit integer number in a stem-and-leaf display will represent a leaf (the second
digit of a two digit integer number is often referred to as the units’, trailing, or less significant
digit). Note that in Table 1.9, for a given stem value the leaves are the second digit o f all scores
which have that stem value. Specifically, the score that falls between 20 and 29 (22) has the stem
2 (which is the first digit). The single leaf value (2) recorded to the right of the stem value 2
corresponds to the second digit o f the number that falls between 20 and 29. The score that falls
between 50 and 59 (55) has the stem 5 (which is the first digit). The single leaf value (5) recorded
to the right of the stem value 5 corresponds to the second digit of the number that falls between
50 and 59. All of the scores that fall between 60 and 69 (one 60, two 61s, two 62s, two 63s, and
one 67) have the stem 6 (which is the first digit). The eight leaf values (0, 1, 1, 2, 2, 3, 3, and 7)
recorded to the right of the stem value of 6 correspond to the second digit of the numbers that fall
between 60 and 69. All o f the scores that fall between 70 and 79 (two 71s, three 72s, two 74s,
and one 76) have the stem 7 (which is the first digit). The eight leaf values (1, 1, 2, 2, 2, 4, 4, and
6) recorded to the right of the stem value of 7 correspond to the second digit of the numbers that
fall between 70 and 79. The score that falls between 80 and 89 (82) has the stem 8 (which is the
first digit). The single leaf value (2) recorded to the right of the stem value 8 corresponds to the
second digit o f the number that falls between 80 and 89. The score that falls between 90 and 99
(96) has the stem 9 (which is the first digit). The single leaf value (6) recorded to the right of the
stem value 9 corresponds to the second digit of the number that falls between 90 and 99. Note
that the stem values 3 (which represents any scores that fall between 30 and 39) and 4 (which
represents any scores that fall between 40 and 49) do not have any leaf values, since no scores
fall in the aforementioned intervals.
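The construction described above is simple enough to automate. The following Python sketch (illustrative only) generates the stem-and-leaf display of Table 1.9 from the raw scores of Distribution A, including the empty stems 3 and 4.

# Illustrative sketch: a stem-and-leaf display with stems = tens' digits and
# leaves = units' digits, applied to Distribution A.
A = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67, 71, 71, 72, 72, 72, 74, 74, 76, 82, 96]

stems = {}
for score in sorted(A):
    stems.setdefault(score // 10, []).append(score % 10)      # tens' digit -> units' digits

for stem in range(min(stems), max(stems) + 1):                # include stems with no leaves
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")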

Although it is not necessary in the case of Table 1.9, it may be desirable to modify a stem-
and - leaf display in order to more accurately communicate the underlying structure of a set of
data. Specifically, stem values can be employed within the same context of a class interval. As
an example, assume we have a bell shaped/normal distribution which consists o f 50 scores that
fall between the values 50-59. In such a case it would not be very useful to construct a stem-and-
leaf display comprised of a single row/line which consisted of a single stem value (specifically,
the value 5) followed by the 50 leaf values. Instead, a researcher could effectively
communicate the underlying structure of the data by employing five rows with a stem value of
5. Each row for a given stem value would only be employed for two leaf values. Thus, any leaf
value of 0 or 1 would be recorded in the first row for the stem value 5 (i.e., scores of 50 and 51),
leaf values of 2 or 3 would be recorded in the second row for the stem value 5 (i.e., scores of 52
and 53), etc. The latter is illustrated below in Table 1.10 for 50 hypothetical scores which fall
within the range 50-59. (The distribution in Table 1.10 will be designated as Distribution J.)
The number o f leaf values that may fall in a given row of a stem-and-leaf display is referred to
as the line width of the row. Tukey (1977) used typographic symbols or letters such as those
noted in Table 1.10 following a given stem value to differentiate the leaf intervals within that stem
value (i.e., for a given stem value, an asterisk (*) is employed to designate the interval for the leaf
values 0 and 1, the letter t is employed to designate the interval for the leaf values 2 and 3, etc.).
In contrast to the class intervals in a frequency distribution, the size of a class interval (i.e., the
line width) in a stem-and-leaf display can only be 2, 5, or some value that is a power of 10 (i.e.,
10¹ = 10, 10² = 100, etc.). The reason for the latter is to insure that each class interval has the
same line width (i.e., contains the same number of scores).

5* | 00111
5t | 222233333
5f | 444444444444555555555555
5s | 6667777
5* | 89999

Table 1.10 Stem-and-Leaf Display Involving Class Intervals for Distribution J

0000011111 | 5* | 00111
2222233333 | 5t | 222233333
4444455555 | 5f | 444444444444555555555555
6666677777 | 5s | 6667777
8888899999 | 5* | 89999

Table 1.11 Stem-and-Leaf Display Contrasting Distributions J and K



A stem-and-leaf display can be an effective tool for contrasting two different distributions
with one another. The latter is illustrated in Table 1.11. Note that the stem values are written
along the vertical axis. Two vertical lines are drawn, one to the right of the stem values and one
to the left o f the stem values. The leaf values for Distribution J are recorded to the right of the
vertical line which is situated to the right of the stem values. The leaf values for Distribution K
are recorded to the left of the vertical line which is situated to the left of the stem values. Such
an arrangement facilitates a researcher’s ability to visually examine whether or not there appear
to be major differences between the two distributions. In our example, Table 1.11 reveals that
Distribution J appears to be normal, while Distribution K appears to be uniform (i.e., the scores
are evenly distributed throughout the range of scores/class intervals in the distribution).
It should be noted that it will not always be the case that the stem values in a stem-and-leaf
display will be represented by the tens’ value of a score. For instance, if the scores which
constitute a set of data fell within the range 100-999, the hundreds’ digit values would represent
the stems of a stem-and-leaf display, while the leaves would be represented by the tens’ digit
values. In such a case, the units’ digit values would be omitted from the table. To illustrate, if the
scores in such a distribution which fell between 100 and 199 were 150, 167, 184, and 193, the
stem value used to represent the latter four scores would be 1 and the four leaf values would be
5, 6, 8, and 9 (which, respectively, correspond to the first digit or tens' value for 50, 60, 80, and
90). The unit values of 0, 7, 4, and 3 (i.e., the last digit, respectively, of the four scores) would
be omitted from the table. As a further example of constructing a stem-and-leaf display for data
which involve values other than two digit integer numbers, assume a distribution of data consists
of scores that fall within the range .1-.99. In such a case, the stem values would be the first digit
and the leaf values would be the second digit. Thus, if the scores .112, .123, .156, .178 occurred,
the stem for the latter values would be 1 and the respective leaf values would be 1, 2, 5, and 7.
Once again, the last digit for each score (i.e., the values 2, 3, 6, and 8) would be omitted from the
table. Of course, if a researcher deemed it appropriate, in both of the examples cited in this
paragraph the use of class intervals could be employed. In other words, as was done in Tables
1.10/1.11, one could employ more than one line for a specific stem value. As an example, in the
case where scores fall within the range .1 - 99, two rows could be employed for a given stem
value, with one row employed for the leaf values 0, 1, 2, 3, 4, and the second row employed for
the leaf values 5, 6, 7, 8, 9.
Boxplot It was noted earlier that proponents of EDA endorse the use of descriptive
measures which are relatively insensitive to data contamination, such as the median (in lieu of
the mean) and the interquartile range (in lieu of the standard deviation). A boxplot (also
referred to as a box-and-whisker plot) represents a method of visual display developed by
Tukey (1977) which, among other things, provides a succinct summary of a distribution while
displaying the median and elements called hinges (which for all practical purposes correspond
to the 25th and 75th percentiles, which are the boundaries that define the interquartile range).
A number of elements displayed in a boxplot such as the median and hingespread (which is
defined later in this section) are commonly described as resistant indicators, since their values
are independent of the values of any outliers present in the data. The latter would not be the case
if the mean and variance were, respectively, employed as measures of central tendency and
variability. The value of the mean can be greatly influenced by the presence of one or more
outliers, and the presence of outliers can dramatically increase the value of the variance. Because
of the latter, the mean and variance are not considered to be resistant indicators. Figure 1.10 is
a boxplot o f Distribution A.
The following values and/or elements documented in the boxplot depicted in Figure 1.10
will now be computed and/or explained: a) Median: b) Hinges; c) Hingespread; d) Fences,
outliers, and severe outliers; e) Adjacent values; f) Whiskers. The latter values or elements

can be determined through use of a stem-and-leaf display, or alternatively by ordinally arranging


all of the scores in the distribution as is done in Figure 1.9.

Quartile:        First             Second            Third             Fourth

22 55 60 61 61 | 62 62 63 63 67 | 71 71 72 72 72 | 74 74 76 82 96

     Lower hinge = 61.5          Median = 69          Upper hinge = 73

Figure 1.9 Determination of Hinges for Boxplot of Distribution A

a) The first value we need to compute in order to construct a boxplot for Distribution A
is the median. Earlier in this chapter it was noted that the median is the middle score in the
distribution (which corresponds to the 50th percentile), and that when the total number of scores is
an even number (which is the case with Distribution A, since n = 20), there will be two middle
values. In such a case, the median is the average of the two middle scores. Employing the above
rule with Distribution A, the two middle scores will be those in the 10th and 11th ordinal
positions, which are 67 and 71. The average of the latter two scores is (67 + 71)/2 = 69, which
is the value of the median.
At this point a general protocol will be presented for computing the ordinal position of a
score which corresponds to any percentile value, and the score in the distribution which
corresponds to that percentile value. The equation k = np is employed, where n represents the
total number of scores in the distribution, p represents the percentile expressed as a proportion,
and k represents an ordinal position value.
1) If the value computed for k with the equation k = np is an integer number, the score
at the desired percentile is the average of the scores in the kth and (k + 1)th
ordinal positions. Employing this protocol to determine the median/50th percentile of
Distribution A, we obtain k = (20)(.5) = 10. Since k = 10 is an integer value, the median will
be the average of the scores in the k = 10th and the (k + 1) = 11th ordinal positions. This is
consistent with the result obtained earlier when the values 67 (which is in the 10th ordinal
position) and 71 (which is in the 11th ordinal position) were averaged to compute the value 69 for
the median of the distribution.
Employing the above protocol to determine the 90th percentile of Distribution A, we obtain
k = (20)(.9) = 18. Since k = 18 is an integer value, the 90th percentile will be the average of the
scores in the k = 18th and the (k + 1) = 19th ordinal positions, which are, respectively, 76 and 82.
Thus, the value of the score at the 90th percentile in Distribution A is (76 + 82)/2 = 79.
2) If the value computed for k with the equation k = np is not an integer number, the
ordinal position of the score with the desired percentile is one unit above the integer portion
of the value computed for k. Thus, if we wanted to determine the 47th percentile of Distribution
A, we determine that k = (20)(.47) = 9.4. Since the integer portion of k is 9, the score in the k +
1 = 10th ordinal position represents the score at the 47th percentile. In the case of Distribution
A the score in the 10th ordinal position is 67.
It should be noted that the designation of 67 as the score at the 47th percentile is predicated
on the assumption that the variable being measured is a discrete variable which can only assume
integer values. If, on the other hand, the variable in question is assumed to be a continuous
variable, interpolation can be employed to compute a slightly different value for the score at the
47th percentile. (The term interpolate means to compute an intermediate value which falls
between two specified values.) Specifically, we can determine that the score at the 40th percentile
Figure 1.10 Boxplot of Distribution A

(Values labeled in Figure 1.10: Upper outer fence = 107.5; Outlier = 96; Upper inner fence = 90.25;
Upper adjacent value = 82; Upper hinge (3rd quartile) = 73; Median = 69; Hingespread = 73 - 61.5 = 11.5;
Lower hinge (1st quartile) = 61.5; Lower adjacent value = 55; Lower inner fence = 44.25;
Lower outer fence = 27; Severe outlier = 22)



will be 63, since it is the average of the scores in the 8th and 9th ordinal positions (i.e., since k =
(20)(.40) = 8, we compute (63 + 63)/2 = 63). We already know that 69 represents the score at the
50th percentile. Since from the 40th percentile, the 47th percentile is seven tenths of the way up to
the 50th percentile, we multiply .7 by 6 (which is the difference between the values 69 and 63
which respectively represent the 50th and 40th percentiles) and obtain 4.2. When the latter value
is added to 63 (the value of the 40th percentile) it yields 63 + 4.2 = 67.2. The latter value is
employed to represent the score at the 47th percentile.11
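The k = np protocol described above can be expressed as a short Python function (illustrative only; the function name is arbitrary). The version below implements the discrete-variable rule, so it returns 67 rather than the interpolated value 67.2 for the 47th percentile.

# Illustrative sketch of the k = np protocol applied to Distribution A.
def score_at_percentile(scores, p):
    ordered = sorted(scores)
    n = len(ordered)
    k = round(n * p, 10)            # guard against floating-point error in n * p
    if k == int(k):                 # k is an integer: average the kth and (k + 1)th scores
        k = int(k)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[int(k)]          # otherwise: the score one ordinal position above int(k)

A = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67, 71, 71, 72, 72, 72, 74, 74, 76, 82, 96]
print(score_at_percentile(A, .50))  # 69.0 (median)
print(score_at_percentile(A, .90))  # 79.0 (90th percentile)
print(score_at_percentile(A, .47))  # 67   (47th percentile, discrete-variable rule)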
b) The hinges in Figure 1.10 represent the median of the scores above the 50th percentile,
and the median of the scores below the 50th percentile (i.e., the points halfway between the
median of the distribution and the most extreme score at each end of the distribution). When the
sample size is reasonably large and there are relatively few tied scores, the hinge values will
correspond closely with the 25th percentile and the 75th percentile of a distribution. Because of
the latter, many sources define the hinges as the scores at the 25th and 75th percentiles. Figure 1.9
illustrates the determination of the hinges, employing the definition of a hinge as the median of
the scores above and below the 50th percentile of the distribution. The value 61.5 computed for
the lower hinge is in between the values 61 and 62, which separate the first and second quartiles
of the distribution. 61.5 is the average of 61 and 62 (i.e., (61 + 62)/2 = 61.5). The value 73
computed for the upper hinge is in between the values 72 and 74, which separate the third and
fourth quartiles o f the distribution. 73 is the average of 72 and 74 (i.e., (72 + 74)/2 = 73). Note
that if the equation k = np is employed to compute the scores at the 25th and 75thpercentiles, the
following values are obtained: a) Since (20)(.25) = 5 yields an integer number, the score at the
25thpercentile will be the average of the scores in the 5th and 6th (since (k = 5) + 1 = 6) ordinal
positions — i.e., (61 + 62)12 = 61.5; b) Since (20)(.75) = 15 yields an integer number, the score
at the 75thpercentile will be the average of the scores in the 15th and 16th (since (k = 15) + 1 = 16)
ordinal positions - i.e., (72 + 74)/2 = 73.
c) The hingespread (also referred to as H-spread and fourth-spread) is the difference
between the upper hinge and the lower hinge. Thus, Hingespread = 73 − 61.5 = 11.5. Within
the framework of a boxplot, the hingespread is employed as the measure of dispersion (i.e.,
variability).
d) Tukey (1977) stipulated that any value which falls more than one and one-half
hingespreads outside a hinge should be classified as an outlier. The term inner fence is
employed to designate the point in the distribution that falls one and one-half hingespreads
outside a hinge (i.e., above an upper hinge or below a lower hinge).12 Since the value of the
hingespread is 11.5, (1.5)(11.5) = 17.25. The upper inner fence will be the value computed
when 17.25 is added to the value of the upper hinge. Thus, 73 + 17.25 = 90.25. The lower inner
fence will be the value computed when 17.25 is subtracted from the value of the lower hinge.
Thus, 61.5 - 17.25 = 44.25. Any value which is more than three hingespreads outside a hinge
is classified by some sources as a severe outlier. The term outer fence is employed to designate
the point in the distribution that falls three hingespreads outside a hinge (i.e., above an upper
hinge or below a lower hinge). The upper outer fence will be the value computed when
(3)(11.5) = 34.5 is added to the value of the upper hinge. Thus, 73 + 34.5 = 107.5. The lower
outer fence will be the value computed when (3)(11.5) = 34.5 is subtracted from the value of the
lower hinge. Thus, 61.5 - 34.5 = 27. Note that only two scores can be classified in the outlier or
severe outlier category. The score 96 is classified as an outlier since it falls beyond one and one-
half hingespreads from the upper hinge. Although the score 96 falls above the upper inner
fence of 90.25, it does not qualify as a severe outlier, since it falls below the upper outer fence
of 107.5. The score 22 is classified as a severe outlier since it falls below the lower outer fence
of 27.13
e) Those values in the distribution that are closest to the inner fences, but which fall inside
the inner fences, are referred to as adjacent values (also referred to as extreme values). In other
words, the two adjacent values are the value closest to being one and one-half hingespreads
above the upper hinge, but still below the upper inner fence, and the value closest to being one
and one-half hingespreads below the lower hinge, but still above the lower inner fence. Put
more simply, the adjacent values are the most extreme values in the distribution in both directions
which do not qualify as being outliers.14 In the case of Distribution A, the score which is closest
to but not above the upper inner fence is 82. Thus, the latter score is designated as the upper
adjacent value. The score which is closest to but not below the lower inner fence is 55. Thus,
the latter score is designated as the lower adjacent value.
f) The term whisker is employed to refer to each of the vertical lines which extend from the
center of each end of the box in the boxplot (i.e., from the upper and lower hinges) to each of the
adjacent values. Thus, the whisker at the top of the boxplot extends from the upper hinge which
designates the score 73 to the upper adjacent value which designates the score 82. The whisker
at the bottom of the boxplot extends from the lower hinge which designates the score 61.5 to the
lower adjacent value which designates the score 55.
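The quantities described in points b) through f) can be computed with the following Python sketch (the function names are illustrative assumptions, and the 20-score listing is the same stand-in for Distribution A employed earlier, consistent with the summary values in the text rather than its exact composition).

def median(vals):
    n = len(vals)
    mid = n // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2

def boxplot_elements(sorted_scores):
    n = len(sorted_scores)
    lower_half = sorted_scores[: n // 2]          # scores below the 50th percentile
    upper_half = sorted_scores[(n + 1) // 2 :]    # scores above the 50th percentile
    lower_hinge, upper_hinge = median(lower_half), median(upper_half)
    spread = upper_hinge - lower_hinge            # hingespread
    fences = {
        "lower inner": lower_hinge - 1.5 * spread, "upper inner": upper_hinge + 1.5 * spread,
        "lower outer": lower_hinge - 3.0 * spread, "upper outer": upper_hinge + 3.0 * spread,
    }
    inside = [x for x in sorted_scores if fences["lower inner"] <= x <= fences["upper inner"]]
    return {
        "median": median(sorted_scores),
        "lower hinge": lower_hinge, "upper hinge": upper_hinge, "hingespread": spread,
        **fences,
        "lower adjacent": min(inside), "upper adjacent": max(inside),
        "outliers": [x for x in sorted_scores if x < fences["lower inner"] or x > fences["upper inner"]],
    }

scores = [22, 55, 60, 61, 61, 62, 62, 63, 63, 67, 71, 71, 72, 72, 72, 74, 74, 76, 82, 96]
print(boxplot_elements(scores))
# lower hinge = 61.5, upper hinge = 73, hingespread = 11.5, inner fences = 44.25 and 90.25,
# outer fences = 27 and 107.5, adjacent values = 55 and 82, outliers = [22, 96]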
Inspection of the boxplot in Figure 1.10 reveals the following information regarding
Distribution A: a) The fact that the median is not in the center of the box element (i.e., the area
between the upper and lower hinge) of the boxplot indicates that the distribution is not
symmetric; b) The fact that the upper whisker of the boxplot is somewhat longer than the lower
whisker indicates that Distribution A is negatively skewed — i.e., there are a disproportionate
number of high scores in the distribution. The latter is further supported by the fact that the value
of the mean 66.8 is less than the median of 69, which is less than the mode, which equals 72.
As in the case of a stem-and-leaf display, by arranging the boxplots of two distributions
parallel to one another, one can compare the two distributions. It should be noted that although
descriptions of boxplots (as well as the computation of relevant values) may not be consistent
across all sources, boxplots derived through use of protocols stated in various sources will
generally be quite similar to one another. In the final analysis, slight differences between
boxplots based on different sources will be unimportant, insofar as a boxplot is employed as an
exploratory device for developing a general picture regarding the structure of a set of data. A
more detailed discussion of boxplots and the general subject of exploratory data analysis can
be found in Mosteller and Tukey (1977), Tukey (1977), Hoaglin, Mosteller, and Tukey (1983),
and Smith and Prentice (1993). Cleveland (1985) is also an excellent source on the use of graphs
in summarizing data.

The Normal Distribution

When an inferential statistical test is employed with one or more samples to draw inferences
about one or more populations, such a test may make certain assumptions about the shape of an
underlying population distribution. The most commonly encountered assumption in this regard
is that a distribution is normal. The normal distribution is also referred to as the Gaussian
distribution, since the German mathematician Karl Friedrich Gauss discussed it in 1809. Zar
(1999, p. 65), however, notes that the normal distribution was actually first identified by the
French mathematician Abraham de Moivre in 1733 and mentioned as well in 1774 by another
French mathematician, Pierre Simon, Marquis de Laplace.
When viewed from a visual perspective, the normal distribution (which as noted earlier
is often referred to as the bell-shaped curve) is a graph of a frequency distribution which can
be described mathematically and observed empirically (insofar as many variables in the real
world appear to be distributed normally). The shape of the normal distribution is such that the
closer a score is to the mean, the more frequently it occurs. As scores deviate more and more
from the mean (i.e., become higher or lower), the more extreme the score the lower the frequency
with which that score occurs. As noted earlier, a normal distribution will always be symmetrical
(with γ1 = g1 = 0 and √β1 = √b1 = 0) and mesokurtic (with γ2 = g2 = 0 and β2 = b2 = 3).
Any normal distribution can be converted into what is referred to as the standard normal
distribution by assigning it a mean value of 0 (i.e., μ = 0) and a standard deviation of 1 (i.e., σ
= 1). The standard normal distribution, which is represented in Figure 1.11, is employed more
frequently in inferential statistics than any other theoretical probability distribution. The use of
the term theoretical probability distribution in this context is based on the fact it is known that
in the standard normal distribution (or, for that matter, any normal distribution) a certain propor­
tion of cases will always fall within specific areas of the curve. As a result of this, if one knows
how far removed a score is from the mean of the distribution, one can specify the proportion of
cases which obtain that score, as well as the likelihood of randomly selecting a subject or object
with that score. Figure 1.11 will be discussed in greater detail later in this section.
In a graph such as Figure 1.11, the range of values a variable may assume is recorded on the
X-axis. In the case of Figure 1.11, the scores are represented in the form of z scores/standard
deviation scores (which are explained in this section). The values on the Y-axis can be viewed
as representing the frequency of occurrence for each of the scores in the population (thus the
notation f). As noted earlier, sometimes proportions, probabilities, or density values are
recorded on the Y-axis. Further clarification of density values is provided below.
Equation 1.36 is the general equation for the normal distribution.

Y = (1/(σ√(2π))) e^(−(X − μ)²/(2σ²))     (Equation 1.36)

In point of fact, Equation 1.36 and Figure 1.11 respectively represent the equation and graph
for the probability density function of the normal distribution. In mathematics the term
function is commonly employed to summarize the relationship between two variables such as
X and T. Such an equation summarizes the operations which when performed on the X variable
will yield one or more specific values for the T variable. In the case of a theoretical distribution,
such as the normal distribution, a density function describes the relationship between a variable
and the densities (which for the moment we will assume are probabilities) associated with the
range of values the variable may assume. When a density function is represented in a graphic
format, the range of values the variable may assume is recorded on the abscissa (X-axis), and
the density values (which will range from 0 to 1) are recorded on the ordinate (Y-axis). Through
use of Equation 1.36 and/or Figure 1.11 one can determine the proportion of cases/area in a
normal distribution that falls between any two points on the abscissa. Those familiar with
calculus will recognize that if we have two points a and b on the abscissa representing two values
of a continuous variable, and we integrate the equation of the density function over the interval
a to b, we will be able to derive the area/proportion of the curve which falls between points a and
b. As will be noted shortly, in lieu of having to employ calculus to compute the proportion that
falls between any two points, in the case of the normal distribution the appropriate values can be
determined through use of Table A1 (The Table of the Normal Distribution) in the Appendix.
It should be noted that although the densities which are recorded on the ordinate of the graph of
a density function are often depicted as being equivalent to probabilities, strictly speaking a
density is not the same thing as a probability. Within the context of a graph such as Figure 1.11,
a density is best viewed as the height of the curve which corresponds to a specific value of the
variable which is recorded on the abscissa. The area between any two points on the abscissa
however, is generally expressed in the form of a probability or a proportion. Further clarification
of a probability density function and the general concept of probability can be found in Section
IX (the Addendum) of the binomial sign test for a single sample under the discussion of
Bayesian analysis of a continuous variable.
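For readers who wish to verify the relationship between the density function and the proportions described above, a minimal Python sketch is presented below. (Python, the scipy library, and the function names are assumptions of this illustration; the numerical integration is a simple midpoint approximation rather than the calculus-based derivation.)

import math
from scipy.stats import norm       # used only to check the numerically integrated area

def normal_density(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x (Equation 1.36)."""
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def area_between(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Proportion of the curve between a and b, by simple numerical integration."""
    width = (b - a) / steps
    return sum(normal_density(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(round(area_between(0, 1), 4))          # about .3413 (mean to one standard deviation)
print(round(norm.cdf(1) - norm.cdf(0), 4))   # the same proportion obtained from the cumulative distribution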
In Equation 1.36 the symbols μ and σ represent the mean and standard deviation of a normal
distribution. For any normal distribution where the values of μ and σ are known, a value of Y
(which represents the height of the distribution at a given point on the abscissa) can be computed
simply by substituting a specified value of X in Equation 1.36. Note that in the case of the
standard normal distribution, where μ = 0 and σ = 1, Equation 1.36 becomes Equation 1.37.15

Y = (1/√(2π)) e^(−X²/2)     (Equation 1.37)

The reader should take note of the fact that the normal distribution is a family of
distributions which is comprised of all possible values of μ and σ that can be substituted in
Equation 1.36. Although the values of μ and σ for a normal distribution may vary, as noted
earlier, all normal distributions are mesokurtic.
For any variable that is normally distributed, regardless of the values of the population mean
and standard deviation, the distance of a score from the mean in standard deviation units can be
computed with Equation 1.38. The z value computed with Equation 1.38 (which is often referred
to as a standard score or standard deviation score) is a measure in standard deviation units of
how far a score is from the mean.16 In instances where the population standard deviation is not
known (which is often the case in actual research) the estimated population standard deviation s
computed with Equation 1.11 is employed in lieu of σ in the denominator of Equation 1.38 (i.e.,
z = (X − μ)/s).
z = (X − μ)/σ     (Equation 1.38)

Where: X is a specific score
μ is the value of the population mean
σ is the value of the population standard deviation

When Equation 1.38 is employed, any score that is above the mean will yield a positive z
value, and any score that is below the mean will yield a negative z value. Any score that is equal
to the mean will yield a z value of zero.
At this point we shall return to Figure 1.11 which will be examined in greater detail.
Inspection of the latter figure reveals that in the standard normal distribution a fixed proportion/
percentage of cases will always fall between specific points on the curve. Specifically, .3413 of
the cases (which expressed as a percentage is 34.13%) will fall between the mean and one
standard deviation above the mean (i.e., between the values z = 0 and z = +1), as well as between
the mean and one standard deviation below the mean (i.e., between the values z = 0 and z = −1).
.1359 (or 13.59%) of the cases will fall between one and two standard deviation units above the
mean (i.e., between the values z = +1 and z = +2), as well as between one and two standard
deviation units below the mean (i.e., between the values z = −1 and z = −2). (One standard
deviation unit is equal to the value of σ.) .0215 (or 2.15%) of the cases will fall between two
and three standard deviation units above the mean (i.e., between the values z = +2 and z = +3),
as well as between two and three standard deviation units below the mean (i.e., between the
values z = −2 and z = −3).
[Figure 1.11 The Standard Normal Distribution: the standard normal curve, with the abscissa marked in standard deviation scores (z scores) from −3 to +3.]

Note that since the normal distribution is symmetrical, the proportion/percentage of cases
to the right of the mean will be equivalent to the proportion/percentage of cases to the left of the
mean. If all of the proportions/percentages which fall within three standard deviations of the
mean (i.e., between the points z = +3 and z = −3) are summed, the value .9974 (or 99.74%) is
obtained (i.e., .0215 + .1359 + .3413 + .3413 + .1359 + .0215 = .9974). The latter value indicates
that in a normally distributed population, .9974 is the proportion of the population which will
obtain a score that falls within three standard deviations of the mean. One half of those cases
(i.e., .9974/2 = .4987 or 49.87%) will fall above the mean (i.e., between z = 0 and z = +3), and
one half will fall below the mean (i.e., between z = 0 and z = −3). (Obviously some scores will
fall exactly at the mean. Half of the latter scores are generally included in the 49.87% which fall
above the mean, and the other half in the 49.87% which fall below the mean.) Only 1 − .9974
= .0026 (or .26%) of the population will obtain a score which is more than three standard
deviation units above or below the mean. Specifically, .0026/2 = .0013 (or .13%) will obtain a score that
is greater than three standard deviation units above the mean (i.e., above the score z = +3), and
.0013 will obtain a score that is less than three standard deviation units below the mean (i.e.,
below the score z = −3). As will be noted shortly, all of the aforementioned normal distribution
proportions/percentages can also be obtained from Table A1.
In order to illustrate how z scores are computed and interpreted, consider Example 1.1.

Example 1.1 Assume we have an IQ test for which it is known that the population mean is μ =
100 and the population standard deviation is σ = 15. Three people take the test and obtain the
following IQ scores: Person A: 135; Person B: 65; and Person C: 100. Compute a z score
(standard deviation score) for each person.

Through use of Equation 1.38, the z score for each person is computed below. The reader
should take note of the fact that a z score should always be computed to at least two decimal
places.

Person A: z = (135 − 100)/15 = 2.33

Person B: z = (65 − 100)/15 = −2.33

Person C: z = (100 − 100)/15 = 0

Person A obtains an IQ score that is 2.33 standard deviation units above the mean, Person
B obtains an IQ score that is 2.33 standard deviation units below the mean, and Person C obtains
an IQ score at the mean. If we wanted to determine the likelihood (i.e., the probability) of select­
ing a person (as well as the proportion of people) who obtains a specific score in a normal
distribution, Table A1 can provide this information. Although Table A1 is comprised of four
columns, for the analysis to be discussed in this section we will only be interested in the first
three columns.
Column 1 in Table A1 lists z scores which range in value from 0 to an absolute value of
4. The use of the term absolute value of 4 is based on the fact that since the normal distribution
is symmetrical, anything we say with respect to the probability or the proportion of cases
associated with a positive z score will also apply to the corresponding negative z score. Note that
positive z scores will always fall to the right of the mean (often referred to as the right tail of the
distribution), thus indicating that the score is above the mean. Negative z scores, on the other
hand, will always fall to the left of the mean (often referred to as the left tail of the distribution),
thus indicating that the score is below the mean.17
Column 2 in Table A1 lists the proportion of cases (which can also be interpreted as prob­
ability values) that falls between the mean of the distribution and the z score that appears in a
specific row.
Column 3 in Table A1 lists the proportion of cases that falls beyond the z score in that row.
More specifically, the proportion listed in Column 3 is evaluated in relation to the tail of the
distribution in which the score appears. Thus, if a z score is positive, the value in Column 3
will represent the proportion of cases that falls above that z score, whereas if the z score is
negative, the value in Column 3 will represent the proportion of cases that falls below that z
score.18
Table A1 will now be employed in reference to the IQ scores of Person A and Person B.
For both subjects the computed absolute value of z associated with their IQ score is z = 2.33. For
z = 2.33, the tabled values in Columns 2 and 3 are, respectively, .4901 and .0099. The value in
Column 2 indicates that the proportion of the population that obtains a z score between the mean
and z = 2.33 is .4901 (which expressed as a percentage is 49.01%), and the proportion of the
population which obtains a z score between the mean and z = −2.33 is .4901. We can make
comparable statements with respect to the IQ values associated with these z scores. Thus, we can
say that the proportion of the population which obtains an IQ score between 100 and 135 is
.4901, and the proportion of the population which obtains an IQ score between 65 and 100 is
.4901. Since the normal distribution is symmetrical, .5 (or 50%) represents the proportion of
cases that falls both above and below the mean. Thus, we can determine that .5 + .4901 = .9901
(or 99.01%) is the proportion of people with an IQ of 135 or less, as well as the proportion of
people who have an IQ of 65 or greater. We can state that a person who has an IQ of 135 has a
score that falls at approximately the 99th percentile, since it is equal to or greater than the scores
of 99% of the population. On the other hand, a person who has an IQ of 65 has a score that falls
at the 1st percentile, since it is equal to or greater than the scores of only approximately 1% of the
population.
The value in Column 3 indicates that the proportion of the population which obtains a score
of z = 2.33 or greater (and thus, in reference to Person A, an IQ of 135 or greater) is .0099 (which
is .99%). In the same respect, the proportion of the population that obtains a score of z = −2.33
or less (and thus, in reference to Person B, an IQ of 65 or less) is .0099.
If one interprets the values in Columns 2 and 3 as probability values instead of proportions,
we can state that if one randomly selects a person from the population, the probability of
selecting someone with an IQ of 135 or greater will be approximately 1%. In the same respect,
the probability of selecting someone with an IQ of 65 or less will also be approximately 1%.
In the case of Person C, whose IQ score of 100 results in the standard deviation score z = 0,
inspection of Table A1 reveals that the values in Columns 2 and 3 associated with z = 0 are,
respectively, .0000 and .5000. The latter values indicate the following: a) The proportion of the
population that obtains an IQ between the mean value of 100 and 100 is zero; b) The proportion
of the population that obtains an IQ of 100 or greater is .5 (which is equivalent to 50%), and that
the proportion of the population which obtains an IQ of 100 or less is .5. Thus, if we randomly
select a person from the population, the probability of selecting someone with an IQ equal to or
greater than 100 will be .5, and the probability of selecting someone with an IQ equal to or less
than 100 will be .5. We can also state that the score of a person who has an IQ of 100 falls at the
50th percentile, since it is equal to or greater than the scores of 50% of the population.
Note that to determine a percentile rank associated with a positive z value (or a score that
results in a positive z value), 50% should be added to the percentage of cases that fall between
the mean and that z value — in other words, the entry for the z value in Column 2 expressed as
a percentage is added to 50%. The 50% we add to the value in Column 2 represents the
percentage of the population that scores below the mean. The percentile rank for a negative z
value (or a score that results in a negative z value) is the entry in Column 3 for that z value
expressed as a percentage.
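The analysis of Example 1.1 can be reproduced with the following Python sketch, in which the cumulative normal distribution function available in the scipy library serves as a stand-in for Table A1 (the function names are illustrative assumptions).

from scipy.stats import norm

def z_score(x, mu, sigma):
    """Equation 1.38: distance of a score from the mean in standard deviation units."""
    return round((x - mu) / sigma, 2)       # computed to two decimal places, as in the text

def column_2(z):
    """Proportion between the mean and z (the Column 2 entry of Table A1)."""
    return norm.cdf(abs(z)) - 0.5

def column_3(z):
    """Proportion beyond z in the tail in which it falls (the Column 3 entry of Table A1)."""
    return 1 - norm.cdf(abs(z))

def percentile_rank(z):
    """Rule given in the text: Column 2 plus 50% for a positive z, Column 3 for a negative z."""
    return 100 * (0.5 + column_2(z)) if z >= 0 else 100 * column_3(z)

for iq in (135, 65, 100):
    z = z_score(iq, 100, 15)
    print(iq, z, round(column_2(z), 4), round(column_3(z), 4), round(percentile_rank(z), 2))
# 135: z = 2.33,  Column 2 = .4901, Column 3 = .0099, percentile rank = 99.01
#  65: z = -2.33, Column 2 = .4901, Column 3 = .0099, percentile rank = 0.99
# 100: z = 0.0,   Column 2 = .0000, Column 3 = .5000, percentile rank = 50.0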
Figure 1.12 provides a graphic summary of the proportions discussed in the analysis of
Example 1.1.

[Figure 1.12 Summary of Example 1.1: the normal curve with μ = 100 and σ = 15, showing the proportion .4901 falling between the mean and each of the values z = +2.33 (IQ = 135) and z = −2.33 (IQ = 65), and the proportion .0099 falling beyond each of those values.]
Some additional examples will now be presented illustrating how the normal distribution
can be employed to compute proportions/percentages for a population. As is the case in Example
1.1, all of the examples will assume that the relevant population has a mean IQ of 100 and a
standard deviation of 15. Consider Examples 1.2 - 1.5.

Example 1.2 What is the proportion/percentage of the population which has an IQ that falls
between 110 and 135?

Example 1.3 What is the proportion/percentage of the population which has an IQ that falls
between 65 and 75?

Example 1.4 What is the proportion/percentage of the population which has an IQ that falls
between 75 and 110?

Example 1.5 What is the proportion/percentage of the population which has an IQ that is
greater than 110 or less than 75?

Figures 1.13 and 1.14 visually summarize the information necessary to compute the
proportions/percentages for Examples 1.2 - 1.5.
Example 1.2 asks for the proportion of cases between two IQ scores, both of which fall
above the population mean. In order to compute the latter, it is necessary to compute the z value
associated with each of the two IQ scores stipulated in the example. Specifically, we must
compute z values for the IQ scores of 110 and 135. Within the framework of Example 1.1,
Equation 1.38 was employed to compute the value z = 2.33 for an IQ of 135. The latter equation
is employed below to compute the value z = .67 for an IQ of 110.

z = (110 − 100)/15 = .67
Employing Table A1, it was previously determined that a proportion equal to .4901 of the
population (which is equal to 49.01%) will have an IQ score between the mean of 100 and an IQ
of 135 (which is 2.33 standard deviation units above the mean (i.e., z = 2.33)). Employing
Column 2 of Table A1, we can determine that the proportion/percentage of the population which
has an IQ between the mean and a standard deviation score of z = .67 (which in the case of a
positive z value corresponds to an IQ score of 110) is .2486 (or 24.86%). The latter information
is visually summarized in Figure 1.13.
Note that in Figure 1.13, Area A represents the proportion/percentage of the normal
distribution that falls between the mean and 2.33 standard deviation units above the mean, while
Area B represents the proportion/percentage of the normal distribution that falls between the
mean and .67 standard deviation units above the mean. In order to determine the propor­
tion/percentage of the population that has an IQ between 110 and 135, it is necessary to compute
the proportion/percentage of the normal distribution which falls between .67 and 2.33 standard
deviation units above the mean. The latter area of the normal distribution is represented by Area
C in Figure 1.13. The value .2415 (or 24.15%) for Area C is obtained by subtracting the
proportion/percentage of the curve in Area B from the proportion/percentage of the curve in Area
A. In other words, the difference between Areas A and B will represent Area C. Thus, since
Area A - Area B = Area C, .4901 - .2486 = .2415.
The general rule which can be derived from the solution to Example 1.2 is as follows. When
one wants to determine the proportion/percentage of cases that falls between two scores in a
normal distribution, and both of the scores fall above the mean, the proportion/percentage
representing the area between the mean and the smaller of the two scores is subtracted from the
proportion/percentage representing the area between the mean and the larger of the two scores.
Example 1.3 asks for the proportion of cases between two IQ scores, both of which fall
below the population mean. As was the case in Example 1.2, in order to compute the latter it is
necessary to compute the z value associated with each of the two IQ scores stipulated in the
example. Specifically, we must compute z values for the IQ scores of 65 and 75. Within the
framework of Example 1.1, Equation 1.38 was employed to compute the value z = −2.33 for an
IQ of 65. The latter equation is employed below to compute the value z = −1.67 for an IQ of 75.

z = (75 − 100)/15 = −1.67
[Figure 1.13 Computation of Relevant Proportions/Percentages Above the Mean: the normal curve with μ = 100 and σ = 15, marking IQ = 110 (z = .67) and IQ = 135 (z = 2.33); Area A = .4901 (mean to z = 2.33), Area B = .2486 (mean to z = .67), Area C = .2415 (between z = .67 and z = 2.33), and Area D = .0099 (beyond z = 2.33).]

[Figure 1.14 Computation of Relevant Proportions/Percentages Below the Mean: the normal curve with μ = 100 and σ = 15, marking IQ = 75 (z = −1.67) and IQ = 65 (z = −2.33); Area A = .4901 (mean to z = −2.33), Area B = .4525 (mean to z = −1.67), Area C = .0376 (between z = −2.33 and z = −1.67), and Area D = .0099 (beyond z = −2.33).]

Employing Table A1, it was previously determined that a proportion equal to .4901 of the
population (which is equal to 49.01%) will have an IQ score between the mean of 100 and an IQ
of 65 (which is 2.33 standard deviation units below the mean (i.e., z = −2.33)). Employing
Column 2 of Table A1, we can determine that the proportion/percentage of the population which
has an IQ between the mean and a standard deviation score of 1.67 (which in the case of z =
−1.67 corresponds to an IQ score of 75) is .4525 (or 45.25%). The latter information is visually
summarized in Figure 1.14.
Note that in Figure 1.14, Area A represents the proportion/percentage of the normal
distribution that falls between the mean and 2.33 standard deviation units below the mean, while
Area B represents the proportion/percentage of the normal distribution that falls between the
mean and 1.67 standard deviation units below the mean. In order to determine the proportion/
percentage of the population that has an IQ between 65 and 75, it is necessary to compute the
proportion/percentage o f the normal distribution which falls between 2.33 and 1.67 standard
deviation units below the mean. The latter area of the normal distribution is represented by Area
C in Figure 1.14. The value .0376 (or 3.76%) for Area C is obtained by subtracting the
proportion/percentage of the curve in Area B from the proportion/percentage of the curve in Area
A. In other words, the difference between Areas A and B will represent Area C. Thus, since
Area A - Area B = Area C, .4901 - .4525 = .0376.
The general rule which can be derived from the solution to Example 1.3 is as follows. When
one wants to determine the proportion/percentage of cases that falls between two scores in a
normal distribution, and both of the scores fall below the mean, the proportion/percentage
representing the area between the mean and the less extreme of the two scores (i.e., the z score
with the smaller absolute value) is subtracted from the proportion/percentage representing the
area between the mean and the more extreme of the two scores (i.e., the z score with the larger
absolute value).
Based on the values computed for Examples 1.2 and 1.3 (which are summarized in Figures
1.13 and 1.14) we have enough information to answer the questions asked in Examples 1.4 and
1.5. Example 1.4 asks what proportion/percentage of the population has an IQ which falls between
75 and 110. In Example 1.2 it was determined that .2486 (or 24.86%) of the population has an
IQ between 100 and 110, and in Example 1.3 it was determined that .4525 (or 45.25%) of the
population has an IQ between 100 and 75. The value .2486 is represented by Area B in Figure
1.13, and the value .4525 is represented by Area B in Figure 1.14. To determine the proportion/
percentage of cases that obtain an IQ between 75 and 110 we merely added up the values for
Area B in both figures. Thus, .2486 + .4525 = .7011 (or 70.11%). Consequently we can say that
.7011 (or 70.11%) of the population obtains an IQ between 75 and 110.
The general rule which can be derived from the solution to Example 1.4 is as follows. When
one wants to determine the proportion/percentage of cases that falls between two scores in a
normal distribution, and one of the scores falls above the mean and the other score falls below
the mean, the following protocol is employed. The proportion/percentage representing the area
between the mean and the score above the mean is added to the proportion/percentage
representing the area between the mean and the score below the mean.
Example 1.5 asks what proportion/percentage of the population has an IQ which falls above
110 or below 75. In Example 1.2 it was determined that .2486 (or 24.86%) of the population has
an IQ between 100 and 110. Since a proportion equal to .5 (or 50%) of the cases falls above the
mean, it logically follows that .5 - .2486 = .2514 (or 25.14%) of the cases will have an IQ above
110. The latter value can be obtained in Figure 1.13 by adding up the proportions designated for
Areas C and D (i.e., .2415 + .0099 = .2514). Note that the value .2514 corresponds to the
proportion recorded in Column 3 of Table A1 for the value z = .67. This is the case since the
values in Column 3 of Table A1 represent the proportion of cases which are more extreme than
the z score listed in the specified row (in the case of a positive z score it is the proportion of cases
above that z score).
In Example 1.3 it was determined that .4525 (or 45.25%) of the population has an IQ
between 100 and 75. Since a proportion equal to .5 (or 50%) of the cases falls below the mean,
it logically follows that .5 - .4525 = .0475 (or 4.75%) of the cases will have an IQ below 75. The
latter value can be obtained in Figure 1.14 by adding up the proportions designated for Areas C
and D (i.e., .0376 + .0099 = .0475). Note that the value .0475 corresponds to the proportion
recorded in Column 3 of Table A1 for the value z = 1.67. This is the case since, as previously
noted, the values in Column 3 of Table A1 represent the proportion of cases which are more
extreme than the z score listed in the specified row (in the case of a negative z score, it is the
proportion of cases below that z score).
To determine the proportion/percentage of cases that obtain an IQ above 110 or below 75,
we merely added up the values .2514 and .0475. Thus, .2514 + .0475 = .2989 (or 29.89%).
Consequently we can say that .2989 (or 29.89%) of the population obtains an IQ that is greater
than 110 or less than 75.
The general rule which can be derived from the solution to Example 1.5 is as follows. When
one wants to determine the proportion/percentage of cases that falls above one score that is above
the mean or below another score that is below the mean, the following protocol is employed. The
proportion/percentage representing the area above the score which is above the mean is added
to the proportion/percentage representing the area below the score which is below the mean.
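The four general rules derived from Examples 1.2 - 1.5 can be expressed compactly with the cumulative normal distribution, as in the following Python sketch. (scipy and the function names are assumptions of this illustration; the printed values differ slightly from those in the text because the z values are not rounded to two decimal places before the table lookup.)

from scipy.stats import norm

MU, SIGMA = 100, 15

def proportion_between(lo, hi):
    """Proportion of the population with scores between lo and hi."""
    return norm.cdf(hi, MU, SIGMA) - norm.cdf(lo, MU, SIGMA)

def proportion_outside(lo, hi):
    """Proportion of the population with scores above hi or below lo."""
    return 1 - proportion_between(lo, hi)

print(round(proportion_between(110, 135), 3))   # Example 1.2: about .243 (the text obtains .2415)
print(round(proportion_between(65, 75), 3))     # Example 1.3: about .038 (the text obtains .0376)
print(round(proportion_between(75, 110), 3))    # Example 1.4: about .700 (the text obtains .7011)
print(round(proportion_outside(75, 110), 3))    # Example 1.5: about .300 (the text obtains .2989)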
Examples 1.6 - 1.8 will be employed to illustrate the use of Equation 1.39, which is derived
through algebraic transposition of the terms in Equation 1.38. Whereas Equation 1.38 assumes
that a subject’s score (i.e., the value X ) is known and allows for the computation of a z score,
Equation 1.39 does the reverse by computing a subject’s score through use of a known z value.

X = μ + zσ     (Equation 1.39)

Example 1.6 What is the IQ of a person whose score falls at the 75th percentile?

Example 1.7 What is the IQ of a person whose score falls at the 4.75th percentile?

Example 1.8 What are the IQ scores which define the middle 98% of the population
distribution?

With respect to Example 1.6, in order to determine the IQ score that falls at the 75th
percentile, we must initially determine the z score which corresponds to the point on the normal
distribution that represents the 75th percentile. Once the latter value has been determined, it can
be substituted in Equation 1.39, along with the values μ = 100 (the population mean) and
σ = 15 (the population standard deviation). When all three values are employed to solve the
latter equation, the value computed for X will represent the IQ score that falls at the 75th
percentile. In point of fact, the IQ score 110, which is employed in Example 1.2 and visually
represented in Figure 1.13, represents the IQ score at the 75th percentile. The computation of the
IQ score 110 for the 75th percentile will now be described.
In order to answer the question posed in Example 1.6, it is necessary to determine the z
value in Table A1 for which the proportion .25 is recorded in Column 2. If no proportion in
Column 2 is equal exactly to .25, the z value associated with the proportion which is closest to
.25 is employed. The proportion .25 (which corresponds to 25%) is used since a score at the 75th
percentile is equal to or greater than 75% of the scores in the distribution. The area of the normal
distribution it encompasses is the 50% of the population which has an IQ below the mean (which
corresponds to a proportion of .5), as well as an additional 25% of the population which scores
directly above the mean (which corresponds to a proportion of .25, which will be recorded in
Column 2 of Table A1). In other words, 50% + 25% = 75%.
Employing Table A1, the z value with a proportion in Column 2 which is closest to .25 is
z = .67. Note that since there is no z value exactly equal to .25, the two z values with proportions
which are closest to being either above or below .25 are z = .67, for which the proportion is
.2486, and z = .68, for which the proportion is .2517. Of the latter two proportions, .2486 is
closer to .25. As previously noted in discussing Example 1.2, the proportion .2486 for z = .67
indicates that 24.86% of the population has an IQ between the mean and a z value of .67. When
the latter z value is substituted in Equation 1.39, the value X = 110.05 is computed. When 110.05
is rounded off to the nearest integer, the IQ value 110 is obtained. Thus, we can conclude that
an IQ of 110 falls at the 75th percentile. The combination of Area B and the area below the mean
in Figure 1.13 visually represents the area which contains scores that fall below the 75th
percentile. (It should be noted that some researchers might prefer to employ the value z = .68 in
Equation 1.39 in the computation of the 75th percentile, since it provides the closest estimate that
is equal to or greater than the 75th percentile.)

X = 100 + (.67)(15) = 110.05

The general rule which can be derived from the solution to Example 1.6 is as follows. When
one wants to determine the score that falls at a specific percentile (which will be designated as
P) and the value of P is above the 50th percentile (i.e., the mean), the following protocol is
employed. The proportion in Column 2 of Table A1 which is equal to the difference (P − .5) is
identified. The latter value is designated as Q. Thus, Q = P − .5. If there is no value equal to Q,
the value closest to Q is employed. The z value associated with the proportion Q in Column 2 is
substituted in Equation 1.39, along with the values of μ and σ. The resulting value of X
represents the score at the Pth percentile.
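The rule just stated can be sketched in Python by employing the inverse of the cumulative normal distribution (scipy's norm.ppf) in place of a search of Column 2 of Table A1. (The function name is an illustrative assumption; because the exact z value is employed rather than the nearest tabled value, the results differ slightly from those computed in the text.)

from scipy.stats import norm

def score_at_percentile(p, mu=100, sigma=15):
    """Score at the Pth percentile, with p expressed as a proportion (e.g., .75)."""
    z = norm.ppf(p)          # exact z value rather than the nearest entry in Table A1
    return mu + z * sigma    # Equation 1.39

print(round(score_at_percentile(.75), 2))     # about 110.12 (the text obtains 110.05 with the tabled z = .67)
print(round(score_at_percentile(.0475), 2))   # about 74.96 (the text obtains 74.95 with z = -1.67)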
With respect to Example 1.7, in order to determine the IQ score that falls at the 4.75th
percentile, we must initially determine the z score which corresponds to the point on the normal
distribution that represents the 4.75th percentile. Once the latter value has been determined, it can
be substituted in Equation 1.39, along with the values μ = 100 (the population mean) and
σ = 15 (the population standard deviation). When all three values are employed to solve the
latter equation, the value computed for X will represent the IQ score that falls at the 4.75th
percentile. In point of fact, the IQ score 75, which is employed in Example 1.3 and visually
represented in Figure 1.14, represents the IQ score at the 4.75th percentile. The computation of
the IQ score 75 for the 4.75th percentile will now be described.
In order to answer the question posed in Example 1.7, it is necessary to determine the z
value in Table A1 for which the proportion .0475 is recorded in Column 3. (The value in Column
2 that is equal to .5 − .0475 = .4525 can also be employed, since it is associated with the same
z value.) If no proportion in Column 3 is equal exactly to .0475, the z value associated with the
proportion which is closest to .0475 is employed. The proportion .0475 (which corresponds to
4.75%) is used since a score at the 4.75th percentile is equal to or greater than 4.75% of the scores
in the distribution. The area of the normal distribution it encompasses is the 4.75% of the
population which has an IQ that falls between the end of the left tail of the distribution and the
point demarcating the lower 4.75% of the scores in the distribution (all of which are below the
mean, since any percentile less than 50% is below the mean).
Employing Table A1, the z value with a proportion in Column 3 which equals .0475 is z
= 1.67. Since the percentile in question is less than 50%, a negative sign is employed for the z
value. Thus, z = −1.67. The proportion .0475 for z = −1.67 indicates that 4.75% of the population
has an IQ between the end of the left tail of the distribution and a z value of −1.67. When the
latter z value is substituted in Equation 1.39, the value X = 74.95 is computed. When 74.95 is
rounded off to the nearest integer, the IQ value 75 is obtained. Thus, we can conclude that an IQ
of 75 falls at the 4.75th percentile. The sum of Areas C and D in Figure 1.14 visually represents
the area which contains scores that fall below the 4.75th percentile.

X = 100 + (−1.67)(15) = 74.95

The general rule which can be derived from the solution to Example 1.7 is as follows. When
one wants to determine the score that falls at a specific percentile (which will be designated as
P) and the value of P is below the 50th percentile (i.e., the mean), the following protocol is
employed. The z value is identified which corresponds to the proportion in Column 3 of Table
A1 which is equal to the value of P. If there is no value equal to P, the value closest to P is
employed. The z value associated with the proportion P in Column 3 is assigned a negative sign
and is then substituted in Equation 1.39, along with the values of μ and σ. The resulting value
of X represents the score at the Pth percentile.
Example 1.8 asks for the IQ scores which define the middle 98% of the distribution. Since
98% divided by 2 equals 49%, the middle 98% of the distribution will be comprised of the 49%
of the distribution directly above the mean (i.e., to the right of the mean), as well as the 49% of
the distribution directly below the mean (i.e., to the left of the mean). In point of fact, the middle
98% of the normal distribution is depicted in Figure 1.12. Specifically, in the latter figure it is the
area between the standard deviation scores z = +2.33 and z = −2.33. Note that in Figure 1.12,
49.01% (which corresponds to a proportion of .4901) of the cases fall between the mean and each
of the aforementioned z values.
What has been noted above indicates that in order to determine the IQ scores which define
the middle 98% of the distribution, we must initially determine the z score for which the
proportion listed in Column 2 of Table A1 is equal to .49 (which is 49% expressed as a
proportion). If no proportion in Column 2 is equal exactly to .49, the z value associated with the
proportion which is closest to .49 is employed. Equation 1.39 is then employed to solve for X for
both the positive and negative value of the z score which has a proportion equal to .49. As noted
above, the z value with the proportion in Column 2 that is closest to .49 is z = 2.33. Along with
the values μ = 100 (the population mean) and σ = 15 (the population standard deviation), the
values z = +2.33 and z = −2.33 are substituted in Equation 1.39 below to compute the IQ values
65.05 and 134.95, which when rounded off equal 65 and 135. The latter two values represent the
IQ scores which are the limits of the middle 98% of the distribution. In other words, we can say
that 98% of the population has an IQ that falls between 65 and 135. This can be stated
symbolically as follows: 65 ≤ IQ ≤ 135.

X = 100 + (2.33)(15) = 134.95

X = 100 + (−2.33)(15) = 65.05

The general rule which can be derived from the solution to Example 1.8 is as follows: When
one wants to determine the scores that define the middle P% of a normal distribution, the
following protocol is employed. The z value is identified which corresponds to the proportion in
Column 2 of Table A1 which is equal to the value P/2 (when P% is expressed as a proportion).
If there is no value equal to P/2, the value closest to P/2 is employed. Both the positive and
negative forms of the z value associated with the proportion P/2 in Column 2 are substituted in
Equation 1.39, along with the values of μ and σ. The resulting values of X represent the limits
of the middle P% of the distribution. The value of X derived for the negative z value represents
the lower limit, while the value of X derived for the positive z value represents the upper limit.
Thus, the middle P% of the distribution is any score equal to or greater than the computed lower
limit as well as equal to or less than the computed upper limit.
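The rule for the middle P% of a distribution can be sketched in the same manner. (Again, scipy and the function name are assumptions of this illustration, and the exact z value is employed rather than the tabled z = 2.33.)

from scipy.stats import norm

def middle_limits(p, mu=100, sigma=15):
    """Scores bounding the middle proportion p of a normal distribution."""
    z = norm.ppf(0.5 + p / 2)     # z value whose area from the mean equals p/2
    return mu - z * sigma, mu + z * sigma

lower, upper = middle_limits(.98)
print(round(lower, 2), round(upper, 2))   # about 65.10 and 134.90 (the text obtains 65.05 and 134.95 with z = 2.33)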

Hypothesis Testing

In inferential statistics sample data are primarily employed in two ways to draw inferences about
one or more populations. The two methodologies employed in inferential statistics are hypothesis
testing and estimation of population parameters. This section will discuss hypothesis testing.
To be more specific, the material to be presented in this section represents a general approach to
hypothesis testing which is commonly referred to as the classical hypothesis testing model. The
term classical hypothesis testing model (also referred to as the null hypothesis significance
testing model (NHST)) is employed to represent a model which resulted from a blending of the
views of the British statistician Sir Ronald Fisher (1925, 1955, 1956) with those of two of his
contemporaries, Jerzy Neyman and Egon Pearson (Neyman (1950), Neyman and Pearson (1928),
Pearson (1962)). In the next section the evolution of the classical hypothesis testing model will
be discussed in greater detail. In the latter discussion it will be noted that there were major
disagreements between Fisher and Neyman-Pearson regarding hypothesis testing, and, in point
of fact (as Gigerenzer (1993, p. 324) notes), the classical hypothesis testing model is the end
result of other people (specifically the authors of textbooks on inferential statistics during the
1950s and 1960s) integrating the disparate views of Fisher and Neyman-Pearson into a coherent
model. Alternative approaches to hypothesis testing will also be discussed in the next section, as
well as at other points in the book. The reader should keep in mind that there are those who reject
the classical hypothesis testing model or argue that it is not always the most appropriate one to
employ. Regardless of which hypothesis testing model one favors, a researcher should always
be open to employing an alternative model if it provides one with a higher likelihood of
discovering the truth. Additionally, within the framework of employing a hypothesis testing
model, one should never adopt a mechanical adherence to a set of rules which are viewed as
immutable. Put simply, if used judiciously the classical hypothesis testing model can provide
a researcher with a tremendous amount of useful information. On the other hand, if those who
employ it lack a conceptual understanding of the model and/or use it in an inflexible mechanical
manner, they may employ it inappropriately and thus arrive at erroneous conclusions.
The most basic concept in the classical hypothesis testing model is a hypothesis. Within
the framework of inferential statistics, a hypothesis can be defined as a prediction about a single
population or about the relationship between two or more populations. Hypothesis testing is a
procedure in which sample data are employed to evaluate a hypothesis. In using the term
hypothesis, some sources make a distinction between a research hypothesis and statistical
hypotheses.
A research hypothesis is a general statement of what a researcher predicts. Two examples
of a research hypothesis are: a) The average IQ of all males is some value other than 100; and
b) Clinically depressed patients who take an antidepressant for six months will be less depressed
than clinically depressed patients who take a placebo for six months.
In order to evaluate a research hypothesis, it is restated within the framework of two
statistical hypotheses. Through use of a symbolic format, the statistical hypotheses summarize
the research hypothesis with reference to the population parameter or parameters under study.
The two statistical hypotheses are the null hypothesis, which is represented by the notation H0,
and the alternative hypothesis, which is represented by the notation H1.
The null hypothesis is a statement of no effect or no difference. Since the statement of the
research hypothesis generally predicts the presence of an effect or a difference with respect to
whatever it is that is being studied, the null hypothesis will generally be a hypothesis the
researcher expects to be rejected. The alternative hypothesis, on the other hand, represents a
statistical statement indicating the presence of an effect or a difference. Since the research
hypothesis typically predicts an effect or difference, the researcher generally expects the
alternative hypothesis to be supported.19
The null and alternative hypotheses will now be discussed in reference to the two research
hypotheses noted earlier. Within the framework of the first research hypothesis which was pre ­
sented, we will assume that a study is conducted in which an IQ score is obtained for each of n
males who have been randomly selected from a population comprised of N males. The null and
alternative hypotheses can be stated as follows: H0: μ = 100 and H1: μ ≠ 100. The null hy-
pothesis states that the mean (IQ score) of the population the sample represents equals 100. The
alternative hypothesis states that the mean of the population the sample represents does not equal
100. The absence of an effect will be indicated by the fact that the sample mean is equal to or
reasonably close to 100. If such an outcome is obtained, a researcher can be reasonably confident
that the sample has come from a population with a mean value of 100. The presence of an effect,
on the other hand, will be indicated by the fact that the sample mean is significantly above or
below the value 100. Thus, if the sample mean is substantially larger or smaller than 100, the
researcher can conclude there is a high likelihood the population mean is some value other than
100, and thus reject the null hypothesis.
As stated above, the alternative hypothesis is nondirectional. A nondirectional (also
referred to as a two-tailed) alternative hypothesis does not make a prediction in a specific
direction. The alternative hypothesis H1: μ ≠ 100 just states that the population mean will not
equal 100, but it does not predict whether it will be less than or greater than 100. If, however,
a researcher wants to make a prediction with respect to direction, the alternative hypothesis can
be stated directionally. Thus, with respect to the above example, either of the following two
directional (also referred to as one-tailed) alternative hypotheses can be employed:
H1: μ > 100 or H1: μ < 100.
The alternative hypothesis H1: μ > 100 states the mean of the population the sample
represents is some value greater than 100. If the directional alternative hypothesis H1: μ > 100
is employed, the null hypothesis can only be rejected if the data indicate that the population mean
is some value above 100. The null hypothesis cannot, however, be rejected if the data indicate
the population mean is some value below 100.
The alternative hypothesis H1: μ < 100 states the mean of the population the sample
represents is some value less than 100. If the directional alternative hypothesis H1: μ < 100 is
employed, the null hypothesis can only be rejected if the data indicate the population mean is
some value below 100. The null hypothesis cannot, however, be rejected if the data indicate the
population mean is some value above 100. The reader should take note of the fact that although
there are three possible alternative hypotheses that one can employ (one that is nondirectional and
two that are directional), the researcher must select only one of the alternative hypotheses.
Researchers are not in agreement with respect to the conditions under which one should
employ a nondirectional or a directional alternative hypothesis. Some researchers take the posi­
tion that a nondirectional alternative hypothesis should always be employed, regardless of one’s
prior expectations about the outcome of an experiment. Other researchers believe that a non ­
directional alternative hypothesis should only be employed when one has no prior expectations
about the outcome of an experiment (i.e., no expectation with respect to the direction of an effect
or difference). These same researchers believe that if one does have a definite expectation about
the direction of an effect or difference, a directional alternative hypothesis should be employed.
One advantage of employing a directional alternative hypothesis is that in order to reject the null
hypothesis, a directional alternative hypothesis does not require there be as large an effect or
difference in the sample data as will be the case if a nondirectional alternative hypothesis is em­
ployed.
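As an illustration of the latter point (and anticipating the discussion of test statistics later in this section), the following Python sketch shows how the three possible alternative hypotheses translate into the probability value a researcher evaluates; the z value of 1.80, and the use of Python and scipy, are assumptions introduced purely for illustration.

from scipy.stats import norm

z = 1.80    # hypothetical value of a test statistic, assumed purely for illustration

p_two_tailed = 2 * (1 - norm.cdf(abs(z)))   # nondirectional alternative hypothesis (H1: mu not equal to 100)
p_upper_tail = 1 - norm.cdf(z)              # directional alternative hypothesis H1: mu > 100
p_lower_tail = norm.cdf(z)                  # directional alternative hypothesis H1: mu < 100

print(round(p_two_tailed, 4), round(p_upper_tail, 4), round(p_lower_tail, 4))
# 0.0719 0.0359 0.9641: the directional hypothesis consistent with the observed direction
# requires a less extreme outcome to reach a given significance level than the
# nondirectional hypothesis does.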
The second of the research hypotheses discussed earlier in this section predicted that an
antidepressant will be more effective than a placebo in treating depression. Let us assume that
in order to evaluate this research hypothesis, a study is conducted which involves two groups of
clinically depressed patients. One group, which will represent Sample 1, is comprised of n1
patients, and the other group, which will represent Sample 2, is comprised of n2 patients. The
subjects in Sample 1 take an antidepressant for six months, and the subjects in Sample 2 take a
placebo during the same period of time. After six months have elapsed, each subject is assigned
a score with respect to his or her level of depression.
The null and alternative hypotheses can be stated as follows: H0: μ1 = μ2 and
H1: μ1 ≠ μ2. The null hypothesis states that the mean (depression score) of the population
Sample 1 represents equals the mean of the population Sample 2 represents. The alternative
hypothesis (which is stated nondirectionally) states that the mean of the population Sample 1
represents does not equal the mean of the population Sample 2 represents. In this instance the
two populations are a population comprised of N1 clinically depressed people who take an anti-
depressant for six months versus a population comprised of N2 clinically depressed people who
take a placebo for six months. The absence of an effect or difference will be indicated by the fact
that the two sample means are exactly the same value or close to being equal. If such an outcome
is obtained, a researcher can be reasonably confident that the samples do not represent two
different populations.20 The presence of an effect, on the other hand, will be indicated if a
significant difference is observed between the two sample means. Thus, we can reject the null
hypothesis if the mean of Sample 1 is significantly larger than the mean of Sample 2, or the mean
of Sample 1 is significantly smaller than the mean of Sample 2.
As is the case with the first research hypothesis discussed earlier, the alternative hypothesis
can also be stated directionally. Thus, either of the following two directional alternative hypoth­
eses can be employed: H1: μ1 > μ2 or H1: μ1 < μ2.
The alternative hypothesis H1: μ1 > μ2 states the mean of the population Sample 1
represents is greater than the mean of the population Sample 2 represents. If the directional
alternative hypothesis H1: μ1 > μ2 is employed, the null hypothesis can only be rejected if the
data indicate the mean of Sample 1 is significantly greater than the mean of Sample 2. The null
hypothesis cannot, however, be rejected if the mean of Sample 1 is significantly less than the
mean of Sample 2.
The alternative hypothesis H1: μ1 < μ2 states the mean of the population Sample 1
represents is less than the mean of the population Sample 2 represents. If the directional
alternative hypothesis H1: μ1 < μ2 is employed, the null hypothesis can only be rejected if the
data indicate the mean of Sample 1 is significantly less than the mean of Sample 2. The null
hypothesis cannot, however, be rejected if the mean of Sample 1 is significantly greater than the
mean of Sample 2.
Upon collecting the data for a study, the next step in the hypothesis testing procedure is to
evaluate the data through use of the appropriate inferential statistical test. An inferential statistical
test yields a test statistic. The latter value is interpreted by employing special tables which
contain information documenting the expected distribution of the test statistic. More specifically,
such tables contain extreme values of the test statistic (referred to as critical values) that are
highly unlikely to occur if the null hypothesis is true. Such tables allow a researcher to determine
whether or not the result of a study is statistically significant.
The term statistical significance implies that one is determining whether or not an obtained
difference in an experiment is likely to be due to chance or due to the presence of a genuine
experimental effect. To clarify this, think of a roulette wheel on which there are 38 possible
numbers that may occur on any spin of the wheel. Suppose we spin a wheel 38,000 times. On the
basis of chance each number should occur 1/38th of the time, and thus each value should occur
1000 times (i.e., 38,000/38 = 1000). Suppose the number 32 occurs 998 times in 38,000 spins
of the wheel. Since this value is close to the expected value of 1000, it is highly unlikely that the
wheel is biased against the number 32 (at least in reference to the number 32, it appears to be a
fair wheel). This is because 998 is extremely close to 1000, and a difference of 2 outcomes is not
unlikely on the basis of the random occurrence of events (i.e., chance). On the other hand, if the
number 32 only occurs 380 times in 38,000 trials (i.e., 1/100th of the time), since 380 is well
below the expected value of 1000, this strongly suggests that the wheel is biased against the
number 32 (and is thus probably biased in favor of one or more of the other numbers). Although
it is theoretically possible that a specific number can occur 380 times in 38,000 trials on a roulette
wheel, the likelihood of such an occurrence is extremely remote. Consequently, because of the
latter a casino would probably conclude it was in their best interests to view the wheel in question
as defective and thus replace it.
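The roulette intuition can be quantified with a short calculation. The sketch below (not part of the text's presentation) uses the binomial distribution to ask how probable a count of 380 or fewer would be on a fair wheel, where each number has probability 1/38 on every spin.

```python
# Probability that a fair roulette wheel (p = 1/38 per number) yields 380 or
# fewer occurrences of a given number in 38,000 spins. Illustrative sketch only.
from scipy import stats

n_spins, p_fair = 38_000, 1 / 38
expected = n_spins * p_fair                    # = 1000
prob = stats.binom.cdf(380, n_spins, p_fair)   # P(count <= 380 | fair wheel)

print(f"Expected count: {expected:.0f}")
print(f"P(count <= 380) = {prob:.3g}")         # effectively zero
```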
When evaluating the results of an experiment, one employs a logical process similar to that
involved in the above situation with the roulette wheel. The decision on whether to retain or re­
ject the null hypothesis is based on contrasting the observed outcome of an experiment with the
outcome one can expect if, in fact, the null hypothesis is true. This decision is made by using the
appropriate inferential statistical test. An inferential statistical test is essentially an equation
which describes a set of mathematical operations to be performed on the data obtained in a study.
The end result of conducting such a test is a final value which is designated as the test statistic.
A test statistic is evaluated in reference to a sampling distribution, which is a theoretical
probability distribution of all the possible values the test statistic can assume if one were to
conduct an infinite number of studies employing a sample size equal to that used in the study.
The probabilities for a sampling distribution are based on the assumption that each of the samples
is randomly drawn from the population it represents.
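The idea of a sampling distribution can be made concrete by simulation. The sketch below (an illustration, not a derivation) repeatedly draws random samples of a fixed size from an assumed normal population and records the sample mean; the collection of means approximates the sampling distribution of the mean for that sample size.

```python
# Approximating the sampling distribution of the mean by repeated random sampling.
# The population parameters and sample size are assumed values for illustration.
import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n = 100, 15, 25

sample_means = np.array([rng.normal(pop_mean, pop_sd, n).mean()
                         for _ in range(10_000)])

print(f"Mean of the simulated sampling distribution: {sample_means.mean():.2f}")
print(f"SD of the simulated sampling distribution:   {sample_means.std(ddof=1):.2f}")
# The SD of the means is close to pop_sd / sqrt(n) = 15 / 5 = 3.
```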
When evaluating the study involving the use of a drug versus a placebo in treating depres­
sion, the researcher is asking if the difference between the scores of the two groups is due to
chance, or if instead it is due to some nonchance factor (which in a well controlled study will be
the different treatments to which the groups are exposed). The larger the difference between the
average scores of the two groups (just like the larger the difference between the observed and
expected occurrence of a number on a roulette wheel), the less likely the difference is due to
chance factors, and the more likely it is due to the experimental treatments. Thus, by declaring
a difference statistically significant, the researcher is saying that based on an analysis of the
sampling distribution of the test statistic, it is highly unlikely that a difference equal to or greater
than that which was observed in the study could have occurred as the result of chance. In view
of this, the most logical decision is to conclude that the difference is due to the experimental
treatments, and thus reject the null hypothesis.
Scientific convention has established that in order to declare a difference statistically
significant, there can be no more than a 5% likelihood that the difference is due to chance. If a
researcher believes that 5% is too high a value, one may elect to employ a 1%, or an even lower
minimum likelihood, before one will be willing to conclude that a difference is significant. The
notation p > .05 is employed to indicate the result of an experiment is not significant. The latter
notation indicates there is a greater than 5% likelihood that an observed difference or effect could
be due to chance. On the other hand, the notation p < .05 indicates the outcome of a study is
significant at the .05 level, which means there is less than a 5% likelihood that an obtained
difference or effect can be due to chance.21 The notation p < .01 indicates a significant result at
the .01 level (i.e., there is less than a 1% likelihood the difference is due to chance).
When the normal distribution is employed for inferential statistical analysis, four tabled
critical values are commonly employed. These values are summarized in Table 1.12.

Table 1.12 Tabled Critical Two-Tailed and One-Tailed .05 and .01 z Values

                          z.05      z.01

Two-tailed values         1.96      2.58
One-tailed values         1.65      2.33

The value z = 1.96 is referred to as the tabled critical two-tailed .05 z value. This value
is employed since the total proportion of cases in the normal distribution that falls above z =
+1.96 or below z = -1.96 is .05. This can be confirmed by examining Column 3 of Table A1 with
respect to the value z = 1.96. The value of .025 in Column 3 indicates the proportion of cases in
the right tail of the curve which falls above z = +1.96 is .025, and the proportion of cases in the
left tail of the curve which falls below z = -1.96 is .025. If the two .025 values are added, the
resulting proportion is .05. Note that this is a two-tailed critical value, since the proportion .05
is based on adding the extreme 2.5% of the cases from the two tails of the distribution.
The value z = 2.58 is referred to as the tabled critical two-tailed .01 z value. This value
is employed since the total proportion of cases in the normal distribution that falls above z =
+2.58 or below z = -2.58 is .01. This can be confirmed by examining Column 3 of Table A1 with
respect to the value z = 2.58. The value of .0049 (which rounded off equals .005) in Column 3
indicates the proportion of cases in the right tail of the curve which falls above z = +2.58 is .0049,
and the proportion of cases in the left tail of the curve which falls below z = -2.58 is .0049. If
the two .0049 values are added, the resulting proportion is .0098, which rounded off equals .01.
Note that this is a two -tailed critical value, since the proportion .01 is based on adding the
extreme .5% of the cases from the two tails of the distribution.
The value z = 1.65 is referred to as the tabled critical one-tailed .05 z value. This value
is employed since the proportion of cases in the normal distribution that falls above z = +1.65 or
below z = -1.65 in each tail of the distribution is .05. This can be confirmed by examining
Column 3 of Table A1 with respect to the value z = 1.65. The value of .0495 (which rounded off
equals .05) in Column 3 indicates the proportion of cases in the right tail of the curve which falls
above z = +1.65 is .0495, and the proportion of cases in the left tail of the curve which falls
below z = -1.65 is .0495. Note that this is a one-tailed critical value, since the proportion .05 is
based on the extreme 5% of the cases in one tail of the distribution.22
The value z = 2.33 is referred to as the tabled critical one-tailed .01 z value. This value
is employed since the proportion of cases in the normal distribution that falls above z = +2.33 or
below z = -2.33 in each tail of the distribution is .01. This can be confirmed by examining
Column 3 of Table A1 with respect to the value z = 2.33. The value of .0099 (which rounded off
equals .01) in Column 3 indicates the proportion of cases in the right tail of the curve which falls
above z = +2.33 is .0099, and the proportion of cases in the left tail of the curve which falls
below z = -2.33 is .0099. Note that this is a one-tailed critical value, since the proportion .01 is
based on the extreme 1% of the cases in one tail of the distribution.
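The four tabled critical values in Table 1.12 can also be confirmed without Table A1 by evaluating the inverse cumulative distribution function of the standard normal distribution, as in the brief sketch below.

```python
# Recovering the critical z values of Table 1.12 from the standard normal
# distribution (the table reports them rounded to two decimal places).
from scipy.stats import norm

print(norm.ppf(1 - .05 / 2))   # two-tailed .05  -> 1.960
print(norm.ppf(1 - .01 / 2))   # two-tailed .01  -> 2.576 (tabled as 2.58)
print(norm.ppf(1 - .05))       # one-tailed .05  -> 1.645 (tabled as 1.65)
print(norm.ppf(1 - .01))       # one-tailed .01  -> 2.326 (tabled as 2.33)
```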
Although in practice it is not scrupulously adhered to, the conventional hypothesis testing
model employed in inferential statistics assumes that prior to conducting a study a researcher
stipulates whether a directional or nondirectional alternative hypothesis will be employed, as well
as at what level of significance the null hypothesis will be evaluated. The probability value which
identifies the level of significance is represented by the notation α, which is the lower case Greek
letter alpha. Throughout the book the latter value will be referred to as the prespecified alpha
value (or prespecified level of significance), since it will be assumed that the value was
specified prior to the data collection phase of a study.

Type I errors, Type II errors, and power in hypothesis testing Within the framework of
hypothesis testing, it is possible for a researcher to commit two types of errors. These errors are
referred to as a Type I error and a Type II error.
A Type I error is when a true null hypothesis is rejected (i.e., one concludes that a false
alternative hypothesis is true). The likelihood of committing a Type I error is specified by the
alpha level a researcher employs in evaluating an experiment. The more concerned a researcher
is with committing a Type I error, the lower the value of alpha the researcher should employ.
Thus, the likelihood of committing a Type I error if α = .01 is 1%, as compared with a 5%
likelihood if α = .05.
A Type II error is when a false null hypothesis is retained (i.e., one concludes that a true
alternative hypothesis is false). The likelihood of committing a Type II error is represented by
β, which (as noted earlier) is the lower case Greek letter beta. The likelihood of rejecting a false
null hypothesis represents the power of a statistical test. The power of a test is determined by
subtracting the value of beta from 1 (i.e., Power = 1 - β). The likelihood of committing a Type
II error is inversely related to the likelihood of committing a Type I error. In other words, as the
likelihood of committing one type of error decreases, the likelihood of committing the other type
of error increases. Thus, with respect to the alternative hypothesis one employs, there is a higher
likelihood of committing a Type II error when alpha is set equal to .01 than when it is set equal
to .05. The likelihood of committing a Type II error is also inversely related to the power of a
statistical test. In other words, as the likelihood of committing a Type II error decreases, the
power of the test increases. Consequently, the higher the alpha value (i.e., the higher the
likelihood of committing a Type I error), the more powerful the test. Two other ways to increase
the power of a test (and thereby decrease the likelihood of committing a Type II error) are: a)
Increasing the sample size employed in a study; and b) Minimizing the amount of variability in
a set of data that is attributable to extraneous factors.
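These relationships can be illustrated numerically. The sketch below computes the power of a one-tailed one-sample z test under an assumed population standard deviation and an assumed true shift in the mean (both values invented for illustration); note how power rises with a larger alpha and with a larger sample size.

```python
# Power of a one-tailed one-sample z test for an assumed true mean shift.
# sigma and true_shift are hypothetical values chosen only for illustration.
from math import sqrt
from scipy.stats import norm

sigma, true_shift = 15, 6
for alpha in (.05, .01):
    for n in (25, 100):
        z_crit = norm.ppf(1 - alpha)                      # critical value under H0
        power = 1 - norm.cdf(z_crit - true_shift / (sigma / sqrt(n)))
        print(f"alpha = {alpha}, n = {n:>3}: power = {power:.2f}")
```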
The relationship between Type I error rate, Type II error rate, and power can be
summarized as follows: a) The higher the value of alpha the greater the likelihood of committing
a Type I error. As the value of alpha increases, the likelihood of committing a Type II error
decreases and the power of the statistical test increases. In other words, the higher the value of
alpha the easier it will be for a researcher to reject the null hypothesis. To put it another way, the
higher the value of alpha the more likely it is that the alternative hypothesis will be supported;
b) The lower the value of alpha the lower the likelihood of committing a Type I error. As the
value of alpha decreases, the likelihood of committing a Type II error increases and the power
of the statistical test decreases. In other words, the lower the value of alpha the more difficult
it will be for a researcher to reject the null hypothesis. To put it another way, the lower the value
of alpha the less likely it is that the alternative hypothesis will be supported.
Table 1.13 summarizes the decision making process within the framework of the classical
hypothesis testing model. In the latter table the True State of Nature refers to what is, in fact,
the truth regarding the actual status of the null hypothesis (i.e., H0). The p value represents the
probability associated with a specific decision.
Before continuing the reader should take note of the fact that in the Preface it was noted
that all of the examples employed in the book involve the use of small sample sizes. The
consequences of the latter are inflated sampling error as well as low power (which is
associated with a high Type II error rate) for any inferential statistical test conducted. It should
Table 1.13 Summary of Decision Making Process in Hypothesis Testing

                                    True State of Nature

Decision                    H0 True                      H0 False

Reject H0                   Type I error                 Correct decision
                            p = α                        p = 1 - β = Power

Fail to reject H0           Correct decision             Type II error
                            p = 1 - α                    p = β

be emphasized that the use of small sample sizes is purely for illustrative purposes in order to
facilitate the reader’s ability to follow the computational procedures described in the book. In
practice, the sample sizes employed in research should be substantially larger than those employed
in the examples in this book. In the final analysis, determination of sample size should be based
on what a researcher considers to be an acceptable compromise with respect to the Type I versus
Type II error rates associated with a study. As will be noted later, over the years researchers have
been criticized (by sources such as Cohen (1962, 1977) and Hunter and Schmidt (1990, 2004))
for conducting studies with inadequate power (which can often be attributed to small sample
sizes), thereby making it difficult to reject the null hypothesis, when, in fact, the latter is false.
However, it will also be noted (later in this chapter as well as in the discussion of the minimum-
effect hypothesis testing model in Section VII of the chapter on meta-analysis (Test 43)) that
as sample size increases, the likelihood the null hypothesis will be rejected approaches 100%.
Example 1.9 will be employed to illustrate the application of the above described concepts
on hypothesis testing in reference to standard deviation (z) scores discussed in the previous
section. In point of fact, the analysis of Example 1.9 illustrates the use of the single-sample z test
(discussed in the next chapter) when n = 1.

Example 1.9 Assume a researcher wants to determine whether or not it is likely that an
individual who has been randomly selected from a population could be a member of a normally
distributed population which has a mean IQ of 100 (i.e., μ = 100) and a standard deviation of
15 (i.e., σ = 15). Thus, employing a sample size of n = 1, the researcher evaluates the null
hypothesis H0: μ = 100. Determine what the IQ of the person would have to be in order to
reject the null hypothesis at the .05 and .01 levels of significance within the framework of both
a one- and two-tailed analysis.

The results of the analysis of Example 1.9 are summarized below.

1: One-tailed (directional) .05 (5%) alpha/significance level employed


H0: μ = 100
H1: μ < 100 or H1: μ > 100

a) The likelihood of selecting a person with an IQ of 75.25 or less from a normally
distributed population with a mean of 100 and a standard deviation of 15 is .05 (or 5%). The
latter is the case since X = μ ± σz = 100 + (15)(-1.65) = 100 - 24.75 = 75.25. Note that the
value z = 1.65 delineates the extreme 5% of scores in each tail of the normal distribution. If the
directional alternative hypothesis H1: μ < 100 is employed, the null hypothesis H0: μ = 100
can be rejected at the .05 level if the person’s IQ is equal to or less than 75.25. If the null
hypothesis is rejected, there is still a 5% likelihood the researcher’s decision to reject the null
hypothesis represents a Type I error. In other words, there is still a 5% likelihood the null
hypothesis is, in fact, true. Or to put it another way, there is still a 5% likelihood that a person
with an IQ of 75.25 or less is a member of a population which has a mean of 100 and a standard
deviation of 15. Figure 1.15a depicts this example visually.
b) The likelihood of selecting a person with an IQ of 124.75 or greater from a normally
distributed population with a mean of 100 and a standard deviation of 15 is .05 (or 5%). The
latter is the case since X = μ ± σz = 100 + (15)(1.65) = 100 + 24.75 = 124.75. Note that the
value z = 1.65 delineates the extreme 5% of scores in each tail of the normal distribution. If the
directional alternative hypothesis H1: μ > 100 is employed, the null hypothesis H0: μ = 100
can be rejected at the .05 level if the person’s IQ is equal to or greater than 124.75. If the null
hypothesis is rejected, there is still a 5% likelihood the researcher’s decision to reject the null
hypothesis represents a Type I error. In other words, there is still a 5% likelihood the null
hypothesis is, in fact, true. Or to put it another way, there is still a 5% likelihood that a person
with an IQ of 124.75 or greater is a member of a population which has a mean of 100 and a
standard deviation of 15. Figure 1.15b depicts this example visually.

2: One-tailed (directional) .01 (1%) alpha/significance level employed


H0: μ = 100
H1: μ < 100 or H1: μ > 100

a) The likelihood of selecting a person with an IQ of 65.05 or less from a normally
distributed population with a mean of 100 and a standard deviation of 15 is .01 (or 1%). The
latter is the case since X = μ ± σz = 100 + (15)(-2.33) = 100 - 34.95 = 65.05. Note that the
value z = 2.33 delineates the extreme 1% of scores in each tail of the normal distribution. If the
directional alternative hypothesis H1: μ < 100 is employed, the null hypothesis H0: μ = 100
can be rejected at the .01 level if the person’s IQ is equal to or less than 65.05. If the null
hypothesis is rejected, there is still a 1% likelihood the researcher’s decision to reject the null
hypothesis represents a Type I error. In other words, there is still a 1% likelihood the null
hypothesis is, in fact, true. Or to put it another way, there is still a 1% likelihood that a person
with an IQ of 65.05 or less is a member of a population which has a mean of 100 and a standard
deviation of 15. Figure 1.15c depicts this example visually.
b) The likelihood of selecting a person with an IQ of 134.95 or greater from a normally
distributed population with a mean of 100 and a standard deviation of 15 is .01 (or 1%). The
latter is the case since X = μ ± σz = 100 + (15)(2.33) = 100 + 34.95 = 134.95. Note that the
value z = 2.33 delineates the extreme 1% of scores in each tail of the normal distribution. If the
directional alternative hypothesis H1: μ > 100 is employed, the null hypothesis H0: μ = 100
can be rejected at the .01 level if the person’s IQ is equal to or greater than 134.95. If the null
hypothesis is rejected, there is still a 1% likelihood the researcher’s decision to reject the null
hypothesis represents a Type I error. In other words, there is still a 1% likelihood the null
hypothesis is, in fact, true. Or to put it another way, there is still a 1% likelihood that a person
with an IQ of 134.95 or greater is a member of a population which has a mean of 100 and a
standard deviation of 15. Figure I.15d depicts this example visually.
Figure I.15a Distribution of Critical One-Tailed .05 z Value for H1: μ < 100

Figure I.15b Distribution of Critical One-Tailed .05 z Value for H1: μ > 100

Figure I.15c Distribution of Critical One-Tailed .01 z Value for H1: μ < 100

Figure I.15d Distribution of Critical One-Tailed .01 z Value for H1: μ > 100

Figure I.15e Distribution of Critical Two-Tailed .05 z Value for H1: μ ≠ 100

Figure I.15f Distribution of Critical Two-Tailed .01 z Value for H1: μ ≠ 100

3: Two-tailed (nondirectional) .05 (5%) alpha/significance level employed


H0: μ = 100
H1: μ ≠ 100

The likelihood of selecting a person with an IQ of 70.6 or less is .025 (or 2.5%), and the
likelihood of selecting a subject with an IQ of 129.4 or greater is .025 (or 2.5%). The latter is the
case since X = μ ± σz = 100 ± (15)(1.96) = 100 ± 29.4, and 100 - 29.4 = 70.6 and 100 + 29.4
= 129.4. Note that the value z = 1.96 delineates the extreme 2.5% of scores in each tail of the
normal distribution. The likelihood of selecting a person with an IQ equal to or less than 70.6 or
equal to or greater than 129.4 is .025 + .025 = .05 (or 5%). If the nondirectional alternative
hypothesis H1: μ ≠ 100 is employed, the null hypothesis H0: μ = 100 can be rejected at the
.05 level if the person’s IQ is equal to or less than 70.6 or is equal to or greater than 129.4. If the
null hypothesis is rejected, there is still a 5% likelihood the researcher’s decision to reject the null
hypothesis represents a Type I error. In other words, there is still a 5% likelihood the null
hypothesis is, in fact, true. Or to put it another way, there is still a 5% likelihood that a person
with an IQ equal to or less than 70.6 or equal to or greater than 129.4 is a member of a population
which has a mean of 100 and a standard deviation of 15. Figure I.15e depicts this example
visually.

4: Two-tailed (nondirectional) .01 (1%) alpha/significance level employed


H0: μ = 100
H1: μ ≠ 100

The likelihood of selecting a person with an IQ of 61.3 or less is .005 (or .5%), and the
likelihood of selecting a subject with an IQ of 138.7 or greater is .005 (or .5%). The latter is the
case since X = μ ± σz = 100 ± (15)(2.58) = 100 ± 38.7, and 100 - 38.7 = 61.3 and 100 + 38.7
= 138.7. Note that the value z = 2.58 delineates the extreme .5% of scores in each tail of the
normal distribution. The likelihood of selecting a person with an IQ equal to or less than 61.3
or equal to or greater than 138.7 is .005 + .005 = .01 (or 1%). If the nondirectional alternative
hypothesis H1: μ ≠ 100 is employed, the null hypothesis H0: μ = 100 can be rejected at the
.01 level if the person’s IQ is equal to or less than 61.3 or is equal to or greater than 138.7. If the
null hypothesis is rejected, there is still a 1% likelihood the researcher’s decision to reject the null
hypothesis represents a Type I error. In other words, there is still a 1% likelihood the null
hypothesis is, in fact, true. Or to put it another way, there is still a 1% likelihood that a person
with an IQ equal to or less than 61.3 or equal to or greater than 138.7 is a member of a population
which has a mean of 100 and a standard deviation of 15. Figure 1.15f depicts this example
visually.
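The eight IQ cutoffs derived above follow directly from the equation X = μ ± σz. A brief sketch reproducing them (using the rounded tabled z values, as the text does) is given below.

```python
# Reproducing the IQ cutoffs of Example 1.9: X = mu +/- sigma * z.
mu, sigma = 100, 15
tabled_z = {
    "one-tailed .05": 1.65,
    "one-tailed .01": 2.33,
    "two-tailed .05": 1.96,
    "two-tailed .01": 2.58,
}
for label, z in tabled_z.items():
    print(f"{label}: lower = {mu - sigma * z:.2f}, upper = {mu + sigma * z:.2f}")
# one-tailed .05 -> 75.25 / 124.75    one-tailed .01 -> 65.05 / 134.95
# two-tailed .05 -> 70.60 / 129.40    two-tailed .01 -> 61.30 / 138.70
```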

Statistical significance versus practical significance When the term significance is employed
within the context of scientific research, it is instructive to make a distinction between statistical
significance and practical significance. Statistical significance only implies that the outcome
of a study is highly unlikely to have occurred as a result of chance. It does not necessarily
suggest that any difference or effect detected in a set of data is of any practical value. As an
example, assume that the Scholastic Aptitude Test (SAT) scores of two school districts which
employ different teaching methods are contrasted. Assume that the teaching method of each
school district is based on specially designed classrooms. The results of the study indicate that
the SAT average in School District A is one point higher than the SAT average in School District
B, and this difference is statistically significant at the .01 level. Common sense suggests it would
be illogical for School District B to invest the requisite time and money in order to redesign its
physical environment for the purpose of increasing the average SAT score in the district by one
point. Thus, in this example, even though the obtained difference is statistically significant, in
the final analysis it is of little or no practical significance. The general issue of statistical versus
practical significance is discussed in more detail in Section VI of the t test for two independent
samples, and in the chapter on meta-analysis.
Before proceeding to the next topic, a final comment is in order on the general subject of
tests of statistical significance. It cannot be emphasized too strongly that a test of significance
merely represents the final stage in a sequential process involved in conducting research. Those
stages which precede the test of significance consist of whatever it is a researcher does within
the framework of designing and executing a study. If at some point there is a flaw in the design
and/or execution of a study, it will have a direct impact on the reliability of the results. Under
such circumstances, any data that are evaluated will, for all practical purposes, be of little or no
value. The cliché “garbage in, garbage out” is apropos here, in that if imperfect data are evaluated
with a test of significance, the resulting analysis will be unreliable. Ultimately, a test of
significance is little more than an algorithm which has been derived for evaluating a set of data
under certain conditions. The test, in and of itself, is incapable of making a judgment with respect
to the reliability and overall quality of the data. To go further, the probability values employed
in evaluating a test of significance are derived from theoretical probability distributions. In order
for any probability derived from such a distribution to be reliable, one or more assumptions
underlying the distribution must not have been violated. In the final analysis, a probability value
derived from a test of significance should probably be viewed at best as an estimate of the actual
probability associated with a result, and at worst (when data are derived in a poorly designed
and/or methodologically weak study) as a woefully inaccurate metric that has little or no
relevance to the problem being evaluated.
Although the hypothesis testing model described in this section is based on conducting a
single study in order to evaluate a research hypothesis, throughout the book the author
emphasizes the importance of replication in research. Aside from the fact that in any given study
sampling error may result in a researcher reaching the wrong conclusion, there is also the fact that
inferential statistical tests make certain assumptions, many of which the researcher can never be
sure have been met. Since the accuracy of the probability values in tables of critical values for
test statistics is contingent upon the validity of the assumptions underlying the test, if any of the
assumptions have been violated, the accuracy of the tables can be compromised, thus increasing
the likelihood the decision a researcher makes will represent either a Type I or Type II error. In
view of the aforementioned factors which can compromise the outcome of a single experiment,
the most effective way of determining the truth with regard to a particular hypothesis (especially
if practical decisions are to be made on the basis of the results of research) is to conduct multiple
studies which evaluate the same hypothesis. When multiple studies yield consistent results, one
is less likely to be challenged in claiming that the correct decision has been made with respect to the
hypothesis under study. A general discussion of statistical methods which can be employed to
aid in the interpretation of the results of multiple studies that evaluate the same general
hypothesis can be found in the chapter on meta-analysis.

A History and Critique of the Classical Hypothesis Testing Model

The intent of this section is to provide a brief history of the individuals and events which were
responsible for the development of the classical hypothesis testing model. The statistical
techniques which are most commonly employed in inferential and descriptive statistics were
developed by a small group of Englishmen during the latter part of the nineteenth century through
the middle of the twentieth century. Readers who are interested in a more comprehensive
overview of the individuals to be discussed in this section, as well as biographical information
on other notable individuals in the history of statistics, should consult sources such as Cowles
(1989, 2001), Johnson and Kotz (1997), Kline (2004), Stigler (1999), and Tankard (1984).
During the late 1800s Sir Francis Galton (1822 - 1911) is credited with introducing the
important statistical concepts of correlation and regression. (Correlation is discussed briefly in
the latter part of this chapter, and both correlation and regression are described in detail in the
chapter on the Pearson product-moment correlation coefficient.) Galton referred to the
science of statistics as biometrics, since he viewed statistics as a discipline which employed
mathematics to study issues in both the biological and social sciences. Another Englishman, Karl
Pearson (1857 - 1936), subsequently developed the mathematical procedure for computing a
correlation coefficient (which is described in the chapter on the Pearson product-moment
correlation coefficient). Along with Sir Ronald Fisher, Pearson is probably viewed as having
made the greatest contributions to what today is considered the basis of modern statistics. In
addition to his work on correlation, Pearson also discovered the chi-square distribution (a
theoretical probability distribution) and the chi-square goodness-of-fit test (Test 8). Through-
out his professional life Pearson was a source of controversy, much of it resulting from an
enthusiasm which he shared with Galton for eugenics (which is the use of selective breeding to
enhance the characteristics of the human race). During the latter part of his professional career
Pearson was appointed as the first Galton Professor of Eugenics at University College in London
(the chair was bequeathed as a gift to the university by Galton). In 1901 Pearson cofounded the
influential statistical journal Biometrika, in part to provide himself with an outlet to publish some
of his more controversial research. Another source of controversy surrounding Pearson revolved
around his acrimonious personal and professional differences with Sir Ronald Fisher.
The major contribution of William S. Gosset (1876 - 1937), who briefly studied under
Pearson, was his discovery of the t distribution (a theoretical probability distribution), and his
development of the t test (discussed under the single-sample t test and the t tests for two
independent and dependent samples (Tests 11 and 17)). Lehman (1993, p. 1242) states that
the modern theory of hypothesis testing was initiated in 1908 with Gosset’s development of the
t test. Johnson and Kotz (1997, p. 328) note that Gosset was the individual who first introduced
the concept of the alternative hypothesis in hypothesis testing. For many years Gosset was
employed at the Guinness Brewery in Dublin where he employed the scientific method to evaluate
beer making. Throughout his professional career he published under the name Student, since the
Guinness Brewery wished for his identity to remain anonymous.
Although the three men discussed up to this point had a major impact on the development
of modem statistics, those individuals who were most responsible for the development of what
was eventually to become known as the classical hypothesis testing model were the British
statistician Sir Ronald Fisher (1890 - 1962) and two of his contemporaries Egon Pearson
(1895-1980) (who was the son of Karl Pearson) and Jerzy Neyman (1894-1981). As noted
earlier, the classical hypothesis testing model (i.e., the methodology and concepts discussed in
the previous section) is the end result of blending a hypothesis testing model developed by Fisher
(1925, 1955, 1956) with an alternative model developed by Neyman and Egon Pearson (Neyman
(1950), Neyman and Pearson (1928), Pearson (1962)).
Sir Ronald Fisher (who was also a proponent of eugenics) is credited with developing more
key concepts in the field of inferential statistics than any statistician of the modern era. Many of
Fisher’s ideas were developed during the 14 years he worked at an experimental agricultural
station in Hertfordshire, England called Rothamsted. Like Gosset, most of Fisher’s discoveries
grew out of his need to find solutions to practical problems he was confronted with at work.
Fisher’s most important contributions were his development of the analysis of variance
(described in detail later in the book) and his ideas on the subjects of experimental design and
hypothesis testing. In the area of hypothesis testing, Fisher introduced the concepts of the null
hypothesis and significance levels. In 1931 and again in 1936 Fisher was a visiting professor at
Iowa State University, and during his tenure in the United States his ideas had a major impact on
the thinking of, among others, George Snedecor and E. F. Lindquist, two prominent American
statisticians. The latter two individuals subsequently published statistics textbooks which were
largely responsible for introducing Fisher’s ideas on hypothesis testing and analysis of variance
to the statistical community in the United States.
By all accounts Fisher was a difficult person to get along with. Throughout their
professional careers Fisher and Karl Pearson were bitter adversaries. In his position of Galton
Professor of Eugenics at University College in London, Karl Pearson was head of the Galton
Laboratory. Among those who worked for or studied under Pearson were his son Egon and Jerzy
Neyman, a Polish statistician. When Karl Pearson retired in 1933 Fisher succeeded him as Galton
Professor of Eugenics. Since Fisher did not get along with Karl Pearson or his associates, in
order to placate the retiring Pearson the school established a separate department of statistics, and
appointed Egon Pearson as its head. It was Egon Pearson’s collaboration with Jerzy Neyman
which resulted in what came to be known as the Neyman-Pearson hypothesis testing model
(to which Neyman was the major contributor). The latter model became a source of controversy,
since it challenged Fisher’s views on hypothesis testing.
The last part of this section will focus on the fundamental differences between the Fisher
and Neyman-Pearson models of hypothesis testing. (More comprehensive discussions of the
Fisher/Neyman-Pearson controversy can be found in Cowles (1989, 2001), Gigerenzer (1993),
Gigerenzer and Murray (1987), and Lehman (1993).) The beginnings of the classical hypothesis
testing model can be traced to Fisher, who in 1925 introduced null hypothesis testing (also
known as significance testing) (Lehman (1993, p. 1243), Gigerenzer (1993, p. 315)). Fisher
(1925) stated that a null hypothesis should be evaluated in the following manner: Employing
sample data a researcher determines whether or not the information contained in the sample data
deviates enough from the value stated in the null hypothesis to render the null hypothesis
implausible. Fisher did not see any need to have an alternative hypothesis. It was Fisher’s
contention that the purpose of a statistical test was to determine whether or not there was
sufficient evidence to reject the null hypothesis. Within the latter context he introduced the .05
level of significance as the standard level for rejecting a null hypothesis, and suggested the .01
level as a more stringent alternative. This latter convention for assessing statistical significance
was to become an integral part of the classical hypothesis testing model, in spite of the fact that
Fisher revised his viewpoint on this matter in the 1950s (Gigerenzer (1993, p. 316) and Lehman
(1993, p. 1248)). Fisher’s later position was that researchers should publish the exact level of
significance computed for a set of data. In other words, if the likelihood computed for obtaining
a specific result for a set of data if the null hypothesis is true is .02, instead of stating that the
result is significant at the .05 level (i.e., p < .05), the researcher should just report the exact
probability value (i.e., just state the value p = .02).
Gigerenzer (1993, p. 319) notes that in 1925 Fisher stated that a null hypothesis can be
disproved but “never proved or established” (Fisher (1925, p. 16)). At another point Fisher
(1925, p. 13) stated that “experimenters ... are prepared to ignore all [nonsignificant] results.”
Gigerenzer (1993, p. 319) suggests that within the scientific community there were many who
interpreted the latter statements to mean that a nonsignificant result was worthless, and thus not
worthy of publication — an interpretation which resulted in editors of academic journals
rejecting virtually all studies which failed to report a significant result. In 1955 Fisher changed
his perspective with respect to what the status of the null hypothesis should be if an experiment
yielded a nonsignificant result. At this latter point in time Fisher (1955, p. 73) implied that a
nonsignificant result could not establish the truth of a null hypothesis, but would merely make
it more likely that the null hypothesis was true. Thus Fisher (1955) stated that acceptance of a
null hypothesis did not indicate that it was proven and thus should be adopted. In his latest
thinking Fisher (1955) took the position that any conclusions reached within the framework of
hypothesis testing should be tentative and subject to reevaluation based on the outcome of future
research. Fisher thus rejected the use of a mechanized set of rules which obligated the researcher
to make a final decision regarding the status of the null hypothesis based on the analysis of a
single experiment.
It was Fisher’s contention that if a null hypothesis was rejected it did not mean that the
researcher should adopt an alternative hypothesis (since, as noted previously, Fisher did not
believe that an alternative hypothesis should be employed). Fisher viewed the probability value
computed for an experiment entirely within the context of that particular study, and did not view
it as having any relevance to potential future replications of the experiment. In contrast, Neyman
(Neyman (1950), Neyman and Pearson (1928)) believed that a probability value computed for
a single experiment should be viewed within the context of potential future replications of the
experiment. (According to Gigerenzer (1993, p. 317) Egon Pearson took a position somewhere
in between that of Fisher and Neyman on this issue.) Fisher did not speak of or acknowledge
Type I or Type II error rates (since he viewed everything within the context of a single
experiment), while, as will be noted shortly, Neyman and Pearson did. In the final analysis, the
two main elements Fisher presented which were ultimately integrated into the classical
hypothesis testing model were: a) The statement of a null hypothesis by a researcher; and b) The
convention of employing the probability values .05 and .01 as the minimum standards for
declaring statistical significance.
In the late 1920s and early 1930s Neyman and Pearson put forth the argument that a
researcher must not only state a null hypothesis, but should also specify one or more alternative
hypotheses against which the null hypothesis could be evaluated. Within this context they
introduced the concepts of a Type I error (i.e., alpha value), a Type II error (i.e., beta value),
and the power of a statistical test. Fisher, however, refused to integrate the latter concepts into
his hypothesis testing model. Neyman and Pearson also rejected the use of a standard level of
significance such as the .05 or .01 levels, since they felt it was essential that a researcher achieve
a balance between the Type I and Type II error rates. Neyman and Pearson took the position that
a researcher should stipulate the specific level of significance one has decided to employ prior
to conducting a study, and use that value as the criterion for retaining or rejecting the null
hypothesis. On the other hand, as noted earlier, Fisher’s (1955) final position on the latter was
it was not necessary to stipulate the level of significance beforehand, and that if the result of an
experiment was deemed significant, the researcher should report the exact probability value
computed for the outcome.
The final hypothesis testing model which resulted from integrating Fisher’s views on
hypothesis testing with those of Neyman and Pearson constitutes the classical hypothesis testing
model (i.e., the model described in the previous section on hypothesis testing). As noted in the
introductory material on hypothesis testing, the individuals largely responsible for this
hybridization of Fisher and Neyman-Pearson were authors of textbooks on inferential statistics
during the 1950s and 1960s. Critics of the classical hypothesis testing model view the blending
of the Fisher and Neyman-Pearson models as an unfortunate and ill-conceived hybridization of
two incompatible viewpoints. Some sources (e.g., Gigerenzer (1993) and Gigerenzer and Murray
(1987)) argue that the classical hypothesis testing model has institutionalized inferential
statistics to the extent that researchers have come to view statistical analysis in a dogmatic and
mechanized way. By the latter it is meant that the model instructs researchers to employ a
standardized protocol which involves an inflexible set of decision making guidelines for
evaluating research, and in doing so it neglects to teach researchers to think intelligently about
data analysis. Gigerenzer (1993, p. 321) notes that neither Fisher (1955) nor Neyman-Pearson
would have endorsed the dogmatic and mechanized approach which characterizes the classical
hypothesis testing model.
Among others, Gigerenzer (1993), Harlow et al. (1997), Hunter and Schmidt (1990, 2004),
Kline (2004), Meehl (1967), Rozeboom (1960), Smithson (2003), and Thompson (1993, 1999,
2002) propose that researchers employ alternatives to the classical hypothesis testing model,
and that, regardless of which alternative one elects to employ, a researcher should not be
dogmatic and inflexible in evaluating data. Put simply, researchers should pick and choose from
all available methodologies, and select those which are most useful in addressing the problem at
hand. Instead of focusing on probability values and viewing them as the sole criterion for
drawing conclusions, researchers should employ other criteria in evaluating data. Among the
alternatives recommended by Gigerenzer (1993, p. 332) for both evaluating hypotheses and
constructing theories are visual analysis of data and analysis of effect size (which is alluded to
throughout this book). Gigerenzer (1993, p. 335) concludes his critique of the classical
hypothesis testing model by stating that, “Statistical reasoning is an art and so demands both
mathematical knowledge and informed judgement. When it is mechanized, as with the
institutionalized hybrid logic, it becomes ritual, not reasoning.” (The term hybrid logic refers to the
use of the classical hypothesis testing model.) Further discussion of criticism of the classical
hypothesis testing model can be found in the chapter on meta-analysis (Test 43).
With respect to what has been noted above, Anderson (2001) and Everitt (2001) state that
one reason why some people are critical of the classical hypothesis testing model is that the
probability value employed to summarize the result of a test of significance is poorly understood.
As noted earlier, the latter probability level refers to the alpha level employed by a researcher (or,
in some cases, a published probability value represents the exact probability associated with the
outcome of a specific study). Everitt (2001, p. 4) cites a study by Oakes (1986) in which one or
more of 70 academic psychologists (who ostensibly were trained in the use of the classical
hypothesis testing model) stated that one or more of the following statements are true if the
result of an analysis comparing the means of two independent groups with a t test for two
independent samples is significant at the .01 level (the percentage of academicians endorsing
each statement is noted in parentheses): a) You have absolutely disproved the null hypothesis that
there is no difference between the population means (1%); b) You have found the probability of
the null hypothesis being true (35.7%); c) You have absolutely proved your experimen­
tal/alternative hypothesis (5.7%); d) You can deduce the probability of the experimental
hypothesis/alternative hypothesis being true (65.7%); e) You know, if you decided to reject the
null hypothesis, the probability that you are making the wrong decision (85.7%); f) You have a
reliable experiment in the sense that if, hypothetically, the experiment were repeated a great
number of times, you would obtain a significant result on 99% of occasions (60%). Only 4.3%
of the academicians endorsed the following correct interpretation of the probability value: If the
null hypothesis is true, the probability of obtaining the data (or data that represent a more extreme
departure from the null hypothesis) is represented by the .01 probability value. Anderson (2001,
p. 49) notes that one might also be tempted to erroneously conclude that the value (1 - p) (which
in this case equals 1 - .01 = .99) is the probability that the null hypothesis is false. In point of
fact, the latter statement is not justified. In order to determine the likelihood that the null
hypothesis is false one would be required to have access to a sampling distribution which is based
on the assumption that the null hypothesis is false. Such a sampling distribution would reflect the
presence of an effect size, the magnitude of which would be specified by the researcher (for
further clarification see Anderson (2001, p. 44)).23
The position of this book is that although the criticisms directed at the classical hypothesis
testing model certainly have some degree of validity, when used intelligently the model is
extremely useful, and to date it has been extremely productive in generating scientific knowledge.
Put simply, if those who employ the model lack a clear conceptual understanding of it and/or use
it in an inflexible mechanical manner, it can be employed inappropriately and lead to erroneous
conclusions.24 On the other hand, when used judiciously it can be useful and productive. This
writer would certainly agree that there are situations where the classical hypothesis testing
model may not always be most appropriate for evaluating data. In such instances one should be
amenable to employing an alternative model if it offers one a higher likelihood of discovering the
truth. In the final analysis, however, it is the opinion of the writer that the present scope of
scientific knowledge would not be greater than it is now if during the past 100 years any of the
alternative hypothesis testing models which have been suggested had been used in place of the
classical hypothesis testing model.
Among the alternative models which are available for hypothesis testing are two that are
discussed later in the book. One alternative is the minimum-effect hypothesis testing model,
which is described in Section VII of Test 43 on meta-analysis. The crux of the argument put
forth by proponents of the minimum-effect hypothesis testing model against the classical
hypothesis testing model is that, in reality, the null hypothesis is always false. Specifically,
various sources note that the null hypothesis is a point hypothesis, in that it stipulates a precise
value — namely zero — for the difference between the experimental conditions. Thus any
difference, no matter how negligible, will provide sufficient grounds for rejecting the null
hypothesis. It has been pointed out by numerous researchers that the actual difference between
two experimental conditions is probably never exactly equal to zero. Although admittedly a
difference may be close to zero, if our measuring instrument is sufficiently sensitive and we carry
our measurements out to many decimal places, we will probably never record a difference which
is exactly equal to zero. And if the latter is true, it means that the null hypothesis will always be
false. If, in fact, the null hypothesis is always false, it logically follows that it is not possible to
commit a Type I error (which is rejecting a true null hypothesis).
The minimum-effect hypothesis testing model employs a null hypothesis which stipulates
a value below which any effect present in the data would be viewed as trivial, and above which
would be meaningful. As an example, if one were comparing the IQ scores of two groups, the
null hypothesis might stipulate a difference between 0 and 5 points, while the alternative hypoth­
esis would stipulate a difference greater than five points. In such a case, any difference of five
points or less would result in retaining the null hypothesis, since a difference within that range
would be considered trivial (i.e., of no practical or theoretical value). A difference of more than
five points would lead to rejection of the null hypothesis, since a difference equal to or greater
than five points would be considered meaningful. Note that the null hypothesis in the minimum-
effect hypothesis testing model stipulates a range of values, whereas in the classical hypothesis
testing model the null hypothesis stipulates a specific value.
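One simple way to carry out the logic of the IQ example is to test the observed difference against the boundary of the trivial band rather than against zero. The sketch below is only a rough illustration of that idea (it assumes a known population standard deviation and invented sample results), not the specific procedure described in Section VII of Test 43.

```python
# Rough sketch of a minimum-effect analysis: test whether the observed mean
# difference exceeds a 5-point "trivial" band (one-sided z test against the
# boundary). Sample results and sigma below are hypothetical.
from math import sqrt
from scipy.stats import norm

mean_diff, sigma, n1, n2 = 8.0, 15.0, 50, 50
trivial_bound = 5.0
se = sigma * sqrt(1 / n1 + 1 / n2)

z = (mean_diff - trivial_bound) / se    # distance beyond the trivial band
p = 1 - norm.cdf(z)                     # one-tailed probability
print(f"z = {z:.2f}, one-tailed p = {p:.3f}")   # reject the minimum-effect H0 if p < alpha
```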
A philosophy similar to that underlying the minimum-effect hypothesis testing model is
also associated with the use of tests of equivalence. In contrast to the classical hypothesis
testing model (where the alternative hypothesis states that there is a difference between
treatments), in a test of equivalence the alternative hypothesis states that the treatments are, in
fact, equivalent. Conversely, in a test of equivalence the null hypothesis states that a difference
exists between the treatments. Since it is not mathematically feasible to establish an alternative
hypothesis which states exact equality (i.e., a difference of zero) between experimental
conditions, when one conducts a test of equivalence, prior to conducting a study a researcher
stipulates a value which reflects a maximum difference that will be tolerated between two
treatments in order that the researcher might conclude that the treatments are equivalent to one
another. Any difference equal to or less than the stipulated value would be viewed as so small
as to be inconsequential. Thus, in a test of equivalence if a difference which is equal to or less
than the value stipulated by the researcher is detected, the null hypothesis (stating that a
difference does exist) can be rejected, and the alternative hypothesis (which stipulates
equivalence) can be accepted. Tests of equivalence, which are illustrated in a number of chapters
of the book, are discussed in detail in Section VII of the t test for two independent samples.
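One widely used way of implementing this logic is the two one-sided tests (TOST) procedure, sketched below with invented data and an invented tolerance; it is offered only as an illustration of the reasoning, not necessarily the procedure detailed in Section VII of the t test for two independent samples.

```python
# Sketch of an equivalence analysis via two one-sided tests (TOST). The data and
# the tolerated difference (delta) are hypothetical values for illustration.
import numpy as np
from scipy import stats

treatment_a = np.array([52, 55, 49, 53, 51, 54, 50, 52])
treatment_b = np.array([51, 54, 50, 52, 53, 55, 49, 51])
delta = 3.0                               # largest difference still deemed trivial

diff = treatment_a.mean() - treatment_b.mean()
df = len(treatment_a) + len(treatment_b) - 2
sp2 = ((len(treatment_a) - 1) * treatment_a.var(ddof=1)
       + (len(treatment_b) - 1) * treatment_b.var(ddof=1)) / df
se = np.sqrt(sp2 * (1 / len(treatment_a) + 1 / len(treatment_b)))

# Reject both one-sided nulls (diff <= -delta and diff >= +delta) to conclude equivalence
p_lower = stats.t.sf((diff + delta) / se, df)
p_upper = stats.t.cdf((diff - delta) / se, df)
p_tost = max(p_lower, p_upper)
print(f"diff = {diff:.2f}, TOST p = {p_tost:.3f}")   # p < alpha -> conclude equivalence
```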
Another approach to hypothesis testing is the Bayesian hypothesis testing model (which
Lehman (1993) notes was rejected by Fisher, Neyman, and Egon Pearson). The latter model
derives from the work of the Reverend Thomas Bayes (1702 - 1761), an eighteenth century
English clergyman who stated a general rule for computing conditional probabilities referred
to as Bayes’ theorem. A conditional probability is the probability that an event will occur,
given the fact that it is already known that another event has occurred. (Bayes’ theorem and the
concept of conditional probability are discussed in greater detail later in this chapter.) The
probability value computed within the framework of the classical hypothesis testing model is
the conditional probability of obtaining a specific outcome in an experiment, given that the null
hypothesis is true. In the Bayesian hypothesis testing model the probability value computed is
the conditional probability that the null hypothesis is true, given the specific outcome in an
experiment (see Endnote 23 for further clarification). Although semantically the conditional
probabilities computed for the two models sound similar, they are not the same. A detailed
discussion of the Bayesian hypothesis testing model can be found in Section IX (the
Addendum) of the binomial sign test for a single sample.
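A small numerical illustration of Bayes' theorem may help clarify why the two conditional probabilities differ. In the sketch below every number (the prior and the two likelihoods) is hypothetical; the point is only that P(data | H0) and P(H0 | data) need not be the same value.

```python
# Bayes' theorem: P(H0 | data) = P(data | H0) * P(H0) / P(data).
# All probabilities below are hypothetical values chosen for illustration.
prior_h0 = 0.5               # prior probability that H0 is true
p_data_given_h0 = 0.02       # likelihood of the observed outcome if H0 is true
p_data_given_h1 = 0.30       # likelihood of the observed outcome if H1 is true

p_data = p_data_given_h0 * prior_h0 + p_data_given_h1 * (1 - prior_h0)
posterior_h0 = p_data_given_h0 * prior_h0 / p_data
print(f"P(H0 | data) = {posterior_h0:.4f}")   # 0.0625, not .02
```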
At a later point in the book a more recent trend in statistical analysis will be addressed —
specifically, the evolution of methodologies for generating statistical models. Related to the
latter is a recent article by Rodgers (2010) who notes that during the past 20 years a quiet
methodological revolution has occurred in many academic disciplines in which there has been
a shift away from employing a set of mechanistic procedures in data analysis (such as that which
often characterizes the application of the NHST model) and moved toward building and
evaluating statistical and scientific models. Two commonly employed modeling procedures that
are increasingly utilized by researchers in multiple scientific disciplines are path analysis (Test
41) and structural equation modeling (Test 42). The popularity of such procedures is
predicated on researchers’ desire to identify and measure the nature of the complex relationships
between three or more variables, and within that process obtain insight into issues relating to
cause and effect. Although methodologies for developing statistical models have become quite
popular during the past two decades, they also have liabilities associated with them. Specifically:
a) Because of their statistical complexity, modeling procedures are increasingly employed by
researchers who have only limited knowledge of the mathematical rationale underlying such
methodologies. Because of the latter, it is not unusual that such procedures also fall prey to
mechanistic use by the unsophisticated researcher; and b) Modeling procedures may promise
more than they deliver in terms of the actual knowledge they yield. The latter is the case since
such procedures are correlational in nature, and as such do not allow one to draw unequivocal
conclusions regarding cause and effect. Conclusions drawn from the use of modeling procedures
must always be viewed with caution, especially when such conclusions were arrived at by a
researcher with limited statistical literacy.

Estimation in Inferential Statistics

In addition to hypothesis testing, inferential statistics can also be employed for estimating the
value of one or more population parameters. Within this framework there are two types of esti­
mation. Point estimation (which is the less commonly employed of the two methods) involves
estimating the value of a parameter from the computed value of a statistic. The more commonly
employed method of estimation is interval estimation (commonly employed within the context
of the classical hypothesis testing model as well as the Bayesian hypothesis testing model),
which involves computing a range of values which a researcher can state with a high degree of
confidence contains the true value of a population parameter. One such commonly computed
range of values is referred to as a confidence interval (which was introduced by Jerzy Neyman
(1941)). Oakes (1986) notes that whereas a significance test provides information with respect
to what a population parameter is not, a confidence interval provides information with respect
to what a population parameter is. The most commonly computed confidence intervals are the
95% and 99% intervals.
To illustrate the latter, a 95% confidence interval identifies a range of values a researcher
can be 95% confident contains the true value of a population parameter (e.g., a population mean).
Stated in probabilistic terms, the researcher can state there is a probability/likelihood of .95 that
the confidence interval contains the true value of the population parameter. When the result of
an inferential statistical test is statistically significant at the .05 level, the 95% confidence interval
will not include the hypothesized value of the parameter stipulated in the null hypothesis.
Another example of a confidence interval is a range of values a researcher can be confident to
a specified degree contains the true difference between two population parameters. Thus, a 99%
confidence interval for the difference between two means stipulates the range of values a
researcher can be 99% confident contains the true difference between the means of the two
populations. When the result of an inferential statistical test is statistically significant at the .01
level, the 99% confidence interval will not include the hypothesized value of the difference
between the two population means stipulated in the null hypothesis.
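
To make the arithmetic just described concrete, the following minimal Python sketch (the sample values are invented, and numpy/scipy are assumed to be available; it is offered as an illustration, not a prescribed computation) obtains a 95% confidence interval for a single population mean and a 99% confidence interval for the difference between two independent population means, using critical values of the t distribution.

    import numpy as np
    from scipy import stats

    # Hypothetical sample (invented values, for illustration only)
    sample = np.array([102, 98, 110, 105, 97, 103, 108, 99, 101, 106])

    # 95% confidence interval for a single population mean
    n = sample.size
    mean = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)      # estimated standard error of the mean
    t_crit = stats.t.ppf(0.975, df=n - 1)     # two-tailed critical t value for 95% confidence
    print("95% CI for the mean:", (mean - t_crit * se, mean + t_crit * se))

    # 99% confidence interval for the difference between two independent means
    group1 = np.array([14, 18, 16, 21, 17, 15, 19, 20])
    group2 = np.array([11, 13, 15, 12, 14, 10, 13, 12])
    diff = group1.mean() - group2.mean()
    df = group1.size + group2.size - 2
    sp2 = ((group1.size - 1) * group1.var(ddof=1) +
           (group2.size - 1) * group2.var(ddof=1)) / df   # pooled variance estimate
    se_diff = np.sqrt(sp2 * (1 / group1.size + 1 / group2.size))
    t_crit_99 = stats.t.ppf(0.995, df=df)
    print("99% CI for the difference:", (diff - t_crit_99 * se_diff, diff + t_crit_99 * se_diff))

If the 95% interval for the mean excludes the value stated in the null hypothesis, the corresponding two-tailed test would be significant at the .05 level, which mirrors the relationship between confidence intervals and significance tests noted above.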
During the past 25 years critics of the classical hypothesis testing model (e.g., Altman et
al. (2002), Harlow et al. (1997), Hunter and Schmidt (1990, 2004), Kline (2004), Meehl (1967),
Rozeboom (1960), Smithson (2003), and Thompson (1993, 1999, 2002)) have argued that
researchers should utilize confidence intervals in decision making, and that all
published research should include confidence intervals for any relevant statistics. In some
instances, those who vigorously advocate the use of confidence intervals argue that the test of
statistical significance should no longer be employed for hypothesis testing, and that instead
decisions should be made on the basis of values computed for confidence intervals. The latter
issue is examined more closely in the discussion of confidence intervals in Section VII of the
single-sample t test.
Another measure which is often estimated within the framework of an experiment is effect
size (also referred to as magnitude of treatment effect). A commonly employed measure of
effect size is a value which represents the proportion or percentage of variability on a dependent
variable that can be attributed to variation on the independent variable (the terms dependent
variable and independent variable are defined in the next section). Throughout this book the
concept of effect size is discussed, and numerous measures of effect size are presented. At the
present time researchers are not in agreement with regard to the role that measures of effect size
should be accorded in summarizing the results of research. As noted in the previous section, an
increasing number of individuals have become highly critical of the classical hypothesis testing
model because of its dependence on the concept of statistical significance. These individuals
have argued that a measure of effect size computed for a study is more meaningful than whether
or not an inferential statistical test yields a statistically significant result. The controversy
surrounding effect size versus statistical significance is discussed in detail in Test 43 on meta­
analysis. In addition, it is also addressed within the framework of the discussion of measures of
effect size throughout the book (e.g., in Section VI of both the t test for two independent
samples and the single-factor between-subjects analysis of variance (Test 21)).
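
As a rough illustration of the proportion-of-variability idea just described, the short Python sketch below computes eta-squared, one commonly reported variance-accounted-for measure, for two invented groups of scores (the data, the group labels, and the choice of eta-squared are assumptions made only for this example, not a computation taken from this book).

    import numpy as np

    def eta_squared(*groups):
        """Proportion of the total variability in the dependent variable
        that is attributable to group membership (the independent variable)."""
        all_scores = np.concatenate([np.asarray(g, dtype=float) for g in groups])
        grand_mean = all_scores.mean()
        ss_total = ((all_scores - grand_mean) ** 2).sum()
        ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
        return ss_between / ss_total

    # Invented scores for a hypothetical experimental and control group
    experimental = [12, 15, 14, 16, 13, 15]
    control = [9, 10, 11, 8, 10, 9]
    print("eta-squared:", round(eta_squared(experimental, control), 3))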

Relevant Concepts, Issues, and Terminology in Conducting Research


General overview of research methods The three most common strategies employed in
conducting research in the natural and social sciences are: a) The observational method; b) The
experimental method; c) The correlational method.
a) The observational method (also referred to as the case study, clinical, and anecdotal
method) involves accumulating information about the phenomenon under study (e.g., the
behavior/activity of organisms, inanimate objects, events, etc.) by observing the phenomenon in
the real world. Because of its emphasis on observation in the natural environment, this method
is sometimes referred to as naturalistic observation. Since it is informal and subjective in
nature, the observational method is often criticized as being unscientific. It is depicted as a
methodology which sacrifices precision for relevance. Specifically, it opts for relevance, which
means that it studies real world phenomena, as opposed to studying them under artificial
conditions that are often created within the context of laboratory experimentation. The
observational method lacks precision, which is comprised of the following two elements:
quantification and control. Specifically, more often than not, the observational method does not
translate the information it accumulates into quantitative data that can be subjected to statistical
analysis. In addition, it does not introduce adequate control into the situations it observes, and
because of the latter it cannot clearly identify cause and effect with respect to the phenomenon
under study.
b) The experimental method (commonly referred to as the scientific method) accumulates
information by conducting controlled experiments. It is also referred to as the bivariate method,
since in its simplest form it involves two variables, the independent variable and the dependent
variable (which will be clarified later in this discussion). The experimental method is formal and
objective in nature, and depends on the statistical analysis of data to draw conclusions regarding
the phenomenon under study. The experimental method is depicted as sacrificing relevance for
precision. Specifically, it opts to sacrifice the reality of everyday life (relevance) and instead
studies organisms/events in controlled settings (such as a laboratory) in order that it might
determine whether or not there is a cause-effect relationship between two variables. The
experimental method is often criticized by proponents of the observational method on the
grounds that it studies behavior/events in an artificial setting, and information obtained in such
an environment may not conform to what actually occurs in the real world. Because of the latter,
critics of the experimental method claim that it has poor external validity — the external
validity of an experiment refers to the degree to which its results can be generalized beyond the
sample employed in the study as well as the setting in which it is conducted.25
c) The correlational method is employed to determine whether or not two or more
variables are statistically related to one another. If, in fact, two variables are statistically related
to one another, the correlational method allows a researcher to predict at above chance a score
on one variable through use of the score on the second variable. For example, the correlational
method could be employed to predict how a person will do on a job on the basis of a person’s
score on a personality test. Although the correlational method provides a blend of both relevance
and precision, it lacks the degree of relevance associated with the observational method, as well
as the fact that it lacks the degree of precision associated with the experimental method. With
respect to the latter, although the correlational method is characterized by quantification, it lacks
the control associated with the experimental method. Data that are evaluated through use of the
correlational method can be obtained in either real life or controlled experimental environments.
When the correlational method is employed with more than two variables, it is commonly
discussed within the context of the multivariate method of research (which is discussed in the
latter section of the book).

Hypothetical constructs and operational definitions In simplest terms, a hypothetical
construct (often just referred to as a construct) can be viewed as any theoretical concept. The
term hypothetical construct is employed in various disciplines to represent something which is
hypothesized to exist, although strictly speaking, whatever it is that one is referring to can never
be directly observed. As a result of not being directly observable, a hypothetical construct must
be measured indirectly.
To illustrate, intelligence is an example of a hypothetical construct, in that strictly speaking
we never directly observe intelligence. We only observe behavior (such as how a person does on
an IQ test) which suggests that a person is intelligent to some degree. Perhaps some day we will
be able to represent intelligence as a physical reality — in other words, a specific structure or
chemical in the brain, which by virtue of its size or concentration will actually represent a person’s
exact level of intelligence. Until such time, however, intelligence is just a convenient term
psychologists employ to reflect individual differences with respect to certain behavior. In spite
of the fact that a hypothetical construct such as intelligence can never be directly observed, it is
something that scientists such as psychologists often want to measure. The latter can also be
expressed by saying that scientists want to empirically translate or operationalize the
hypothetical construct of intelligence. The term operational definition (which derives from the
term operationalize) represents the specific operations/measures which are employed by a
researcher to measure a hypothetical construct. Put simply, an operational definition is a way
of measuring a hypothetical construct.
The most common way of measuring the construct of intelligence is the performance of a
person on an IQ test. Thus, an IQ test score is an operational definition of the concept of
intelligence. Yet there are many other ways one might measure intelligence — for example, how
well one does in school or how well one does in solving certain types of problems. In the final
analysis, since there is no perfect and direct way of measuring intelligence, any measure of it can
be challenged. This will also be the case with respect to any other hypothetical construct which
one elects to investigate. Other examples of hypothetical constructs which are frequently the focus
of research are traits (such as extroversion, anxiety, conscientiousness, neuroticism, etc.),
emotional states (such as anxiety, depression, etc.), a person’s status with respect to such things
as health or social class, the status of the economy, etc. In addition, conditions which a person (as
well as other organisms and inanimate objects) can be exposed to in the environment (such as
frustration, boredom, stress, etc.) can also represent hypothetical constructs. Like intelligence,
each of the latter environmental conditions is never directly observable. We only observe
circumstances and/or behavior which suggests the presence or absence of such conditions to
varying degrees.
As noted above, one or more ways can be devised to measure a hypothetical construct. As
another example, a trait such as neuroticism, which represents a hypothetical construct, can be
measured with pencil and paper personality tests, projective tests, physiological measures, peer
or expert ratings, or through direct observation of specific target behaviors which are considered
critical indicators for the presence of that trait. In the same respect, one or more ways can be
devised to create certain conditions in the environment such as frustration, boredom, stress, etc.
To illustrate, the following are a variety of mechanisms a researcher might employ to represent
the construct of environmental stress: a) Exposure to a noxious stimulus such as loud noise; b)
Threat of physical assault; c) Presenting a person with a task which he or she does not have time
to complete. As is the case with intelligence, any of the aforementioned measures or conditions
(i.e., operational definitions) representing the constructs of neuroticism and stress may be
challenged by someone else who argues that the selected measure or condition is not the optimal
way of evaluating the construct in question. Nevertheless, in spite of the fact that one can
challenge virtually any operational definition a researcher employs to represent a hypothetical
construct, in order to conduct research it is necessary to operationally define hypothetical
constructs. Thus, regardless of whether a researcher employs the observational, experimental, or
correlational methods of research, one will ultimately have to operationalize one or more
constructs that are being evaluated.

Relevant terminology for the experimental method Although the terminology presented
in this section will be illustrated through the use of research involving human subjects, it should
be emphasized that the basic definitions to be discussed apply to all varieties of experiments (i.e.,
experiments involving nonhuman subjects, inanimate objects, events, etc.). The typical experiment
evaluates one or more hypotheses. Within the context of conducting an experiment the most
elementary hypothesis evaluated is a formal prediction about the relationship between two
variables — specifically, an independent variable and a dependent variable. As an example
of such a hypothesis, consider the statement frustration causes aggression. The latter statement
is a prediction regarding the relationship between the two variables frustration and aggression.
In order to test the latter hypothesis, let us assume that an experimenter designs a study involving
two groups of subjects who are told they will be given an intelligence test. One group (Group 1
— which represents the experimental group) is frustrated by being given a very difficult test on
which it is impossible for a person to achieve a high score. The other group (Group 2 — which
represents the control group) is not frustrated, since it is given a very easy test. Note that the
group which receives the main treatment (which in this case is the group that is frustrated) is
commonly referred to as the experimental group. The group which does not receive the main
treatment (which in this case is the group that is not frustrated) is commonly referred to as the
control group or comparison group. Upon completing the test, all subjects are asked to play a
video game in which a subject has the option of killing figures depicted on a computer screen.
The number of figures a subject kills is employed as the measure (i.e., operational definition) of
aggression in the study. The experimenter predicts that subjects in the experimental group, by
virtue of being frustrated, will record more kills than subjects in the control group. Note that the
manner in which the researcher defines frustration can certainly be challenged — specifically,
most people can probably think of what they consider a better way to frustrate subjects than by
giving them a difficult test. In the same respect one can also challenge the criterion the
experimenter employs to measure aggression. However, before being overly critical of the
experiment, one must realize that ethical and pragmatic considerations often limit the options
available to a researcher in designing an experiment (especially one involving variables such as
frustration and aggression).
In the above described experiment there are two variables, an independent variable
(referred to in some sources as an exogenous variable) and a dependent variable (referred to
in some sources as an endogenous variable). The independent variable is the experimental
conditions or treatments — in other words, whatever it is which distinguishes the groups from
one another. In the experiment under discussion the independent variable is the frustration factor,
since one group was frustrated and the other group was not. The number of groups represents the
number of levels of the independent variable. Since in our experiment there are two groups, the
independent variable is comprised of two levels. If a third group had been included (which could
have been given a moderately difficult test, resulting in moderate frustration), the independent
variable would have had three levels. The dependent variable in an experiment is the measure
that is hypothesized to depend on the level of the independent variable to which a subject is
exposed. Thus, in the experiment under discussion, the dependent variable is aggression. If the
hypothesis is correct, the scores of subjects on the dependent variable will be a function of which
level of the independent variable they served (i.e., of which group they were a member). Thus,
it is predicted that subjects in the frustration group (Group 1) will record more kills than subjects
in the no frustration group (Group 2).
The simplest way to determine the independent variable in an experiment is to ask oneself
on what basis the groups are distinguished from one another. Whatever the distinguishing feature
is between the groups represents the independent variable. As noted above, the distinguishing
feature between the groups in the experiment under discussion is the degree of frustration to which
subjects are exposed, and thus frustration represents the independent variable. In the anti­
depressant study discussed earlier in this chapter, the independent variable was whether a
subject received an antidepressant drug or the placebo (since the latter was the distinguishing
feature between the two groups in the aforementioned study).
The simplest way to determine the dependent variable in an experiment is to ask oneself
what the scores represent that the subjects produce at the conclusion of the experiment (i.e., the
scores which are employed to compare the groups with one another). As noted above, the level
of aggression in the two groups is contrasted with one another, and thus aggression represents the
dependent variable. In the antidepressant study discussed earlier, the dependent variable was the
depression scores of subjects at the conclusion of the study (since the latter represents the data that
were employed to compare the two groups). Although it is possible to have more than one
independent and/or dependent variable in an experiment, in this discussion we will only concern
ourselves with experiments in which there is a single independent variable and one dependent
variable.
Within the framework of the experimental method a distinction is commonly made between
a true experiment and a natural experiment (which is also referred to as an ex post facto
study). This distinction is predicated on the fact that in a true experiment the following applies:
a) In a true experiment subjects are randomly assigned to a group, which means that each
subject has an equal likelihood of being assigned to any of the groups employed in the
experiment. Random assignment is assumed to result in groups which are likely to be equivalent
to one another. It should be noted, however, that although random assignment does not guarantee
equivalency of groups, it optimizes the likelihood of achieving that goal. For example, if an
experiment employs 100 subjects, half of whom are men and half of whom are women, random
assignment of subjects to groups will most likely result in approximately an equal number of men
and women in both groups (as well as be likely to make the groups comparable with respect to
other relevant demographic variables such as age, socioeconomic status, etc.); b) The independent
variable in a true experiment is manipulated by the experimenter. The frustration-aggression
study under discussion illustrates an example of an experiment which employs a manipulated
independent variable, since in that study the experimenter manipulates each of the groups. In
other words, the level of frustration created within each group is determined/manipulated by the
experimenter.
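
The hedged sketch below illustrates the random assignment described in point a) above: 100 hypothetical subjects (50 men and 50 women, all invented for the example) are shuffled and split into two equal groups, and the gender composition of each group is tallied to show that the split tends to come out roughly even.

    import numpy as np

    rng = np.random.default_rng(seed=1)   # fixed seed only so the example is reproducible

    # 100 hypothetical subjects: 50 men ('M') and 50 women ('F')
    subjects = np.array(['M'] * 50 + ['F'] * 50)

    # Random assignment: shuffle the subjects, then split them into two groups of 50
    shuffled = rng.permutation(subjects)
    group1, group2 = shuffled[:50], shuffled[50:]

    for label, group in (("Group 1", group1), ("Group 2", group2)):
        men = int(np.count_nonzero(group == 'M'))
        print(f"{label}: {men} men, {50 - men} women")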
In a natural experiment random assignment of subjects to groups is impossible, since the
independent variable is not manipulated by the experimenter, but instead is some preexisting
subject characteristic (such as gender, race, etc.). A nonmanipulated independent variable
in a natural experiment is often referred to as a subject variable, attribute variable, or
organismic variable (since the differentiating feature between groups is some preexisting
attribute of the subjects/organisms employed in the experiment). As an example, if we compare
the overall health of two groups, smokers and nonsmokers, the independent variable in such a
study is whether or not a person smokes, which is something that is determined by nature prior
to the experiment. The dependent variable in such an experiment will be a measure of the
subjects’ health. (Readers should be aware of the fact that some sources limit the use of the term
independent variable to a manipulated independent variable, and do not classify a nonmanipulated
independent variable as an actual independent variable.)

The advantage of a true experiment over a natural experiment is that the true experiment
allows a researcher to exercise much greater control over the experimental situation. Since in the
true experiment the experimenter randomly assigns subjects to groups, it is assumed that the
groups are equivalent to one another, and as a result of this any differences between the groups
with respect to the dependent variable can be directly attributed to the manipulated independent
variable. The end result of the latter is that the true experiment allows a researcher to draw
conclusions with regard to cause and effect.
The natural experiment, on the other hand, does not allow one to draw conclusions with
regard to cause and effect. Essentially the type of information which results from a natural
experiment is correlational in nature. Such experiments can only tell a researcher that a
statistical association exists between the independent and dependent variables. The reason why
natural experiments do not allow a researcher to draw conclusions with regard to cause and
effect is that such experiments do not control for the potential effects of confounding variables
(also known as extraneous variables). (Sometimes the term artifact is employed for any aspect
of an experimental design which may go unnoticed and inadvertently produce a confound.) A
confounding variable is any variable which systematically varies with the different levels of the
independent variable. To illustrate, assume that in a study comparing the overall health of smokers
and nonsmokers, unbeknownst to the researcher all of the smokers in the study are people who
have high stress jobs and all the nonsmokers are people with low stress jobs. If the outcome of
such a study indicates that smokers are in poorer health than nonsmokers, the researcher will have
no way of knowing whether the inferior health of smokers is due to smoking and/or job stress, or
even to some other confounding variable of which he is unaware. In the case of the true
experiment, on the other hand, confounding is much less likely as a result of randomly assigning
subjects to the experimental conditions.
If an experiment is well controlled so as to rule out any confounding variables, the
experiment is said to have internal validity. In other words, if a significant difference is found
between the groups with respect to the dependent variable in a well designed true experiment,
the experimenter can have a high degree of confidence that most likely the difference is
attributable to the independent variable, and is not the result of some confounding variable.
Earlier in this chapter it was noted that at the conclusion of an experiment the scores of
subjects in the different groups are compared with one another through use of an inferential
statistical test. In reference to the frustration-aggression experiment alluded to earlier, the purpose
of such a test would be to determine whether or not there is a statistically significant difference
between the aggression scores of the two groups. To put it another way, by employing a statistical
test the experimenter will be able to say whether or not subjects exposed to different levels of the
independent variable exhibited a difference with respect to their scores on the dependent variable.
In the event a statistically significant difference is obtained between the two groups, the
experimenter can conclude that it is highly unlikely that the difference is due to chance. In view
of the latter, the most logical decision would be to conclude that the difference in aggression
scores between the two groups was most likely due to whether or not subjects were frustrated (i.e.,
the independent variable).
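
For illustration only, the sketch below analyzes invented data patterned after the frustration-aggression example: the number of kills recorded by subjects in the frustrated and non-frustrated groups is compared with a t test for two independent samples (one of several inferential tests that might be chosen, depending on the nature of the data; the scores are fabricated).

    from scipy import stats

    # Invented kill counts on the video game task (the dependent variable, aggression)
    frustrated = [12, 15, 14, 18, 16, 13, 17, 15]      # experimental group (Group 1)
    not_frustrated = [9, 11, 8, 12, 10, 9, 11, 10]     # control group (Group 2)

    t_stat, p_value = stats.ttest_ind(frustrated, not_frustrated)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

    if p_value < .05:
        print("The difference between the groups is significant at the .05 level.")
    else:
        print("The difference between the groups is not significant at the .05 level.")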
Before closing this discussion of the experimental method, it should be noted that it is
possible to have more than one independent variable in an experiment. Experimental designs
which involve more than one independent variable are referred to as factorial designs. In such
experiments, the number of independent variables will correspond to the number of factors in the
experiment, and each independent variable/factor will be comprised of two or more levels. It is
also possible to have more than two dependent variables in an experiment. Typically, experiments
involving two or more dependent variables (and often simultaneously two or more independent
variables) are evaluated with multivariate statistical procedures, some of which are discussed
later in the book.
Correlational Research The discussion of correlational research in this section will be
limited to the simplest type of correlation — specifically, bivariate correlation, which is the use
of correlation to measure the degree of association between two variables. It should be noted that
correlational procedures can also be employed with more than two variables. The latter type of
correlation is commonly discussed within the framework of multivariate analysis (e.g., multiple
regression (Test 33)).
In the simplest type of correlational study, scores on two measures/variables are available
for a group of subjects. The major goal of correlational research is to determine the degree of
association between two variables, or to put it another way, the extent to which a subject’s score
on one variable can be predicted if one knows the subject’s score on the second variable. Usually
the variable that is employed to predict scores on a second variable is designated as the X
variable, and is referred to as the predictor variable. The variable which is predicted from the
X variable is usually designated as the Y variable, and is referred to as the criterion variable.
(Although the terms independent and dependent variable are typically limited to the variables
employed in studies based on the experimental method, some sources refer to the predictor
variable in correlational research as the independent variable and the criterion variable as the
dependent variable.)
The most commonly employed correlational measure is the Pearson product-moment
correlation coefficient (Test 28), which is represented by the notation r. The latter value is
computed by employing the scores of subjects in an algebraic equation. The value computed for
r (i.e., the correlation coefficient) can fall anywhere within the range -1 to +1. Thus, the value
of r can never be less than -1 (i.e., r cannot equal -1.2, -50, etc.) or be greater than +1 (i.e., r
cannot equal 1.2, 50, etc.). The absolute value of r (represented with the notation |r|) indicates
the strength of the relationship between the two variables. Recollect that the absolute value of
a number is the value of the number irrespective of the sign. Thus, in the case of the two
correlation coefficients r = +1 and r = -1, the absolute value of both coefficients is 1. As the
absolute value of r approaches 1, the degree of relationship (to be more precise, the degree of
linear relationship) between the variables becomes stronger, achieving the maximum when |r|
= 1 (i.e., when r equals either +1 or -1). The closer the absolute value of r is to 1, the more
accurately a researcher will be able to predict a subject’s score on one variable from the subject’s
score on the other variable. The closer the absolute value of r is to 0, the weaker the linear
relationship between the two variables. As the absolute value of r approaches 0, the degree of
accuracy with which a researcher can predict a subject’s score on one variable from the other
variable decreases, until finally, when r = 0 there is no predictive relationship between the two
variables. To state it another way, when r = 0 the use of the correlation coefficient to predict a
subject’s Y score from the subject’s X score (or vice versa) will not be any more accurate than
a prediction that is based on some random process (i.e., a prediction that is based purely on
chance).
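
A minimal sketch of the computation just described appears below: the Pearson product-moment correlation coefficient is obtained for two invented sets of paired scores (an X predictor and a Y criterion), and its absolute value is taken as an index of the strength of the linear relationship; the data are fabricated solely for illustration.

    from scipy import stats

    # Invented paired scores for a group of subjects
    x = [10, 12, 9, 15, 14, 11, 13, 16]    # predictor (X) variable
    y = [40, 46, 37, 55, 52, 43, 49, 58]   # criterion (Y) variable

    r, p_value = stats.pearsonr(x, y)
    print(f"r = {r:.2f}, |r| = {abs(r):.2f}, p = {p_value:.4f}")
    # The closer |r| is to 1, the more accurately Y can be predicted from X (and vice versa).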
The sign of r indicates the direction or nature of the relationship that exists between the
two variables. A positive sign indicates a direct relationship, whereas a negative sign indicates
an indirect (or inverse) relationship. When there is a direct relationship, subjects who have a
high score on one variable will have a high score on the other variable, and subjects who have
a low score on one variable will have a low score on the other variable. The closer a positive
value of r is to +1, the stronger the direct relationship between the two variables, whereas the
closer a positive value of r is to 0, the weaker the direct relationship between the variables. Some
general guidelines which can be employed are that r values between +.70 and +1 represent
examples of a strong direct relationship; r values between +.30 and +.69 represent examples
of a moderate direct relationship; and r values between +.01 and +.29 represent examples of
a weak direct relationship. When r is close to +1, most subjects who have a high score on one
variable will have a comparably high score on the second variable, and most subjects who have
a low score on one variable will have a comparably low score on the second variable. As the
value of r approaches 0, the consistency of the general pattern described by a positive correlation
deteriorates, until finally, when r = 0 there will be no consistent pattern that allows one to predict
at above chance a subject’s score on one variable if one knows the subject’s score on the other
variable.
When there is an indirect/inverse relationship, subjects who have a high score on one
variable will have a low score on the other variable, and vice versa. The closer a negative value
of r is to -1, the stronger the indirect relationship between the two variables, whereas the closer
a negative value of r is to 0, the weaker the indirect relationship between the variables. Thus,
using the guidelines noted above, r values between -.70 and -1 represent examples of a strong
indirect relationship; r values between -.30 and -.69 represent examples of a moderate
indirect relationship; and r values between -.01 and -.29 represent examples of a weak
indirect relationship. When r is close to -1, most subjects who have a high score on one
variable will have a comparably low score on the second variable (i.e., as extreme a score in the
opposite direction), and most subjects who have a low score on one variable will have a
comparably high score on the second variable. As the value of r approaches 0, the consistency of the
general pattern described by a negative correlation deteriorates, until finally, when r = 0 there
will be no consistent pattern that allows one to predict at above chance a subject’s score on one
variable if one knows the subject’s score on the other variable.
A common error made in interpreting correlation is to assume that a positive correlation is
more meaningful or stronger than a negative correlation. However, as noted earlier, the strength
of a correlation is a function of its absolute value. To illustrate this point, let us assume that the
correlation between the scores on a psychological test, which will be labeled Test A, and how
effective a person will be at a specific job is r = +.56, and the correlation between another
psychological test, Test B, and how effective the person will be on the job is r = -.78. Of the two
tests, Test B will predict with a higher degree of accuracy whether or not a person will be
effective on the job. This is the case since the absolute value of the correlation coefficient for
Test B is .78, which is higher than the absolute value for Test A, which is .56. In the case of Test
B, the strong negative correlation indicates that the lower a person’s score on the test, the higher
his or her job performance, and vice versa.
It should be emphasized that correlational information does not allow a researcher to
draw conclusions with regard to cause and effect. Thus, although a substantial correlation may
be observed between two variables X and Y, on the basis of the latter one cannot conclude that
X causes Y or that Y causes X. Although it is possible that X causes Y or that Y causes X, one or
more other variables which have not been taken into account may be involved in causation for
either variable. As an example, if a strong positive correlation (e.g., +.80) is obtained between
how much a person smokes (which will represent the X variable) and how many health problems
a person has (which will represent the Y variable), one cannot conclude on the basis of the strong
positive correlation that smoking causes health problems. Although smoking may cause health
problems or vice versa (i.e., health problems may cause a person to smoke), the correlational
method does not provide enough control to allow one to draw such conclusions. As noted in the
discussion of the natural experiment, the latter is true since a correlational analysis does not allow
a researcher to rule out the potential role of confounding variables. For example, as noted in the
discussion of the natural experiment, let us assume that people who smoke have high stress jobs
and that is why they smoke. Their health problems may be caused by the high stress they
experience at work, and are not related to whether or not they smoke. The strong correlation
between smoking and health problems masks the true cause of health problems, which is having
a high stress job (which represents a confounding variable that is not taken into account in the
correlational analysis). Consequently, a correlation coefficient only indicates the degree of
statistical relationship between two variables, and does not allow a researcher to conclude that
one variable is the cause of the other. It should be noted, however, that the cause-effect
relationship between smoking and health problems has been well documented based on a large
number of studies. Many of these studies employ sophisticated experimental designs which allow
researchers to draw conclusions that go well beyond those which can be reached on the basis of
the simple correlational type of study described in this section. More detailed discussion of the
issue of correlation and causation can be found in the chapters on the Pearson product-moment
correlation coefficient and multiple regression, as well as in many sources which address the
general subject of experimental design (e.g., Shadish et al. (2002), Trochim (2005)).
In the discussion of the experimental method it was emphasized that natural experiments
only provide correlational information. In other words, a significant result in a natural
experiment only indicates that a significant statistical relationship/association is obtained
between the independent variable and the dependent variable, and it does not indicate that scores
of subjects on the dependent variable are caused by the independent variable. In point of fact,
correlational measures can be employed to indicate the degree of association between an
independent variable and a dependent variable in both a natural experiment and a true
experiment. When correlational measures are employed within the latter context, they are
commonly referred to as measures of effect size. Measures of effect size are described
throughout the book within the context of the discussion of various inferential statistical tests.
It should be emphasized that measures of correlation in and of themselves are not inferential
statistical tests, but are, instead, descriptive measures, which, as noted above, only indicate the
degree to which two or more variables are related to one another. In actuality, there are a large
number of correlational measures (many of which are described in detail in this book) that have
been developed which are appropriate for use with different kinds of data. Although (as is the
case with the Pearson product-moment correlation coefficient (i.e., r)) the range of values for
many correlational measures is between - 1 and +1, the latter values do not always define the
range of possible values for a correlation coefficient.

Experimental Design

Inferential statistical tests can be employed to evaluate data that are generated from a broad
variety of experimental designs. This section will provide an overview of the general subject of
experimental design, and introduce design related terminology which will be employed
throughout the book. An experimental design is a specific plan which is employed to investigate
a research problem. Winer (1971) draws an analogy between an experimental design and the
design an architect creates for a building. In the latter situation the prospective owner of a
building asks an architect to draw up a set of plans which will meet his requirements with respect
to both cost and utility, as well as the requirements of the individuals who will be using the
building. Two or more architects will often submit different plans for the same building, with
each architect believing that his are best suited to accomplish the goals stipulated by the
prospective owner. In the same respect, two or more experimental designs may be considered for
investigating the same research problem. As is the case with the plans of different architects, two
or more research designs will have unique assets and liabilities associated with them.
Among the criteria researchers employ in deciding among alternative experimental designs
are the following: a) The relative cost of the designs — specifically, relative cost is assessed with
respect to the number of subjects required in order to conduct a study, as well as the amount of
time and money a study will entail; b) Whether or not a design will yield results which are
reliable, and the extent to which the results have internal and external validity; c) The practicality
of implementing a study employing a specific design. With regard to the latter, it is often the case
that although one design may be best suited to provide the answer to a researcher’s question, it
is not possible to use that design because of practical and/or ethical limitations. For example, in
order to demonstrate unequivocally that a specific germ is the cause of a serious and incurable
disease, it would be necessary to conduct a study in which one group of human subjects is
infected with the disease. Obviously, such a study would not be sanctioned in a society that
respected the rights and freedom of its members. Consequently, alternative research designs
would have to be considered for determining whether or not the germ is the cause of the disease.
This section will provide an overview of three general categories of experimental design
which were described by Campbell and Stanley (1963), and have since been employed by many
sources that discuss the general subject of experimental design. Campbell and Stanley (1963)
identified the following three design categories: a) Pre-experimental designs (also referred to
as faulty experimental designs and nonexperimental designs); b) Quasi-experimental
designs; c) True experimental designs. The basic difference between true experimental
designs versus pre- and quasi-experimental designs is that the internal validity of the latter two
designs is compromised by the fact that subjects are not randomly assigned to experimental
conditions and/or there is a lack of a control group(s).26 Since the latter two factors do not
compromise the internal validity of true experimental designs, such designs are able to isolate
cause and effect with respect to the relationship between an independent and dependent variable.
The three aforementioned design categories will now be described in greater detail. Readers who
require a more detailed exposition of the subject matter on experimental design to be discussed
in this section are referred to sources such as Cook and Campbell (1979) and Shadish et al.
(2002).
Pre-experimental designs Designs which sources categorize as pre-experimental designs
lack internal validity. In other words, such designs do not allow a researcher to conclude
whether or not an independent variable/treatment influences scores on some dependent variable/
response measure. Although in most instances this is because a pre-experimental design lacks
a control group, one pre-experimental design will be described which lacks internal validity
because subjects are not randomly assigned to the experimental conditions. Campbell and Stanley
(1963) and Cook and Campbell (1979) note that it is necessary to control for the potential effects
of the following variables in order to insure that a research design has internal validity: a)
history; b) maturation; c) instrumentation; d) statistical regression; e) selection; f)
mortality.
History can be defined as events other than the independent variable which occur during
the period of time that elapses between a pretest and a posttest on a dependent variable. To
illustrate, assume that 20 clinically depressed subjects are given a pretest to measure their level
of depression. Following the pretest the subjects are given antidepressant medication, which they
take for six months. After the six months have elapsed a posttest is administered reevaluating the
subjects’ level of depression. Let us assume that a significant decrease in depression is observed
between the pretest and posttest. A researcher might be tempted to conclude from such a study
that the antidepressant was responsible for the observed decrease in depression. However, the
decrease in depression could be due to some other variable which was simultaneously present
during the period of time between the pretest and the posttest. For example, it is possible that all
of the subjects were in psychotherapy during the six months they were taking medication. If the
latter were true, the decrease in depression could have been due to the psychotherapy (which
represents a confound in the form of a historical variable) and not the medication. As a general
rule, the longer the period of time that elapses between a pretest and a posttest, the greater the
likelihood that a historical variable may influence scores on the dependent variable.
Maturation refers to internal changes which occur within an organism (i.e., biological and
psychological changes) with the passage of time. Examples of maturational variables are changes
associated with the physical maturation of an organism, an organism developing hunger or thirst
with the passage of time, an organism becoming fatigued or bored with the passage of time, and
an organism becoming experienced at or acclimated to environmental conditions with the passage
of time. To illustrate a maturational variable, assume that 50 one-year-old children who have not
exhibited much if any inclination to walk are given a pretest to measure the latter ability.
Following the pretest the children are put in a physical therapy program for six months. At the
end of the six months a posttest is administered in which the children are reevaluated with respect
to their walking ability. Let us assume that a significant improvement in walking is observed in
the children when the posttest scores are compared with the pretest scores. A researcher might
be tempted to conclude from such a study that the physical therapy was responsible for the
improvement in walking. In point of fact, the improvement could have been due to physical
maturation, and that with or without the physical therapy the children would have exhibited a
significant increase in walking ability.
Instrumentation refers to changes that occur with respect to the measurement of the
dependent/response variable over time. To illustrate, a study is conducted to assess the effect of
exposure to loud music on blood pressure. A pretest measure of blood pressure is obtained for
a group of subjects prior to them being exposed to one half hour of rock music, after which their
blood pressure is reevaluated. However, unbeknownst to the researcher, the machine employed
to measure blood pressure malfunctions during the posttest phase of the study, yielding
spuriously high readings. The researcher erroneously concludes that the rock music was
responsible for the increase in the subjects’ blood pressure, when in fact it was the result of
instrumentation error. As another example of instrumentation error, assume a study is conducted
in which judges are required to rate subjects with respect to the degree to which they exhibit
competitive behavior before and after exposure to one hour of severe heat (which is hypothesized
to affect competitiveness). Let us also assume that as the study progresses the judges become
bored and/or fatigued, and as a result of the latter their ratings become increasingly inaccurate.
Any differences obtained with respect to the judges’ ratings on the pretest versus the posttest
could be attributed to the unreliable ratings (which represent instrumentation error) and not the
experimental treatment (i.e., the heat).
Two other factors related to instrumentation which can compromise the reliability of a study
are ceiling and floor effects. Both of the latter can occur if the measure employed in evaluating
the underlying construct represented by the dependent variable is not able to adequately cover
the full range of values which a subject can obtain on that variable. To illustrate, a ceiling effect
would be present if the highest possible score a subject can obtain on a measure of anxiety
employed by a researcher, in reality, does not reflect the actual level of anxiety of that subject
relative to other subjects who have been assigned the identical maximum score. On the other
hand, a floor effect would be present if the lowest possible score a subject can obtain, in reality,
does not reflect the actual level of anxiety of that subject relative to other subjects who have also
been assigned the identical minimum score. Among other things, both ceiling and floor effects
will result in underestimating the degree of variability on a dependent variable (since in the case
of a ceiling effect there will be additional undetectable variability among those who have the
highest possible score, and in the case of a floor effect there will be additional undetectable
variability among those who have the lowest possible score).
Statistical regression refers to the fact that a subject yielding an extreme score (i.e., very
high or very low) on a response measure will tend to yield a value that is closer to the mean if a
second score is obtained on the response measure. As an example, a person may feel terrific on
a given day and because of the latter perform extremely well on a specific task (or conversely,
a person may feel lousy on a given day and thus perform poorly on a task). When retested, such
an individual would be expected to yield a score which regresses toward the mean — in other
words, the subject’s retest score will be closer to his or her typical (i.e., average) score on the
response measure. The regression phenomenon results from the fact that two or more scores for
a subject will never be perfectly correlated with one another. The lack of a perfect correlation can
be attributed to chance and other uncontrolled for variables which are inevitably associated with
measurement. To examine how statistical regression can compromise the internal validity of an
experiment, assume that a group of subjects is administered a pretest on a test of anxiety, after
which they participate in an exercise program which is hypothesized to facilitate relaxation.
Following the exercise program subjects are reevaluated for anxiety (i.e., a posttest is
administered). If, in fact, a disproportionate number of subjects are uncharacteristically anxious
on the day of the pretest and these subjects obtain substantially lower anxiety scores on the
posttest, the latter could just as easily be due to statistical regression as opposed to the exercise
program. In other words, it would be expected that if the exercise program had no effect on
anxiety, because of statistical regression subjects who obtained uncharacteristically high scores
on the pretest would be likely to yield scores that were more representative of their typical level
of anxiety on the posttest (i.e., lower scores).
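
The small simulation below (a sketch with entirely invented parameters) illustrates statistical regression: each subject's observed score is modeled as a stable typical level plus day-to-day fluctuation, the subjects with the most extreme pretest scores are selected, and their retest mean falls back toward the overall mean even though no treatment of any kind was administered.

    import numpy as np

    rng = np.random.default_rng(seed=7)

    n = 1000
    typical = rng.normal(50, 10, n)            # each subject's stable, typical anxiety level
    pretest = typical + rng.normal(0, 5, n)    # pretest = typical level + chance fluctuation
    posttest = typical + rng.normal(0, 5, n)   # posttest = same typical level + new fluctuation

    # Select the subjects who were most anxious (top 10%) on the pretest
    extreme = pretest >= np.percentile(pretest, 90)

    print("Overall pretest mean:        ", round(pretest.mean(), 1))
    print("Extreme group pretest mean:  ", round(pretest[extreme].mean(), 1))
    print("Extreme group posttest mean: ", round(posttest[extreme].mean(), 1))
    # The extreme group's posttest mean drifts back toward the overall mean,
    # which is the regression effect described above.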
Selection (or selection bias) refers to the fact that the method a researcher employs in
assigning subjects to groups can bias the outcome of a study. The most common situation
resulting in selection bias is when subjects are not randomly assigned to groups. As an example,
a study involving two groups is conducted to assess the efficacy of a new antidepressant drug.
Subjects in the experimental group are given the antidepressant for six months, while subjects
in the control group are given a placebo. In point of fact, subjects in the experimental group are
comprised of patients who are being treated at a private clinic that does not generally accept for
treatment severely depressed individuals. On the other hand, subjects in the control group are all
derived from a clinic at a public hospital which treats many severely depressed patients.
Unaware of the difference between the populations treated by the two facilities, the researcher
conducting the study assumes that the two groups are comparable with respect to their level of
depression, and consequently does not obtain pretest scores for depression. Following the
administration of the experimental treatments the researcher finds that the mean depression score
for the experimental group is significantly less than the mean score of the control group. The
researcher might be tempted to conclude that the antidepressant was responsible for the obtained
difference when, in fact, the difference could be due to selection bias. In other words, if the drug
does not have an effect on depression, subjects who receive it will probably still yield lower
depression scores than subjects who receive the placebo. The latter is the case, since there is an
extremely high likelihood that at the beginning of the study the experimental group was less
depressed than the placebo group.
Mortality (or subject mortality) refers to the differential loss of subjects in one or more
groups that are participating in an experiment. During the course of an experiment one or more
subjects may die, become ill, move to a different geographical locale, or just elect to no longer
participate in the study. When a disparate pattern of subject mortality characterizes the attrition
in two or more initially comparable groups, the internal validity of a study can be compromised.
To illustrate, a study involving two groups (to which the subjects are randomly assigned) is
conducted to assess the efficacy of a new antidepressant drug. It turns out, however, that one-
quarter of the subjects in the experimental group (i.e., the group that receives the drug) drop out
of the study prior to its completion (primarily due to the fact that they cannot tolerate the side
effects of the medication), while only one percent of the control group (i.e., the placebo group)
eliminate themselves from the study. Unbeknownst to the researcher, the dropouts in the
experimental group happen to be the most depressed subjects in that group. As a result of the
latter, at the conclusion of the study the average depression score obtained for the experimental
group is significantly lower than the average obtained for the control group. Rather than
indicating the efficacy of the antidepressant, the difference between the groups may be
attributable to the fact that the posttest average of the experimental group is based on the scores
of only a portion of the original subjects who comprised that group — specifically, those
members of the group who were characterized by the lowest levels of depression. When
contrasted with the posttest mean of the control group (which as a result of taking the placebo
would only be expected to decrease minimally from its pretest value), the experimental group
should yield a lower average due to selective subject attrition.
Campbell and Stanley (1963) and Cook and Campbell (1979) describe the following three
types of pre-experimental designs, all of which lack internal validity: a) One-shot case study;
b) One-group pretest-posttest design; c) Nonequivalent posttest-only design (originally
called static-group comparison by Campbell and Stanley (1963)).

The one-shot case study is also referred to as the one-group after-only design and the
single-group one-observation design. In this design a treatment is administered to a group of
subjects, and following the treatment the performance/response of the subjects on some
dependent variable is measured. The internal validity of this design is severely compromised
since: a) It lacks a control group; and b) It fails to formally obtain pretest scores for subjects
which could, at least, be compared with their posttest scores. The one-shot case study is
summarized in Figure 1.16.

Time 1          Time 2
Treatment       Response measure

Figure 1.16 One-Shot Case Study

As an example of a one-shot case study, let us assume that an ostensibly therapeutic
surgical procedure is performed on a group of arthritis patients at Time 1 and a year later (Time
2) the researcher measures the severity of symptoms reported by the patients. If at Time 2 the
researcher finds that most patients report only minimal symptoms, he might be tempted to
conclude that the latter was the result of the surgical procedure. However, the absence of both
a control group and a pretest make it impossible for the researcher to determine whether or not
the minimal symptoms reported by subjects at Time 2 were due to the treatment or, instead, due
to one or more of the extraneous variables noted earlier in this section. For example, it is possible
that during the year which elapsed between Times 1 and 2 the patients shared some other
common therapeutic experience of which the researcher was unaware — specifically, they might
have all engaged in an exercise program which was designed to minimize symptoms.
Consequently, the latter experience, and not the surgery, could be the reason why at Time 2
patients only reported minimal discomfort. In this instance, history would represent an extraneous
variable for which the researcher did not control.
The one-group pretest-posttest design (which is also referred to as the one-group before-
after design) is an improvement over the one-shot case study in that it includes a pretest.
Nevertheless, its internal validity is still compromised since it lacks a control group. The one-
group pretest-posttest design is summarized in Figure 1.17.

Time 1                           Time 2          Time 3
Pretest on response measure      Treatment       Posttest on response measure

Figure 1.17 One-Group Pretest-Posttest Design

In order to illustrate a one-group pretest-posttest design, a pretest will be added to the
arthritis study described for a one-shot case study. Thus, during Time 1 a pretest to measure
arthritis symptoms is administered, the surgical treatment is introduced at Time 2, and a posttest
for symptoms is administered at Time 3. If such a design were employed, a researcher would not
be justified in concluding that the surgical treatment was responsible for the difference if a
decrease in symptoms was found between the pretest and the posttest. Because a control group
was not included in the study, the researcher cannot rule out the possible impact of extraneous
variables. The historical variable employed in the discussion of the one-shot case study (i.e., the
exercise program) could still be responsible for a decrease in symptoms observed in Time 3
relative to Time 1. Note that in the discussion of the one-shot case study it is assumed that
patients have symptoms at the beginning of the study, yet no formal attempt is made to measure
them. On the other hand, in the one-group pretest-posttest design symptoms are formally
quantified by virtue of administering a pretest. Consequently the numerical difference between
the pretest and posttest scores obtained in the latter design might be viewed by some as providing
at least some minimal evidence to suggest that the surgical treatment is effective. In point of fact,
in spite of its limitations, the one-group pretest-posttest design is occasionally employed in
research — most notably in situations where for practical and/or ethical reasons it is impossible
to obtain a comparable control group. In such a case a researcher may feel that, in spite of its
limitations, the one-group pretest-posttest design can shed some light on the hypothesis under
study. In other words, if the alternative is to not conduct a study, the one-group pretest-posttest
design may be viewed as representing the lesser of the two evils. Additional discussion of the
one-group pretest-posttest design can be found in Section VII of the t test for two dependent
samples (Test 17), as well as in the chapters on other inferential statistical procedures which are
employed for comparing two or more dependent samples.

Time 1 Time 2
Experimental group Treatment Response measure
Control group Response measure

Figure 1.18 Nonequivalent Posttest-Only Design (Nonrandom assignment of subjects to groups)

The nonequivalent posttest-only design attempts to rectify the shortcomings of the two
previously described pre-experimental designs by including a control group. In the
nonequivalent posttest-only design one group of subjects (the experimental group) is
administered the experimental treatment at Time 1, whereas a control group is not administered
the treatment at Time 1. At Time 2 both groups are evaluated with respect to the dependent
variable. The internal validity of the design, however, is compromised by the fact that subjects
are not randomly assigned to groups. The nonequivalent posttest-only design is summarized
in Figure 1.18.
A nonequivalent posttest-only design will be used to evaluate the efficacy of the surgical
procedure for arthritis discussed above for the other pre-experimental designs. Two groups of
subjects who have arthritis are employed in the nonequivalent posttest-only design. One group
of subjects (the experimental group/surgery group) are patients of Dr. A, while the other group
of subjects (the control group/no surgery group) are patients of Dr. B. One year after surgery is
performed on the patients in the experimental group, the symptoms of the two groups are
compared with one another. Let us assume that at Time 2 Dr. A’s patients exhibit significantly
fewer symptoms of arthritis than Dr. B’s patients. Since the subjects were not randomly assigned
to the groups, it is possible that the surgery was not responsible for the obtained difference at
Time 2. Instead, the difference could be due to some uncontrolled for extraneous variable. For
example, it is possible that the patients Dr. A treats are younger than those treated by Dr. B, and
by virtue of the latter Dr. A’s patients have less severe forms of arthritis than Dr. B’s patients.
If the latter is true, it would not be unexpected that Dr. B’s patients would have more symptoms
at Time 2 than Dr. A’s patients. The lower number of symptoms for Dr. A’s patients may not
reflect the success of the surgery but, instead, may be indicative of the fact that they had fewer
symptoms to begin with. If instead of employing a nonequivalent posttest-only design, the study
had been modified such that subjects were randomly assigned to one of the two groups, the
efficacy of the surgery could have been assessed without compromising internal validity. As is
the case with the one-group pretest-posttest design, the nonequivalent posttest-only design
is occasionally employed in research — most notably in situations where for practical and/or
ethical reasons it is impossible to randomly assign subjects to groups. Once again, in such a case
a researcher may feel that, in spite of its limitations, the latter design can shed some light on the
hypothesis under study, and that it represents the lesser of two evils when contrasted with the
alternative of not conducting any study. If a researcher elects to employ an inferential statistical
test to compare scores on the dependent variable of the experimental and control groups in a
nonequivalent posttest-only design, the appropriate test to use will be one of the procedures
described in the book for comparing two or more independent samples (e.g., the t test for two
independent samples).
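
To make the analysis concrete, a minimal sketch of the kind of independent-samples comparison just described is shown below in Python (SciPy is assumed to be available; the symptom scores are hypothetical values invented for illustration, not data from the text).

```python
# Hypothetical posttest symptom scores for the two nonrandomly formed groups
# (illustrative numbers only; lower scores = fewer arthritis symptoms).
from scipy import stats

surgery_group = [3, 5, 2, 4, 6, 3, 5, 4]      # Dr. A's patients (treatment)
no_surgery_group = [7, 6, 8, 5, 9, 7, 6, 8]   # Dr. B's patients (control)

# t test for two independent samples on the Time 2 response measure
t_stat, p_value = stats.ttest_ind(surgery_group, no_surgery_group)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A significant result here still cannot rule out confounding, since
# subjects were not randomly assigned to the two groups.
```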

Quasi-experimental designs Since they do not rule out the possible influence of all
extraneous variables, quasi-experimental designs are also subject to confounding, and thus may
lack internal validity. Quasi-experimental designs, however, rule out more extraneous variables
than pre-experimental designs. Because of the latter, quasi-experimental designs are preferable
to employ when practical and/or ethical considerations do not permit a researcher to evaluate a
hypothesis through use of a true experimental design. In most instances, lack of random
assignment of subjects is responsible for compromising the internal validity of a quasi-
experimental design.
The following three quasi-experimental designs will be described: a) Nonequivalent
control group design; b) Separate-sample pretest-posttest design; c) Time series designs
(interrupted time series design and multiple time series design).

The nonequivalent control group design is a modification of the nonequivalent posttest-
only design to include a pretest measure on the dependent variable. In the nonequivalent control
group design one group of subjects (the experimental group) is given a pretest on the dependent
variable at Time 1, administered the experimental treatment at Time 2, and is reevaluated with
respect to the dependent variable at Time 3. As is the case with the experimental group, the
control group is administered a pretest and posttest on the dependent variable at Times 1 and 3,
but it is not administered any treatment at Time 2. The internal validity of the nonequivalent
control group design, however, is still compromised by the fact that subjects are not randomly
assigned to groups. The nonequivalent control group design is summarized in Figure 1.19.

Time 1 Time 2 Time 3


Experimental group Pretreatment Treatment Posttreatment
response measure response measure
Control group Pretreatment Posttreatment
response measure response measure

Figure 1.19 Nonequivalent Control Group Design (Nonrandom assignment of subjects to groups)

A nonequivalent control group design will be described within the context of evaluating
the efficacy of the surgical procedure for arthritis discussed previously. The design will be
identical to that described for the nonequivalent posttest-only design, except for the fact that
at Time 1 a pretest measure of symptoms is obtained for both the experimental and control groups.
At Time 2 the experimental group has the surgical treatment, while no treatment is administered
to the control group. At Time 3 a posttest measure of symptoms is obtained from both groups.
As is the case in the discussion of the nonequivalent posttest-only design, we will assume that
Dr. A’s patients comprise the experimental group, and Dr. B’s patients comprise the control
group. Since, however, once again subjects were not randomly assigned to the groups, if at Time
3 fewer symptoms of arthritis are observed in the experimental group, it could be due to an
extraneous variable for which the experimenter did not control. However, if the extraneous
variable is that Dr. A’s patients are younger than Dr. B’s patients, and by virtue of the latter they
have fewer symptoms at the beginning of the study, this difference will be identified with the
pretest administered during Time 1 of the nonequivalent control-group design. The latter
illustrates that the nonequivalent control group design is more likely to identify potentially
confounding variables than is the nonequivalent posttest-only design. Unfortunately, due to the
lack of random assignment the nonequivalent control-group design cannot rule out the potential
influence of all extraneous variables. Christensen (2000) and Cook and Campbell (1979) provide
excellent discussions of the nonequivalent control group design in which they describe how the
possible role of extraneous variables can be assessed, through an analysis of the configuration of
the pretest and posttest scores of the two groups. The nonequivalent control group design,
which is frequently employed in research, is more desirable to use than any of the previously
discussed pre-experimental designs when practical and/or ethical issues make it impossible to
employ a true experimental design (which always employs random assignment). If a researcher
elects to employ an inferential statistical test to evaluate a nonequivalent control group design,
the analytical procedures recommended for the analysis of the pretest-posttest control group
design (discussed later in this section under true experimental designs) would be used. The
latter procedures are discussed in Section VII of the t test for two dependent samples and
Section IX (the Addendum) of the between-subjects factorial analysis of variance (Test 27).
The separate-sample pretest-posttest design is employed on occasion when circumstances
do not allow a researcher to have access to both an experimental and control group throughout
the duration of a study. In this design, at Time 1 a pretest measure on a dependent variable is
obtained on a random sample of subjects who are derived from the population of interest. The
latter group of subjects will represent the control group. At Time 2 an experimental treatment is
administered to the whole population. At Time 3 a posttest measure on the dependent variable is
obtained from a new group of randomly selected subjects from the population. This latter group
will represent the experimental group. The separate-sample pretest-posttest design is
summarized in Figure 1.20.

Time 1 Time 2 Time 3

Control group Pretreatment response measure Treatment
Experimental group Treatment Posttreatment response measure

Figure 1.20 Separate-Sample Pretest-Posttest Design

The separate-sample pretest-posttest design is most commonly employed in survey
research where it may not be possible to use the same sample before and after some treatment is
administered to a population. To illustrate this design, assume that at Time 1 a market researcher
solicits the attitude of a random sample of subjects (the control group) derived from a population
of consumers about Product X. At Time 2 the experimental treatment (which is an advertising
campaign that attempts to convince the population to buy Product X) is administered. At Time
3 the market researcher solicits the attitude toward Product X from a new random sample of
subjects (the experimental group) derived from the same population. The difference in consumer
attitude between the control group’s pretest score and the experimental group’s posttest score is
employed to assess the effectiveness of the advertising campaign. Campbell and Stanley (1963)
note that lack of control for extraneous historical variables is most likely to compromise the
internal validity of the separate-sample pretest-posttest design. Thus in the example under
discussion, negative publicity associated with a competing product (rather than the advertising
campaign for Product X) could be the reason subjects express a more positive attitude about
Product X at Time 3 than at Time 1. History, however, is not the only factor which can
compromise the internal validity of the separate-sample pretest-posttest design. If a researcher
elects to employ an inferential statistical test to contrast the control group pretest score with the
experimental group posttest score, the appropriate test to use would be one of the procedures
described in the book for comparing two or more independent samples (e.g., the t test for two
independent samples). (If matched subjects (discussed later in this section) are employed, an
inferential procedure comparing two or more dependent samples would be used.)
In time series designs multiple measurements are obtained for one or more groups on a
dependent variable before and after an experimental treatment. An interrupted time series
design involves a single group of subjects, whereas in a multiple time series design two or more
groups are employed. As a general rule, if more than one group of subjects is employed in a time
series design, each of the groups represents a distinct population which can be distinguished from
the populations that comprise the other groups on the basis of some defining characteristic (e.g.,
each group can be comprised of residents who live in different geographical locales during
specific time periods). Since an interrupted time series design only involves a single group, the
absence of a control group makes it more difficult for a researcher to rule out the potential effects
of one or more extraneous variables (e.g., history, maturation, etc.) on the dependent variable.
The inclusion of one or more additional groups in a multiple time series design puts the
researcher in a better position to rule out the possible impact of extraneous variables. The
interrupted time series design and a multiple time series design involving two groups are
summarized in Figure 1.21. In the latter figure, Time 1 represents the first of (n - 1) pretest
measures on the dependent variable; Time n represents the time at which the treatment is
administered, and Times (n + 1) through m represent (m - n) posttest measures on the dependent
variable, with Time m representing the final posttest measure. Note that when n = 2 and m = 3,
the interrupted time series design becomes the one-group pretest-posttest design. When there
are two groups, and n = 2 and m = 3, the multiple time series design becomes the nonequivalent
control-group design (since, typically, multiple time series designs do not employ random
assignment).
In order to illustrate both the interrupted and multiple time series designs let us assume
that a researcher wants to determine whether or not a reduction of the speed limit from 65 mph
to 55 mph on interstate highways reduces the number of fatal accidents. Let us also assume that
the researcher discovers that such a change in the speed limit was implemented in the state of
Connecticut effective January 1, 1998. Employing an interrupted time series design, the
researcher determines the number of fatal accidents on Connecticut interstate highways during the
following time periods: a) Each of the five years which precede the new law going into effect (i.e.,
1993, 1994, 1995, 1996, 1997); b) Each of the first five years the law is in effect (i.e., 1998, 1999,
2000, 2001, 2002). This design is summarized in Figure 1.22. Note that the summary indicates
there are five pretreatment measures of the dependent variable (i.e., the number of fatal accidents
on Connecticut interstate highways in 1993, 1994, 1995, 1996, and 1997), the treatment (which
is the speed limit reduction law becoming effective January 1, 1998), and the five posttreatment
measures of the dependent variable (i.e., the number of fatal accidents on Connecticut interstate
highways in 1998, 1999, 2000, 2001, and 2002).

Interrupted Time Series Design

Times 1 to (n - 1) Time n Times (n + 1) to m

Pretreatment response measures Treatment Posttreatment response measures

Multiple Time Series Design

Times 1 to (n - 1) Time n Times (n + 1) to m

Experimental group Pretreatment response measures Treatment Posttreatment response measures
Control group Pretreatment response measures --------- Posttreatment response measures

Figure 1.21 Time Series Designs

Times 1-5 Time 6 Times 7-10


1993 1994 1995 1996 1997 Treatment 1998 1999 2000 2001 2002

Figure 1.22 Interrupted Time Series Design

If, in fact, the law is effective in reducing the number of fatal accidents, it would be expected
that a graphical and numerical analysis of the data will indicate fewer accidents involving fatalities
during the time period 1998-2002 versus the period 1993-1997. The main limitation of the above
described interrupted time series design is that it does not control for the possibility that some
other extraneous variable (such as history) might be confounded with the time periods involved
before and after the law going into effect. For example, if the weather was more inclement (e.g.,
rain, snow, fog, etc.) during the period 1993-1997 than the period 1998-2002, the latter might
account for a greater number of accidents involving fatalities between 1993 and 1997. The
researcher would also have to rule out other factors, such as that superior safety features in
automobiles (e.g., superior body integrity, better passenger restraint systems, etc.) were integrated
into automotive designs in 1998, and could thus account for the reduced accident rate involving
fatalities from 1998 on.
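
As a rough illustration of what an interrupted time series analysis begins with, the sketch below compares the mean of the pretreatment and posttreatment series; the yearly accident counts are hypothetical placeholders, and a full analysis would examine the trend of the entire series rather than just the two means.

```python
# Hypothetical yearly counts of fatal accidents on Connecticut interstate
# highways (illustrative values only, not actual accident data).
pre_law  = {1993: 310, 1994: 298, 1995: 305, 1996: 290, 1997: 301}   # Times 1-5
post_law = {1998: 262, 1999: 255, 2000: 249, 2001: 258, 2002: 251}   # Times 7-10

mean_pre = sum(pre_law.values()) / len(pre_law)
mean_post = sum(post_law.values()) / len(post_law)

print(f"Mean fatal accidents 1993-1997: {mean_pre:.1f}")
print(f"Mean fatal accidents 1998-2002: {mean_post:.1f}")
print(f"Change after the speed limit law: {mean_post - mean_pre:+.1f}")
# A drop in the post-law mean is consistent with the law being effective,
# but without a control series it cannot rule out history (e.g., weather)
# or other extraneous variables confounded with the two time periods.
```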
If a multiple time series design is employed to evaluate the same hypothesis, one or more
control groups are included in the study. For example, let us assume the state of Rhode Island is
employed as a control group for the following reasons: a) The Rhode Island interstate highway
speed limit is 65 mph throughout the time period 1993-2002 — i.e., no law reducing the 65 mph
speed limit is passed in Rhode Island within the time period covered by the study; b) Rhode Island
is viewed as similar to Connecticut due to its close geographical proximity and its demographic
compatibility. The multiple time series design utilizing the two states is summarized in Figure
1.23.
Let us assume that employing the above design, a decrease in fatal accidents is observed in
Connecticut during the period 1998-2002 relative to the period 1993-1997. In Rhode Island,
however, the accident rate is the same during both time periods. Under such circumstances, a
researcher would be able to argue more persuasively that the decline in fatal accidents in
Connecticut was a direct result of the speed limit reduction law going into effect January 1, 1998.
Although still imperfect (since subjects are not randomly assigned to the two groups), the
multiple time series design allows the researcher to rule out some of the alternative extraneous
variables noted earlier in reference to the interrupted time series design. Specifically, since it
is reasonable to presume that the two adjoining states experienced similar weather conditions
during the relevant time periods, the weather factor is unlikely to account for the differential fatal
accident rates. The automobile safety issue can also be ruled out, since the same types of
automobiles would be assumed to be operative in both states. The researcher, however, is still not
able to entirely rule out the possible influence of extraneous variables. It should be noted that the
external validity of the above described study can be challenged, insofar as one can question
whether or not its results can be generalized to other states which are demographically and/or
geographically distinct from Connecticut and Rhode Island. One way to address the latter
criticism is to include in the study one or more additional groups representing other states. Such
groups can represent states in which the speed limit is reduced from 65 mph to 55 mph (i.e.,
additional experimental groups representing geographically and/or demographically distinct
states) and/or states in which the 65 mph speed limit is not modified (which can represent
additional control groups). In spite of their limitations, time series designs are frequently
employed in the social sciences (e.g., economics, political science) as well as in business to
evaluate “real world” problems that are not amenable to being evaluated through use of the
experimental method (i.e., which cannot be evaluated with true experimental designs). Further
discussion of time series designs can be found in the description of the inferential statistical tests
for evaluating two or more dependent samples (e.g., the Cochran Q test (Test 26)) and in Section
IX (the Addendum) of the chapter on the Pearson product-moment correlation coefficient.
An excellent discussion of time series designs can be found in Shadish et al. (2002).

Times 1-5 Time 6 Times 7-10


Connecticut 1993 1994 1995 1996 1997 Treatment 1998 1999 2000 2001 2002
Rhode Island 1993 1994 1995 1996 1997 --------- 1998 1999 2000 2001 2002

Figure 1.23 Multiple Time Series Design

Although not categorized under the rubric of time series designs, two other experimental
designs which can be employed to evaluate a phenomenon over time are longitudinal and cross-
sectional designs — both of which are most commonly employed to evaluate whether or not
changes occur with respect to some developmental process over time. In a longitudinal design
(also referred to as a panel design) a single group of subjects is evaluated over repeated time
intervals in order to determine whether or not changes occur with respect to some characteristic
of the subjects. A cross-sectional design evaluates representative groups of subjects — with
each of the groups (which are often referred to as cohorts) representing individuals at different
age levels — with respect to some characteristic.
The two above noted designs will be illustrated with respect to evaluating the evolution of
intelligence over a person’s lifetime. If a researcher wanted to employ a longitudinal design to
evaluate whether or not intelligence improves and/or deteriorates as one ages, she could employ
a sample of subjects and evaluate them with respect to intelligence over a prolonged period of
time. Specifically, each of the subjects could be evaluated with a standardized test of intelligence
at specific time periods (e.g., at the ages of 5, 10, 15, 20, 40, 60, and 80). Two factors which
might deter one from conducting the latter type of study are: a) The substantial cost involved in
conducting a study over a 75 year period; and b) A longitudinal study can easily extend beyond
a researcher’s professional career and/or life span, and consequently the latter individual might
never get to publish (or, for that matter, get credit for) the study during one’s lifetime. The
variable most often cited as potentially compromising the internal validity of a longitudinal
study is subject mortality. Specifically, the number of subjects lost will increase with the
passage of time, and a nonrandom pattern of subject mortality can compromise the internal
validity of a study.
If one elected to employ a cross-sectional design to assess developmental changes in
intelligence across one’s life span, multiple samples of subjects representing different ages could
simultaneously be evaluated with respect to intelligence. Thus, a researcher could contrast the
intelligence of seven groups of subjects, with each group/cohort being comprised of individuals
who are the following ages: 5, 10, 15, 20, 40, 60, and 80. The greatest limitation of a cross-
sectional design is that the equivalence of the groups cannot be insured. To be more specific, the
variable most often cited as potentially compromising the internal validity of a cross-sectional
study is history. More specifically, the greater the age discrepancy between any two groups/age
cohorts, the greater will be the difference with respect to the social and physical environmental
experiences to which they have been exposed during their lifetime. Consequently, any differences
in intelligence detected between the different age groups could be confounded by historical
factors.
It is interesting to note that the results of experiments employing longitudinal versus
cross-sectional designs investigating the same phenomenon may be inconsistent with one
another. To illustrate, Christensen (2000, pp. 58-59) cites Baltes et al.'s (1977) discussion of
studies regarding the evolution of intelligence over a person’s lifetime. The latter authors note that
cross-sectional research suggests that intelligence increases up until the age of 30 and then
declines as one gets older. Longitudinal studies, on the other hand, suggest an increase in
intelligence until about the age of thirty, after which there is either no change or an additional
slight increase. The inconsistency between the two types of studies is explained on the basis of
what is referred to as the age-cohort effect. The latter reflects the fact that by virtue of being the
same age as all of the other subjects involved in the study, those who participate in a longitudinal
study are more likely to share with their fellow subjects similar environmental experiences over
the duration of their lifetime than subjects who participate in a cross-sectional study. Subjects
who are involved in a cross-sectional study (who may represent different generations) are much
less likely to share common environmental experiences with subjects in other age groups because
of different environmental events which are associated with specific time periods.
Within the framework of true experimental designs (which are described in the next
section), a longitudinal design can be viewed as a dependent samples design. In the final
analysis, however, due to the likelihood of nonrandom subject mortality, plus the fact that the
independent variable of age is not directly manipulated by the experimenter, a longitudinal
design does not conform to the requirements of a true experimental design. Within the context
of true experimental designs, a cross-sectional design can be viewed as an independent
samples design. Yet, in reality, a cross-sectional design does not conform to the requirements
of a true experimental design, since it does not insure equivalency of the groups.
A compromise between a longitudinal and cross-sectional design is a cohort-sequential
design. In the latter design two or more groups/cohorts of subjects whose ages overlap with one
another are longitudinally evaluated with respect to a characteristic of interest over a specific
period of time. As an example, assume that a researcher wishes to evaluate whether or not
changes in intelligence occur between the ages of 5 and 20. A study is initiated during a given
year (e.g., 2005) with a sample of five year old subjects, and each of the subjects is administered
a standardized intelligence test at the ages of 5 (the year 2005 — the year in which the study
commences), 10 (the year 2010), 15 (the year 2015), and 20 (the year 2020). A second sample
of five year old subjects is introduced into the study ten years later (e.g., 2015), and each of these
subjects is administered the standardized intelligence test at the ages of 5 (the year 2015 — the
first year in the study for these subjects), 10 (the year 2020), 15 (the year 2025), and 20 (the year
2030). Note that the study allows the researcher to evaluate two cohorts of subjects
longitudinally, yet each of the cohorts can be viewed as representing a different generation since
they are born at different points in time. If similar patterns of intellectual change are detected in
both samples, it would suggest that changes in intellect are more likely to be related to one’s age
than to one’s generation. On the other hand, if there is a disparity between the patterns of
intellectual change in the two samples, it would suggest that generational/historical factors rather
than age may be more likely to impact intelligence. The cohort-sequential design can be
evaluated within the context of a factorial design, which is discussed in the chapter on the
between-subjects factorial analysis of variance (Test 27). More specifically, the cohort-
sequential design can be conceptualized as a mixed factorial design involving two independent
variables — with the four age levels representing one independent variable (involving repeated
measures on each subject) and the two cohorts representing a second independent variable
(involving two independent groups). Analysis of a mixed factorial design is described in Section
IX (the Addendum) of the between-subjects factorial analysis of variance.

True experimental designs Designs which are categorized as true experimental designs
are characterized by random assignment of subjects to groups and the inclusion of one or more
adequate control groups. The latter conditions optimize control of extraneous variables, and
thereby maximize the likelihood of a study having internal validity. The following four true
experimental designs will be described in this section: a) Independent samples design; b)
Dependent samples design; c) Pretest-posttest control group design; d) Factorial designs.

An independent samples design is also known as an independent-groups design,
between-subjects design, between-groups design, between-subjects after-only research
design, and randomized-groups design. In an independent samples design each of n different
subjects is randomly assigned to one of k experimental groups. In the most elementary
independent samples design there are k = 2 groups, with one group representing the experimental
group and the other the control group. Inspection of the latter design, which is summarized
in Figure 1.24, reveals that it is identical to the nonequivalent posttest-only design (described
earlier in this section under pre-experimental designs), except for the fact that in the
independent samples design subjects are randomly assigned to the two groups. The latter
modification of the nonequivalent posttest-only design allows a researcher to control for the
potential effects of extraneous variables.

Time 1 Time 2
Experimental group Treatment Response measure
Control group -- Response measure

Figure 1.24 Independent Samples Design (Random assignment of subjects to different groups)

The independent samples design can be illustrated through use of the arthritis study
employed to illustrate the nonequivalent posttest-only design, with the modification that
subjects are randomly assigned to the two groups. Extensive discussion (including examples and
analytical procedures) of the independent samples design can be found in the chapters on the
t test for two independent samples, the single-factor between-subjects analysis of variance,
and other tests involving two or more independent samples.
A dependent samples design is also known as a within-subjects design, within-subjects
after-only research design, repeated measures design, treatment-by-subjects design,
correlated samples design, matched-subjects design, paired-sample design, and randomized-
blocks design. In a dependent samples design each of n subjects serves in each of k
experimental conditions. A dependent samples design can also involve the use of matched
subjects. Within the latter context it is commonly referred to as a matched-subjects design. In
such a design each subject is paired with one or more other subjects who are similar with respect
to one or more characteristics that are highly correlated with the dependent variable. The general
subject of matching is discussed in detail in Section VII of the t test for two dependent samples.
A dependent samples design (as well as a matched-subjects design) is sometimes categorized
as a randomized-blocks design, since the latter term refers to a design which employs
homogeneous blocks of subjects (which matched subjects represent). When a dependent samples
design is conceptualized as a randomized-blocks design, it is because within each block the
same subject is matched with himself by virtue of serving under all of the experimental conditions.
A dependent samples design involving two experimental conditions is summarized in Figure
1.25.
Time 1 Time 2
Experimental condition Treatment Response measure
Control condition Response measure

Figure 1.25 Dependent Samples Design (All subjects serve in both conditions)

As an example illustrating the dependent samples design, assume a researcher wants to
evaluate the efficacy of a drug on the symptoms of arthritis. Fifty subjects are evaluated for a six-
month period while taking the drug (the experimental condition), and for a six-month period
while not taking the drug (the control condition, during which time subjects are administered a
placebo). Half of the subjects are initially evaluated in the experimental condition after which
they are evaluated in the control condition, while the other half of the subjects are initially
evaluated in the control condition after which they are evaluated in the experimental condition.
This latter procedure, which is known as counterbalancing, is discussed in Section VII of the
t test for two dependent samples. The data are evaluated by comparing the mean scores of
subjects with respect to symptoms in the drug/experimental versus placebo/control conditions.
Extensive discussion (including a description of analytical procedures and additional examples)
of the dependent samples design can be found in the chapters on the t test for two dependent
samples, the single-factor within-subjects analysis of variance (Test 24), and other tests
involving two or more dependent samples.
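
A minimal sketch of how the drug study just described might be analyzed with a t test for two dependent samples is shown below (SciPy assumed available; the five subjects and their scores are invented for illustration).

```python
# Hypothetical symptom scores for the same subjects under both conditions
# (lower = fewer arthritis symptoms); each position is one subject.
from scipy import stats

drug_condition = [4, 6, 3, 5, 4]      # experimental condition (drug)
placebo_condition = [7, 8, 5, 6, 7]   # control condition (placebo)

# t test for two dependent samples on the paired scores
t_stat, p_value = stats.ttest_rel(drug_condition, placebo_condition)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Because every subject serves in both conditions, each subject acts as
# his or her own control; counterbalancing the order of the conditions
# guards against order effects.
```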
The pretest-posttest control group design (also referred to in some sources as the before-
after design) is identical to the nonequivalent control group design (described earlier in this
section under quasi-experimental designs), except for the fact that subjects are randomly
assigned to the two groups. The latter modification of the nonequivalent control group design
allows a researcher to control for the potential effects of extraneous variables. Note that the
pretest-posttest control group design is also an improvement over the one-group pretest-
posttest design (discussed earlier in this section under pre-experimental designs), insofar as
the latter design lacks a control group. The pretest-posttest control group design is summarized
in Figure 1.26.
The pretest-posttest control group design can be illustrated through use of the arthritis
study employed to illustrate the nonequivalent control group design, with the modification that
subjects are randomly assigned to the two groups. The analysis of the pretest-posttest control
group design is discussed in Section VII of the t test for two dependent samples.

Time 1 Time 2 Time 3


Experimental group Pretreatment Treatment Posttreatment
response measure response measure
Control group Pretreatment Posttreatment
response measure response measure

Figure 1.26 Pretest-Posttest Control Group Design (Random assignment of subjects to different groups)

Factorial designs All of the designs described up to this point in this section attempt to
assess the effect of a single independent variable on a dependent variable. A factorial design is
employed to simultaneously evaluate the effect of two or more independent variables on a
dependent variable. Each of the independent variables is referred to as a factor. Each of the
factors has two or more levels, which refer to the number of groups/experimental conditions
which comprise that independent variable. If a factorial design is not employed to assess the
effect of multiple independent variables on a dependent variable, separate experiments must be
conducted to evaluate the effect of each of the independent variables. One major advantage of
a factorial design is that it allows the same set of hypotheses to be evaluated at a comparable
level of power by using only a fraction of the subjects that would be required if separate
experiments were conducted to evaluate the relevant hypotheses for each of the independent
variables. Another advantage of a factorial design is that it permits a researcher to evaluate
whether or not there is an interaction between two or more independent variables — the latter
being something which cannot be determined if only one independent variable is employed in a
study. An interaction is present in a set of data when the performance of subjects on one
independent variable is not consistent across all the levels of another independent variable.
Extensive discussion (including a description of analytical procedures and examples) of factorial
designs can be found in the chapter on the between-subjects factorial analysis of variance.
Among the factorial designs discussed in the latter chapter are the between-subjects factorial
design, the mixed factorial design, the within-subjects factorial design, and the Latin square
design.
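
To make the notion of an interaction concrete, the sketch below computes the cell means of a hypothetical 2 x 2 between-subjects factorial design and the difference of the two simple effects; a nonzero value of that contrast is the signature of an interaction. All scores are invented, and a full analysis would employ the between-subjects factorial analysis of variance.

```python
# Hypothetical scores for a 2 x 2 factorial design:
# Factor A = drug (present/absent), Factor B = exercise (present/absent).
from statistics import mean

cells = {
    ("drug", "exercise"):       [2, 3, 2, 4],
    ("drug", "no exercise"):    [5, 6, 5, 4],
    ("no drug", "exercise"):    [6, 7, 6, 5],
    ("no drug", "no exercise"): [6, 7, 7, 6],
}

cell_means = {combo: mean(scores) for combo, scores in cells.items()}
for combo, m in cell_means.items():
    print(combo, round(m, 2))

# Effect of the drug at each level of exercise; if these two simple effects
# differ, performance on one factor is not consistent across the levels of
# the other factor -- i.e., an interaction is present.
effect_with_exercise = cell_means[("drug", "exercise")] - cell_means[("no drug", "exercise")]
effect_without_exercise = cell_means[("drug", "no exercise")] - cell_means[("no drug", "no exercise")]
print("Interaction contrast:", round(effect_with_exercise - effect_without_exercise, 2))
```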

Single-subject designs An experimental design which attempts to determine the effect of
an independent variable/treatment on a single organism is referred to as a single-subject design.
Since single-subject designs evaluate performance over two or more distinct time periods, such
designs can be conceptualized within the framework of a time series design. There is, however,
a lack of agreement among researchers with respect to whether single-subject designs are best
classified as quasi versus true experimental designs. In the final analysis, the appropriate
classification for such a design will be predicated on its ability to rule out the potential effects
of extraneous variables on the dependent variable. The ability of a single-subject design to
achieve the latter may vary considerably depending upon the composition of a specific design.
Single-subject designs are most frequently employed in clinical settings in order to assess
the efficacy of a specific treatment on a subject. These designs are often employed in the field
of clinical psychology within the framework of behavior modification research. The latter type
of research (which derives from the work of the American behavioral psychologist B. F. Skinner)
assesses the effect of one or more environmental manipulations (such as reward and punishment)
on inappropriate behavior.
Hersen and Barlow (1976) note that researchers are often reluctant to employ single-subject
designs because of the following limitations associated with them: a) A researcher will be limited
by the fact that the results of a study based on a single-subject design may not be able to be
generalized to other subjects; b) Single-subject designs may be susceptible to confounding
and/or order effects. An order effect occurs when an observed change on a dependent variable is
a direct result of the order of presentation of the experimental treatments, rather than being due
to the independent variable manipulated by the experimenter. Order effects are discussed in
greater detail in the chapters on inferential statistical procedures for two or more dependent
samples; c) Single-subject designs are problematic to interpret when a researcher wishes to
simultaneously assess the effect of two or more independent variables on a dependent variable.
In spite of their limitations, Dukes (1965) notes that it may be prudent to employ a
single-subject design under the following circumstances: a) When the issue of generalizing beyond the
subject employed in a study is not of major concern; b) When the dependent variable being
evaluated has a low frequency of occurrence in the underlying population (and thus it is
impractical or impossible to evaluate the treatment with a large group of subjects); c) When the
dependent variable being evaluated is characterized by low intersubject variability. Hersen and
Barlow (1976) also note that research in clinical settings is often characterized by the fact that
numerous extraneous variables are present which can interact with the treatment variable. Such
interactions may be responsible for results which suggest that a treatment is ineffective with a
group of subjects. Yet within the group, individual subjects may respond to the treatment, and
consequently a single-subject design employed with individual subjects may yield positive
results.
In order to illustrate a single-subject design, consider the following example. A six year
old child has temper tantrums which disrupt his first-grade class. Every time the child has a
temper tantrum the teacher removes him from the classroom. The school psychologist
hypothesizes that the child has temper tantrums in order to get the reward of attention, which
removal from the classroom represents. In order to evaluate the hypothesis, the psychologist
conducts the following study employing an ABAB single-subject design. The letters A and B
represent four distinct time periods which comprise the study, with the letter A indicating that
no treatment is in effect and the letter B indicating the treatment is in effect. Since in an ABAB
design Time 1 is designated A, no treatment is in effect. Specifically, every time the child has a
temper tantrum he is removed from the classroom. The measure of the subject’s behavior during
Time 1 is intended to provide the researcher with an initial indicator (referred to as a baseline
measure) of how often the child exhibits the behavior of interest (commonly referred to as the
target behavior). This baseline value for temper tantrums will be employed later when it will
be compared to the number of temper tantrums emitted by the subject during the times when the
treatment is in effect. Since Time 2 is designated B, the treatment is administered. Specifically,
during Time 2 every time the child has a temper tantrum he is ignored by the teacher, and thus
remains in the classroom. If the treatment is effective, a decrease in temper tantrums should
occur during Time 2. After the period of time allotted for Time 2 elapses, Time 3 is initiated.
During this time the A condition is reintroduced. Specifically, once again the child is removed
from the classroom any time he has a temper tantrum. If, in fact, the treatment is effective, it is
expected that the frequency of temper tantrums will increase during Time 3 and most likely return
to the baseline level recorded during Time 1. It should be noted, however, that one problem
associated with the ABAB design is that if during Time 2 the treatment is “too” effective, during
Time 3, the subject may not regress back to the baseline level of behavior. Assuming the latter
does not occur, once the time allotted for Time 3 has elapsed, Time 4 is initiated. During this
final time period the treatment (i.e., the B condition) is reintroduced. If the results of the study
indicate a high rate of temper tantrums during Times 1 and 3 (i.e., when no treatment was in
effect) and a low rate during Times 2 and 4 (i.e., when the treatment was in effect), the researcher
will have a strong case for arguing that the treatment was responsible for reducing the number
of temper tantrums. The use of the fourth time period further reduces the likelihood that instead
of the treatment, some uncontrolled for extraneous variable(s) (e.g., maturation or history) might
have been responsible for the decline in temper tantrums. It should be noted that depending upon
the circumstances surrounding a study (i.e., the practical and ethical considerations associated
with applying a treatment to a single subject), an ABAB design may be truncated to consist of
only two or three time periods (in which case it respectively becomes an AB or ABA design).
The reader should keep in mind that although the ABAB design has been employed to illustrate
a single-subject design, it represents only one of a number of such designs that can be employed
in research. Since a comprehensive discussion of single-subject designs is beyond the scope of
this book, the interested reader can find more detailed discussions of the topic in sources such
as Christensen (2000), Hersen and Barlow (1976), and Sheskin (1984).
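
The logic of the ABAB design can be conveyed with a simple tally of the target behavior in each phase, as in the sketch below; the daily tantrum counts are hypothetical and serve only to display the expected pattern of high rates in the A phases and low rates in the B phases.

```python
# Hypothetical daily temper-tantrum counts recorded during each phase
# of an ABAB single-subject design (values invented for illustration).
from statistics import mean

phases = {
    "A1 (baseline, removed from class)": [5, 4, 6, 5, 5],
    "B1 (treatment, tantrums ignored)":  [3, 2, 1, 1, 0],
    "A2 (treatment withdrawn)":          [4, 4, 5, 4, 5],
    "B2 (treatment reinstated)":         [2, 1, 1, 0, 0],
}

for label, counts in phases.items():
    print(f"{label}: mean tantrums per day = {mean(counts):.1f}")
# A pattern of high rates in the A phases and low rates in the B phases
# supports the claim that the treatment, rather than an extraneous
# variable such as maturation or history, produced the change.
```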

Sampling Methodologies

A great deal of research is based on the use of nonrandom samples (also referred to as
nonprobability samples). In nonrandom or nonprobability sampling the probability of an
object/subject being selected cannot be computed. For example, a sample of subjects who are
volunteers or subjects who have been selected purely on the basis of their availability do not
constitute a random sample. Such samples are commonly referred to as convenience samples.
Another example of a convenience sample would be those members of a population who return
a questionnaire a researcher has mailed to them.
Survey research commonly employs procedures other than simple random sampling, which
is the purest form of what is often referred to as the probability method of sampling. Some
other sampling procedures which are categorized under the probability method of sampling will
now be described. Some sources (e.g., Bechtold and Johnson (1989)) refer to the sampling
procedures to be described as examples of restricted random sampling.
One alternative to simple random sampling is systematic sampling. In the latter type of
sampling a list of the members who comprise a population is available, and every nth person is
selected to participate in a survey. In systematic sampling it is critical that there is no preexisting
pattern built into the list, which might bias the composition of the sample derived through this
methodology (e.g., an alphabetized list may increase or decrease the likelihood of specific
individuals being selected). A systematic sample is not a truly random sample by virtue of the
fact that by employing the rule that every nth person be selected a limitation is imposed on the
number of different samples that can be selected. Randomizing the names in an alphabetized list
prior to the selection process, and then selecting every nth person optimizes the likelihood that
the final sample is unbiased, and for all practical purposes is commensurate with simple random
sampling.
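
A minimal sketch of systematic sampling under the shuffling strategy just described is given below; the membership list, its size, and the sampling interval are all hypothetical.

```python
import random

# Hypothetical population list (e.g., names on a membership roster).
population = [f"member_{i:04d}" for i in range(1, 1001)]
n = 20  # sampling interval: select every 20th person (sample of 50)

# Randomize the list first so that any preexisting pattern (e.g.,
# alphabetical order) cannot bias which members are selected.
random.shuffle(population)

start = random.randrange(n)   # random starting point within the first interval
sample = population[start::n]  # every nth member thereafter
print(len(sample), sample[:3])
```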
Another alternative to simple random sampling is cluster sampling. It is often the case that
the members of a population can be broken down into preexisting groups or clusters (such as
towns, blocks, classrooms of children, etc.). A researcher selects a random sample of the clusters,
and data are then obtained from all the members of the selected clusters. The reliability of cluster
sampling is predicated on whether or not the people who constitute the selected clusters are
representative of the overall population (and, of course, a function of the response rate within
each cluster). An example of cluster sampling would be a market researcher dividing a city into
blocks (all of which are assumed to be comparable), randomly selecting a limited number of the
blocks, and obtaining opinions from all the people who live in the selected blocks.
Stratified sampling is a methodology employed in survey research which allows a
researcher to focus on specific subpopulations embedded within the larger population. In
stratified random sampling a population is divided into homogeneous subgroups referred to as
strata. Random subgroups are then selected from each of the strata. With respect to the latter,
one of the following two procedures is employed: a) The number of subjects selected from each
stratum corresponds to the proportion of that stratum in the overall population; b) An equal
number of subjects are selected from each stratum, and their responses are weighted with respect
to the proportion of that stratum in the overall population.
When properly implemented, stratified sampling can provide a researcher with accurate
information on those members of the population who comprise the various strata (although, in
theory, simple random sampling should accomplish the same goal). As an example of stratified
sampling, assume a population is comprised of the following four distinct ethnic groups:
Caucasian, Black, Asian, Hispanic. Fifty percent of the population is Caucasian, 20% Black, 20%
Asian, and 10% Hispanic. For a sample comprised of n subjects, 50% of the sample is selected
from the Caucasian subpopulation, 20% from the Black subpopulation, 20% from the Asian
subpopulation, and 10% from the Hispanic subpopulation.
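
Using the stratum proportions from the example above, the sketch below computes a proportional allocation for a stratified sample; the total sample size of 200 is an arbitrary illustrative choice.

```python
# Proportions of each stratum in the population (from the example above).
strata_proportions = {
    "Caucasian": 0.50,
    "Black":     0.20,
    "Asian":     0.20,
    "Hispanic":  0.10,
}

n = 200  # total sample size (hypothetical)

# Proportional allocation: the number sampled from each stratum matches
# that stratum's share of the overall population.
allocation = {stratum: round(n * p) for stratum, p in strata_proportions.items()}
print(allocation)   # {'Caucasian': 100, 'Black': 40, 'Asian': 40, 'Hispanic': 20}
```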
Note that the goal of stratified sampling is to identify strata that are similar/homogeneous
within themselves. Thus, there is a minimum of within-strata variability, but there is a high degree
of between-strata variability (i.e., the different strata are dissimilar from one another). On the
other hand, in cluster sampling the goal is to produce similar clusters, with each cluster
containing the full spectrum of the population. Thus, in cluster sampling there is high within-
clusters variability but minimal between-clusters variability.
It should be noted that the actual methodology employed in survey sampling may involve
additional procedures that go beyond that which has been described above. Among others,
Scheaffer et al. (1996) provide detailed documentation of different types of sampling procedures.
In addition, the latter authors describe procedures for estimating population parameters (e.g.,
means, standard deviations, proportions, etc.) when methods other than simple random sampling
are employed (and in some instances the only data available may be in the format of summary
information for elements such as clusters or strata (as opposed to the actual responses of each
subject)).
Regardless of what sampling procedure is employed, the goal of survey research is to
minimize sampling error (i.e., minimize the difference between a sample statistic and the
population parameter it estimates). When conducting surveys which involve a large population,
the discrepancy between an estimated population proportion obtained from a sample of size n and
the actual proportion in the underlying population rarely exceeds 1/√n. To illustrate, if a sample
of n = 1000 subjects is obtained from a large population (e.g., ten million people) and it is
determined that 55% of the sample endorses a particular candidate, we can be almost certain
(approximately 95% confident) that the range of values 55% ± 3.16% (since 1/√1000 = .0316,
which expressed as a percent is 3.16%) contains the true proportion of the population who
support the candidate — the latter result is commonly expressed as 55% with a margin of error
of ± 3.16% or ± 3.16 percentage points. (Equation 8.6, described in Section VI of the chi-square
goodness-of-fit test, is employed to compute the exact value of the above noted range of values,
which is referred to as the confidence interval for a population proportion.) In point of fact,
most reputable surveys employ sample sizes of n > 1000, and in actuality sampling error only
decreases minimally as sample size is increased beyond 1000. Further discussion of analysis of
survey data can be found in Section VI of the chi-square goodness-of-fit test under the section
on confidence intervals and in Endnote 10 of the latter test. Among others, Daniel and Terrell
(1995), Folz (1996), and O’Sullivan et al. (2002) provide more detailed discussions of survey
sampling.
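
The 1/√n rule of thumb and the resulting margin of error can be reproduced directly, as in the sketch below. The final two lines add the standard large-sample normal-approximation confidence interval for a proportion as a cross-check; this is the generic textbook formula and is not necessarily identical to Equation 8.6 referred to above.

```python
import math

n = 1000       # sample size
p_hat = 0.55   # proportion of the sample endorsing the candidate

# Rule-of-thumb margin of error: 1 / sqrt(n)
margin = 1 / math.sqrt(n)
print(f"Margin of error: +/- {margin:.4f} ({margin * 100:.2f} percentage points)")

# Standard large-sample (normal approximation) 95% confidence interval
# for a population proportion, included only as a cross-check.
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"Approximate 95% CI: {lower:.3f} to {upper:.3f}")
```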

Basic Principles of Probability

This section will summarize and illustrate a number of basic principles which are commonly
employed in computing probabilities.27 The logic underlying some of the statistical procedures
described in the book is based on one or more of the principles described in this section.

Elementary rules for computing probabilities The first rule to be presented for computing
probabilities is commonly referred to as the addition rule. The latter rule states that if two events
A and B are mutually exclusive (i.e., if the occurrence of one event precludes the occurrence
of the other event), the probability that either event A or event B will occur is the sum of the
probabilities for each event. The addition rule is summarized with Equation 1.40.

P(A or B) = P(A) + P(B)     (Equation 1.40)

To illustrate the application of the addition rule with two mutually exclusive events,
consider the following example. If a card is randomly selected from a deck of playing cards,
what is the probability of obtaining either a Red card (for which p = 1/2) or a Black card (for
which p = 1/2)? Employing Equation 1.40, the probability value p = 1 is computed below.

P(Red or Black) = P(Red) + P(Black) = 1/2 + 1/2 = 1

The union of two events A and B occurs if either A occurs, B occurs, or A and B occur
simultaneously. The simultaneous occurrence of A and B is commonly referred to as the
intersection of A and B. If the two events A and B are not mutually exclusive (i.e., they
intersect/can occur simultaneously), the probability for the union of events A and B is computed
with Equation 1.41. Note that the symbol ∪ is employed to represent the union of events.

P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A and B)     (Equation 1.41)

Since the symbol ∩ represents the intersection of events, Equation 1.41 can also be written
as follows.

P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)



To illustrate the application of the addition rule with two events which are not mutually
exclusive, consider the following example. If a card is randomly selected from a deck of playing
cards, what is the probability of obtaining either a Red card or a King? Employing Equation 1.41
the probability 28/52 is computed below.

P(Red or King) = P(Red ∪ King) = P(Red) + P(King) − P(Red ∩ King)

P(Red or King) = P(Red ∪ King) = 26/52 + 4/52 − 2/52 = 28/52

The above computations indicate that 26 of the 52 cards in a deck are Red (thus, P(Red)
= 26/52), and that 4 of the cards are a King (thus, P(King) = 4/52). Two of the Kings (the King
of Diamonds and King of Hearts), however, are both a Red card and a King. When the
probability of obtaining a card that is both Red and a King (2/52) (in other words, the
intersection of the latter two events, which is represented with the notation Red ∩ King) is
subtracted from the sum of the values P(Red) and P(King), what remains is the probability of
selecting a Red card or a King. Thus, the value 28 in the numerator of the computed probability
28/52 indicates that the 26 red cards (which include the King of Diamonds and King of Hearts)
plus the 2 black Kings (the King of Spades and King of Clubs) constitute the outcomes which
meet the requirement of being a Red card or a King.28
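
The two card-deck computations above can be verified with exact fractions, as in the brief sketch below.

```python
from fractions import Fraction

# Addition rule for mutually exclusive events: Red or Black
p_red, p_black = Fraction(26, 52), Fraction(26, 52)
print(p_red + p_black)                    # 1

# Addition rule for nonmutually exclusive events: Red or King
p_king = Fraction(4, 52)
p_red_and_king = Fraction(2, 52)          # King of Diamonds, King of Hearts
p_red_or_king = p_red + p_king - p_red_and_king
print(p_red_or_king)                      # 7/13, i.e., 28/52
```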

(a) Two mutually exclusive events A and B; (b) Union of two nonmutually exclusive events A and B (A ∪ B);
(c) A and its complement Ā; (d) Intersection of events A and B (A ∩ B)

Figure 1.27 Venn Diagrams



Figure 1.27 provides a visual summary of the relationships between two events A and B
described in this section. The visual relationships depicted in Figure 1.27 are referred to as Venn
diagrams.29 A typical Venn diagram employs a circle to represent all possible outcomes of a
specific event. Thus, in each part of Figure 1.27, all outcomes identified as A are contained within
the circle labeled A, and all outcomes identified as B are contained within the circle labeled B.
Figure 1.27(a) describes the two mutually exclusive events A and B, and as noted above, since
A and B do not intersect, P(A ∩ B) = 0. Figure 1.27(b) describes two nonmutually exclusive
events A and B, where the two circles represent the union of A and B. Figure 1.27(c) represents
A and its complement Ā. The complement of A is comprised of all possible outcomes other
than those contained in A. Figure 1.27(d) describes the intersection of the events A and B, which
is represented by the area in the center designated by the notation A ∩ B.
Another commonly employed rule for computing probabilities is the multiplication rule.
In order to state the latter rule, the concept of conditional probability must be employed. The
conditional probability o f an event is the probability the event will occur, given the fact that
another event has occurred. To be more specific, the conditional probability of an event A, given
the fact that another event B has already occurred (which is represented with the notation
P(A/B)), can be determined through use of Equation 1.42. Note that both the notations P(A ∩ B)
and P(AB) can be used to represent the intersection of events A and B.

P(A/B) = P(A ∩ B)/P(B) = P(AB)/P(B)     (Equation 1.42)

By transposing the terms in Equation 1.42 it is also the case that P(AB) = P(B) P(A/B) . The
latter is often referred to as the multiplication rule, which states the probability that two events
A and B will occur simultaneously is equal to the probability of B times the conditional
probability P(A/B). Since it is also the case that P(B/A) = P(AB)/P(A), it is also true that
P(AB) = P(A)P(B/A). In other words, we can also say the probability that two events A and B
will occur simultaneously is equal to the probability of A times the conditional probability P(B/A).
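Equation 1.42 and the multiplication rule can likewise be illustrated with a brief Python sketch (an illustrative addition, not part of the original text), employing the same deck of playing cards with A defined as selecting a King and B as selecting a Red card:

from fractions import Fraction

suits = ["Hearts", "Diamonds", "Spades", "Clubs"]
ranks = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King"]
deck = [(rank, suit) for suit in suits for rank in ranks]

A = {card for card in deck if card[0] == "King"}                   # 4 Kings
B = {card for card in deck if card[1] in ("Hearts", "Diamonds")}   # 26 Red cards

n = Fraction(len(deck))
p_B = Fraction(len(B)) / n
p_AB = Fraction(len(A & B)) / n                  # P(A intersect B) = 2/52

p_A_given_B = p_AB / p_B                         # Equation 1.42: P(A/B) = P(AB)/P(B)
print(p_A_given_B)                               # 1/13 (2 red Kings among the 26 red cards)

# Multiplication rule: P(AB) = P(B) * P(A/B)
print(p_B * p_A_given_B == p_AB)                 # True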

Bayes’ theorem Bayes’ theorem is an equation which allows for the computation of
conditional probabilities.30 As noted earlier, the conditional probability of an event is the
probability the event will occur, given the fact that another event has occurred. Thus, the
conditional probability of an event A, given the fact that another event B has already occurred
(which is represented with the notation P(A/B)), can be determined through use of Equation 1.42.
The latter conditional probability is commonly referred to as a posterior or a posteriori
probability, since the value derived for the conditional probability is computed through use of
one or more probability values that are already known or estimated from preexisting information.
The latter known probabilities are referred to as prior or a priori probabilities.
In order to describe Bayes’ theorem, assume that we have two sets of events. In one set
there are n events to be identified as A1, A2, ..., An. In the second set there are two events to be
identified as B+ and B-. Bayes' theorem states that the probability that Aj (where 1 ≤ j ≤ n) will
occur, given that it is known that B+ has occurred, is determined with Equation 1.43. (To
determine the likelihood that Aj will occur, given that it is known that B - has occurred (i.e.
P(Aj/B-)), the value B - is employed in the numerator and denominator of Equation 1.43 in place
of B+.)
P(Aj/B+) = P(B+/Aj)P(Aj) / [P(B+/A1)P(A1) + P(B+/A2)P(A2) + ... + P(B+/An)P(An)]    (Equation 1.43)

The most common application of Equation 1.43 involves its simplest form, in which the first
set is comprised of the two events A1 and A2, and the second set is comprised of the two events B+
and B-. In the latter case, the equation for Bayes' theorem in reference to event A1 (i.e.,
P(A1/B+)), becomes Equation 1.44. Equation 1.45 is employed to compute the conditional
probability P(A2/B+).

P(A1/B+) = P(B+/A1)P(A1) / [P(B+/A1)P(A1) + P(B+/A2)P(A2)]    (Equation 1.44)

P(A2/B+) = P(B+/A2)P(A2) / [P(B+/A1)P(A1) + P(B+/A2)P(A2)]    (Equation 1.45)

Equation 1.46 (from which Equation 1.44 can be algebraically derived — see Hays and
Winkler (1971, pp. 84 - 85)) is another way of expressing the relationship described by Equation
1.44.

P(Aj/B+) = P(Aj ∩ B+) / P(B+)    (Equation 1.46)
To illustrate the use of Bayes’ theorem, let us assume that we wish to compute the
probability of a person being a male (a male will be represented by the notation A1) versus a
female (a female will be represented with the notation A2), given the fact that we know the
person has a specific disease (a person having the disease will be represented by the notation B+,
whereas a person not having the disease will be represented by the notation B-). In our example
it will be assumed that for the population in question we know the following probabilities: a)
P(Male) = P(A1) = .40; b) P(Female) = P(A2) = .60. In other words, the probability of a member
of the population being a male is .40, and the probability of a person being a female is .60. In
addition, we know the following: a) P(B+/A1) = .01 (which indicates the probability a person has
the disease, given the fact the person is a male, is .01); b) P(B+/A2) = .25 (which indicates the
probability a person has the disease, given the fact the person is a female, is .25). Employing
Equations 1.44 and 1.45, we respectively determine the following: a) The probability a person is
a male, given the fact the person has the disease, is .026; b) The probability a person is a female,
given the fact the person has the disease, is .974. Thus, if we know that someone has the disease,
it is extremely likely the person is a female.

P(A1/B+) = (.01)(.40) / [(.01)(.40) + (.25)(.60)] = .026

P(A2/B+) = (.25)(.60) / [(.01)(.40) + (.25)(.60)] = .974

A more detailed discussion of Bayes’ theorem and its application can be found in Section
IX (the Addendum) of the binomial sign test for a single sample.
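The disease example can be verified with a brief Python sketch (an illustrative addition, not part of the original text) that applies Equations 1.44 and 1.45 directly to the stated prior and conditional probabilities:

# A1 = male, A2 = female, B+ = person has the disease
p_A1, p_A2 = 0.40, 0.60                 # prior probabilities P(A1), P(A2)
p_Bplus_given_A1 = 0.01                 # P(B+/A1)
p_Bplus_given_A2 = 0.25                 # P(B+/A2)

denominator = p_Bplus_given_A1 * p_A1 + p_Bplus_given_A2 * p_A2   # the common denominator, P(B+)

p_A1_given_Bplus = p_Bplus_given_A1 * p_A1 / denominator    # Equation 1.44
p_A2_given_Bplus = p_Bplus_given_A2 * p_A2 / denominator    # Equation 1.45

print(round(p_A1_given_Bplus, 3))       # 0.026
print(round(p_A2_given_Bplus, 3))       # 0.974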

Counting rules The next group of principles to be discussed are often referred to as counting
rules, since they allow one to compute/count the number of outcomes/events that can occur
within the context of a specific situation.

Rule 1: Assume that in a series comprised of n independent trials, on each trial any one of
ki (where i represents the ith trial) mutually exclusive outcomes can occur. We will let k1
represent the number of possible outcomes on Trial 1, k2 the number of possible outcomes on
Trial 2, ..., ki the number of possible outcomes on the ith trial, ..., and kn the number of possible
outcomes on the nth trial. If the number of different sequences that can result within a series of
n trials is designated by the letter M, the value of M is computed as follows:
M = (k1)(k2) ... (ki) ... (kn).

Table 1.14 36 Possible Sequences Employing Rule 1

Heads, 1, Jack Heads, 2, Jack Heads, 3, Jack Heads, 4, Jack Heads, 5, Jack Heads, 6, Jack
Heads, 1, Queen Heads, 2, Queen Heads, 3, Queen Heads, 4, Queen Heads, 5, Queen Heads, 6, Queen
Heads, 1, King Heads, 2, King Heads, 3, King Heads, 4, King Heads, 5, King Heads, 6, King
Tails, 1, Jack Tails, 2, Jack Tails, 3, Jack Tails, 4, Jack Tails, 5, Jack Tails, 6, Jack
Tails, 1, Queen Tails, 2, Queen Tails, 3, Queen Tails, 4, Queen Tails, 5, Queen Tails, 6, Queen
Tails, 1, King Tails, 2, King Tails, 3, King Tails, 4, King Tails, 5, King Tails, 6, King

To illustrate Rule 1, assume that a three trial series is conducted employing a coin, a single
six-sided die, and a set of three playing cards comprised of the Jack, King, and Queen of Hearts.
On Trial 1 the coin is flipped, on Trial 2 the die is rolled, and on Trial 3 a card is randomly
selected from the set of three cards. In our example the number of trials is n = 3. The value of
k1 = 2, since there are two possible outcomes for the coin (Heads versus Tails). The value of
k2 = 6, since there are six possible outcomes for the die (1, 2, 3, 4, 5, 6). The value of k3 = 3,
since there are three possible outcomes for the playing cards (Jack, Queen, King). Employing
Rule 1, the number of different sequences that can be obtained in the three trial series is
M = (k1)(k2)(k3) = (2)(6)(3) = 36. The 36 possible sequences that can be obtained are
summarized in Table 1.14.
A special case of Rule 1 is when the number of mutually exclusive outcomes that can occur
on each trial is a fixed value (i.e., k1 = k2 = ... = ki = ... = kn). When the latter is true, we can
employ the notation k to represent the number of possible outcomes on each trial, and use the
following equation to compute the value of M: M = kⁿ.
To illustrate this special case of Rule 1, assume that a fair coin is tossed on three trials. In
such a case, there are n = 3 trials, and on each trial there are k = 2 mutually exclusive outcomes
that are possible (Heads and Tails). Employing the equation M = kⁿ, there are 2³ = 8 possible
sequences. Specifically, the 8 sequences involving Heads and Tails that can occur are: HHH,
HHT, HTH, HTT, TTT, TTH, THT, THH.
As a second example, assume that a single six-sided die is rolled on 4 trials. In such a case,
there are n = 4 trials, and on each trial there are k = 6 mutually exclusive outcomes that are
possible (1, 2, 3, 4, 5, 6). Employing the equation M = kⁿ, there are 6⁴ = 1296 possible
sequences. In other words, within the context of a four trial series there are 1296 possible
orderings involving the face values 1 through 6 (e.g., 1111, ..., 1234, ..., 1235, ..., 6543, ..., 6666).
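Rule 1 can be illustrated with a brief Python sketch (an illustrative addition, not part of the original text) which enumerates the sequences for the coin/die/card example and for the special case in which each trial has the same number of outcomes:

from itertools import product

coin = ["Heads", "Tails"]               # k1 = 2
die = [1, 2, 3, 4, 5, 6]                # k2 = 6
cards = ["Jack", "Queen", "King"]       # k3 = 3

sequences = list(product(coin, die, cards))     # every (coin, die, card) sequence
print(len(sequences))                   # 36 = (2)(6)(3), as summarized in Table 1.14

# Special case: k1 = k2 = ... = kn = k, so M = k**n
print(len(list(product(coin, repeat=3))))       # 2**3 = 8 coin-toss sequences
print(6 ** 4)                                   # 1296 sequences for four rolls of a die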
Rule 2: Assume that we have k distinct objects. The number of ways that the k objects may
be arranged in a different order is k!.
Before examining this rule in greater detail the reader should take note of the following: a)
An ordered arrangement is referred to as a permutation; b) The notation k! represents k
factorial, which is computed as follows: k! = (k)(k - 1) ... (1). (A more detailed discussion
of k! can be found in Endnote 13 of the chi-square goodness-of-fit test.)

To illustrate Rule 2, assume there are three different colored flags (Red, Blue and Yellow)
lined up from left to right in front of a building. How many ways can the three flags be arranged?
In this example we have k = 3 distinct objects (i.e., the three flags). Thus, there are k! = (3)(2)(1)
= 6 ways (or permutations) in which the three objects/flags can be ordered. Specifically: 1)
Red, Blue, Yellow; 2) Red, Yellow, Blue; 3) Blue, Red, Yellow; 4) Blue, Yellow, Red; 5)
Yellow, Red, Blue; 6) Yellow, Blue, Red.
Rule 2 can be employed to answer the question of how many different seating arrangements
there are for a class comprised of 20 students. If one imagines that instead of arranging three
flags in different orders, one is arranging the 20 students in different orders, it is easy to see that
the number of possible arrangements equals k!. Thus, the number of possible arrangements is
k! = 20! = (20)(19) ... (1) = (2.432902008)(10)¹⁸, which is an extraordinarily large number.
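Rule 2 can be illustrated with a brief Python sketch (an illustrative addition, not part of the original text) which enumerates the orderings of the three flags and computes the number of seating arrangements for 20 students:

from itertools import permutations
from math import factorial

flags = ["Red", "Blue", "Yellow"]
orderings = list(permutations(flags))   # every ordered arrangement of the three flags
print(len(orderings), factorial(3))     # 6 6
print(orderings[0])                     # ('Red', 'Blue', 'Yellow')

print(factorial(20))                    # 20! = 2432902008176640000 seating arrangements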
Rule 3: Assume that we have n distinct objects. If we wish to select and arrange x objects
from our pool of n objects (where x < n), the number of possible ways that we can do this (which
we will designate as M) is computed with Equation 1.47.

M = n!/(n - x)!    (Equation 1.47)
The value computed with Equation 1.47 can also be represented by the notation Pⁿₓ, which
represents the number of permutations of n things taken x at a time. To illustrate Rule 3,
assume that a teacher has four children in her class but only has two seats in the room. How many
possible ways are there for the teacher to select two of the four students and arrange them in the
two seats? In our example there are a total of n = 4 objects/students and we wish to select and
arrange x = 2 of the objects/students. Thus, employing Equation 1.47 we compute that there are
M = 4! / (4 - 2)! = 12 arrangements possible. To confirm the latter, we will identify the four
students as A, B, C, and D, and assign two of them to seats with the first student listed in each
sequence assigned to Seat 1 and the second student to Seat 2. The 12 possible arrangements are
as follows: AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC.
As another example to illustrate Rule 3, consider the following. The judges of a beauty
contest have selected four semifinalists. However, the judges will only be awarding a first and
second place prize. How many possible ways are there for the judges to select a first and second
place finisher among the four semifinalists? This example is identical to the previous one, since
there are n = 4 objects/contestants which the judges must select and arrange in x = 2 positions.
Thus, we are once again required to compute the number of permutations of 4 things taken 2 at
a time. Consequently, there are 12 possible arrangements that can be employed for two of the
four semifinalists who are awarded the first and second prize.
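Rule 3 can be illustrated with a brief Python sketch (an illustrative addition, not part of the original text) which enumerates the 12 ordered seat assignments and confirms the value obtained with Equation 1.47 (math.perm requires Python 3.8 or later):

from itertools import permutations
from math import factorial, perm

students = ["A", "B", "C", "D"]
seatings = list(permutations(students, 2))      # ordered pairs (Seat 1, Seat 2)
print(len(seatings))                            # 12 ordered arrangements

# Equation 1.47: M = n!/(n - x)!
print(perm(4, 2), factorial(4) // factorial(4 - 2))   # 12 12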
Rule 4: Assume that we have n distinct objects. If we wish to select x objects from our pool
of n objects (where x < n), but are not concerned with the order in which they are arranged, the
number of ways that we can do this (which we will designate as M) is computed with Equation
1.48.
M = n!/[x!(n - x)!]    (Equation 1.48)

Equation 1.48 is employed to compute the number of combinations of n things taken x
at a time, which can be written with the notation (ⁿₓ). Note that in computing combinations
(unlike in computing permutations), one is not interested in the ordering of the elements within
an arrangement.
To illustrate Rule 4, assume that a teacher has four children in her class and must select two
students to represent the class on the student council. How many ways can the teacher select two

students from the class of four students? In this example there are n = 4 objects/students, and the
teacher must select x = 2 of them. The teacher is not concerned with placing them in order (such
as would be the case if one student was to be designated the head representative and the other the
assistant representative). Since order is of no concern, through use of Equation 1.48 the value M
= 6 is computed.

M = 4!/[2!(4 - 2)!] = 6

Thus, there are six ways two students can be selected. Specifically, if we designate the four
students A, B, C, and D, the six ways are as follows: A and B, A and C, A and D, B and C, B
and D, C and D.
As another example to illustrate Rule 4, consider the following. The judges of a beauty
contest have selected four semifinalists. They will now select two finalists, but will not designate
either selectee as a first or second place winner. How many possible ways are there for the judges
to select two finalists among the four semifinalists, without specifying any order of finish? This
example is identical to the previous one, since there are n = 4 objects/contestants, and the judges
must select x = 2 of them. Thus, as in the previous example, there are 6 ways the judges can
select the two finalists. Note that in both of the examples which are employed to illustrate Rule
4, the computed value of M is half the number obtained for the examples used to illustrate Rule
3 (which also employed the values n = 4 and x = 2). The reason for this is that Rule 3 takes the
order of an arrangement into account. Thus, if the order of the elements is considered for the six
arrangements identified with Rule 4, each of them can be broken down into two ordered
arrangements/permutations. As an example, the A and B arrangement computed for Rule 4 with
Equation 1.48 can be broken down into the two permutations AB and BA.
A commonly asked question that Rule 4 can be employed to answer is the number of
different hands which are possible in a game of cards. Specifically, in determining the number
of different hands that are possible in a game of poker, one is asking how many ways x = 5 cards
can be selected from a deck of n = 52 cards. Rule 4 is employed, since one is not concerned with
the order in which the cards are dealt. Thus, the number of possible hands is computed with
Equation 1.48 as follows: M = 52!/[5!(52 - 5)!] = 2,598,960.
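Rule 4 can be illustrated with a brief Python sketch (an illustrative addition, not part of the original text) which enumerates the six unordered selections and computes the number of possible poker hands with Equation 1.48 (math.comb requires Python 3.8 or later):

from itertools import combinations
from math import comb, factorial

students = ["A", "B", "C", "D"]
pairs = list(combinations(students, 2))         # unordered selections of two students
print(len(pairs), comb(4, 2))                   # 6 6  (half of the 12 permutations)

# Number of possible five-card poker hands from a 52-card deck
print(comb(52, 5))                                          # 2598960
print(factorial(52) // (factorial(5) * factorial(47)))      # same value via Equation 1.48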

Rule 5: Assume that we have N distinct objects which we wish to divide into k mutually
exclusive subsets, with n1 objects in subset 1, n2 objects in subset 2, ..., ni objects in subset i,
..., and nk objects in subset k. The number of possible ways (which we will designate M) that the
N objects can be divided into k subsets is computed with Equation 1.49.

M = N!/[n1! n2! ... ni! ... nk!]    (Equation 1.49)

To illustrate Rule 5, assume that the owner of a restaurant has five applicants A, B, C, D,
and E, and wants to select two applicants to be cooks, two to be bakers, and one to be the
manager. Assuming all of the applicants are equally qualified for each of the positions, how many
different ways can the five applicants be divided into two cooks, two bakers, and one manager?
To be more precise, the question being asked is how many different ways can the N = 5 applicants
be divided into k = 3 subsets - specifically, a subset of n1 = 2 cooks, a subset of n2 = 2 bakers,
and a subset of n3 = 1 manager. Employing Equation 1.49 the value M = 30 is computed.

M = 5!/[2! 2! 1!] = 30

The 30 ways that the applicants can be assigned to the three subsets are summarized in Table
1.15. In the latter table, the first two letters listed represent the applicants who are given the job
of cook, the second two letters represent the applicants who are given the job of baker, and the
last letter represents the applicant who is given the job of manager.
As another example to illustrate Rule 5, consider the following. Assume that a teacher has
10 children in her class and wants to select two children to be student council representatives,
three children to be representatives on the school athletic council, and one child to be the class
library representative. How many different ways can the 10 students be divided into two student
council representatives, three athletic council representatives, and one library representative? To
be more precise, the question being asked is how many different ways can the N = 10 students be
divided into k = 4 subsets. Specifically, the four subsets are a subset of n1 = 2 student council
representatives, a subset of n2 = 3 athletic council representatives, a subset of n3 = 1 library

Table 1.15 Summary of 30 Ways Applicants Can be Assigned to Three Subsets

Arrangement Cook Baker Manager

1 AB CD E
2 AB CE D
3 AB DE C
4 AC DE B
5 AC BE D
6 AC DB E
7 AD BE C
8 AD BC E
9 AD EC B
10 AE BD C
11 AE BC D
12 AE DC B
13 BC AE D
14 BC AD E
15 BC DE A
16 BD AE C
17 BD AC E
18 BD CE A
19 BE AD C
20 BE AC D
21 BE CD A
22 CD AB E
23 CD AE B
24 CD EB A
25 CE AB D
26 CE DA B
27 CE BD A
28 DE AB C
29 DE BC A
30 DE CA B

representative, and a subset of n4 = 4 students who are not selected to be any kind of
representative.
Employing Equation 1.49 the value M = 12,600 is computed.

M = 10!/[2! 3! 1! 4!] = 12,600
In point of fact, Rule 5 is a general rule that can be employed for any value of k, and when
there are k = 2 subsets Equation 1.49 reduces to Equation 1.48, the equation for Rule 4. Consequently, Rule
4 is a special case of Rule 5. In the examples employed to illustrate Rule 4 (in which a teacher
must select two out of four children or the judges of a beauty contest must select two out of four
contestants) there are two subsets. Specifically, in each example there is the subset of n1 = 2
individuals who are selected and the subset of n2 = 2 individuals who are not selected.
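Rule 5 can be illustrated with a brief Python sketch (an illustrative addition, not part of the original text; the helper function multinomial is introduced purely for illustration) which evaluates Equation 1.49 for the restaurant and classroom examples and confirms that the k = 2 case reduces to Rule 4:

from math import factorial

def multinomial(N, sizes):
    """Number of ways to divide N distinct objects into subsets of the given sizes (Equation 1.49)."""
    assert sum(sizes) == N              # the subsets must account for all N objects
    result = factorial(N)
    for n_i in sizes:
        result //= factorial(n_i)
    return result

print(multinomial(5, [2, 2, 1]))        # 30 ways: two cooks, two bakers, one manager
print(multinomial(10, [2, 3, 1, 4]))    # 12600 ways for the classroom example
print(multinomial(4, [2, 2]))           # 6: with k = 2 subsets, Rule 5 reduces to Rule 4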

Parametric versus Nonparametric Inferential Statistical Tests

The inferential statistical procedures discussed in this book have been categorized as being
parametric versus nonparametric tests. Some sources distinguish between parametric and
nonparametric tests on the basis that parametric tests make specific assumptions with regard
to one or more of the population parameters that characterize the underlying distribution(s) for
which the test is employed. These same sources describe nonparametric tests (also referred to
in some sources as distribution-free or assumption-free tests) as making no such assumptions
about population parameters. In truth, nonparametric tests are really not assumption free, and, in
view of this, Marascuilo and McSweeney (1977) suggest that it might be more appropriate to
employ the term "assumption freer" rather than nonparametric in relation to such tests.
The distinction employed in this book for categorizing a procedure as a parametric versus
a nonparametric test is primarily based on the level of measurement represented by the data that
are being analyzed. As a general rule, inferential statistical tests which evaluate categorical/
nominal data and ordinal/rank-order data are categorized as nonparametric tests, while tests
that evaluate interval or ratio data are categorized as parametric tests. Although the
appropriateness of employing level of measurement as a criterion in this context has been
debated, in most instances (although there are exceptions) its usage provides a reasonably simple
and straightforward schema for categorization which facilitates the decision-making process for
selecting an appropriate statistical test.
There is general agreement among most researchers that as long as there is no reason to
believe that one or more of the assumptions of a parametric test have been violated, when the
level of measurement for a set of data is interval or ratio, the data should be evaluated with the
appropriate parametric test. However, if one or more of the assumptions of a parametric test are
saliently violated, some (but not all) sources believe it may be prudent to transform the data into
a format which makes it compatible for analysis with the appropriate nonparametric test.31
Related to this is that even though parametric tests generally provide a more powerful test of an
alternative hypothesis than their nonparametric analogs, the power advantage of a parametric test
may be negated if one or more of its assumptions are violated.
The reluctance among some sources to transform interval/ratio data into an ordinal/rank-
order or categorical/nominal format for the purpose of analyzing it with a nonparametric test is
based on the fact that interval/ratio data contain more information than either of the latter two
forms of data.32 Because of their reluctance to sacrifice information, these sources take the
position that even when there is reason to believe that one or more of the assumptions of a
parametric test have been violated, it is still more prudent to employ the appropriate parametric
test. Such sources argue that most parametric statistical tests are robust. A robust test is one

which can still provide reasonably reliable information with regard to the underlying population,
even if certain of the assumptions underlying the test are violated. Generally, when a parametric
test is employed under the latter conditions, certain adjustments are made in evaluating the test
statistic in order to improve its reliability.
In the final analysis, in most instances, the debate concerning whether a researcher should
employ a parametric or nonparametric test for a specific experimental design turns out to be of
little consequence. The reason for this is that most of the time a parametric test and its
nonparametric analog are employed to evaluate the same set of data, they lead to identical or
similar conclusions. This latter observation is demonstrated throughout this book with numerous
examples. In those instances where the two types of test yield conflicting results, the truth can
best be determined by conducting multiple experiments which evaluate the hypothesis under
study.33 A detailed discussion of statistical methods which can be employed for pooling the
results of multiple studies that evaluate the same general hypothesis can be found in Test 43 on
meta-analysis.

Univariate versus Bivariate versus Multivariate Statistical Procedures

The term univariate statistical analysis is generally employed for descriptive and
inferential statistical procedures which evaluate a single variable (e.g., a hypothesis about a
population mean, a population variance, etc.). The term bivariate statistical analysis is generally
employed for procedures which allow a researcher to investigate the relationship between two
variables — specifically, an independent/predictor variable and a dependent/ criterion variable
(although some sources limit the use of the term bivariate analysis to simple correlational analysis
involving pairs of observations (i.e., scores on two variables) for a sample of subjects).
Grimm and Yarnold (1995, p. 4) note that although researchers are not in complete
agreement with respect to the use of the term multivariate statistics, it is generally employed
for procedures which simultaneously evaluate multiple (i.e., two or more) independent/
predictor variables and multiple (i.e., two or more) dependent/criterion variables. Some
sources, however, reserve the use of the term multivariate to primarily identify procedures
which involve two or more dependent variables. It should be noted, however, that discriminant
function analysis (Test 37) and logistic regression (Test 39) (both of which involve one
dependent variable) are among the procedures which are commonly identified as multivariate.
Multivariate statistical procedures (which are discussed in detail in Tests 33–42) afford
researchers a number of advantages over univariate and bivariate procedures.
Specifically, among others, Harlow (2005, Ch. 1) notes the following: a) Because multivariate
procedures are able to simultaneously assess the role of multiple variables, they represent a more
realistic methodology for evaluating real world phenomena and theoretical models, both of which
are typically complex and involve multiple variables; b) Because they evaluate multiple variables,
multivariate procedures can minimize the amount of unexplained variability which results from
random error; c) Because it takes into account interrelationships/intercorrelations between
variables, multivariate statistics are better able to rule out the role of extraneous variables; d)
Multivariate procedures allow a researcher to control the overall Type I error rate, which
otherwise would be inflated if multiple bivariate analyses were conducted on the same set of
variables; and e) Within the context of multivariate analysis, a researcher can conduct an overall
analysis involving all of the variables (a macro-analysis) as well as more specific analyses (micro-
analyses) assessing the role of the individual variables.
Some disadvantages associated with the use of multivariate procedures are: a) The
mathematical operations involved in implementing multivariate procedures are considerably more
complex than those required for univariate and bivariate procedures, and, for the most part,
References

Altman, D. G., Machin, D., Bryant, T. N . , & Gardner, M . J. (2002). Statistics with confidence: Confidence
intervals with statistical guidelines (2nd ed.). London: British Medical Journal Books.
Anderson, D. R., Sweeney, D. J., & Williams, T. A. (2002). Statistics for business and economics (8th ed.).
Cincinnati: South-Western/Thomson Learning.
Anderson, N . N . (2001). Empirical direction in design and analysis. Mahwah, N.J.: Lawrence Erlbaum
Associates.
Anscombe, F. J. & Glynn, W. W. (1983). Distributions of the kurtosis statistic. Biometrika, 70, 227–234.
Arlinghaus, S. L . & Griffith, D. A . (Eds.) (1996). Practical handbook of spatial statistics. Boca Raton, FL: CRC
Press.
Baltes, P. B., Reese, H. W., & Nesselroade, J. R. (1977). Life-span developmental psychology: Introduction to research. Monterey, CA: Wadsworth Publishing Company.
Batschelet, E. (1981). Circular statistics in biology. New York: Academic Press.
Bechtold, B. & Johnson, R. (1989). Statistics for business and economics. Boston: PWS-Kent.
Bowley, A. L. (1920). Elements of statistics (4th ed.). New York: Charles Scribner's Sons.
Campbell, D. T. & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Skokie,
IL: Rand-McNally.
Chou, Y. (1989). Statistical analysis for business and economics. New York: Elsevier.
Christensen, L.B. (1997). Experimental methodology (7th ed.). Boston: Allyn & Bacon.
Christensen, L.B. (2000). Experimental methodology (8th ed.). Boston: Allyn & Bacon.
Clark-Carter, D. (2005). Stanine scores. In Everitt, B. S. and Howell, D. C. (Eds.). Encyclopedia of statistics in behavioral science (pp. 461–465). Chichester, UK: John Wiley & Sons.
Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth Advanced Books &
Software.
Cohen, B. (1996). Explaining psychological statistics. Pacific Grove, CA: Brooks/Cole Publishing Company.
Cohen, B. (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cook, T. D. & Campbell. D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings.
Boston: Houghton-Mifflin Company.
Cornell University, Office o f Statistical Consulting, StatNews, 54, November 5, 2002.
Cowles, M . (1989). Statistics in psychology: A n historical perspective. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Cowles, M . (2001). Statistics in psychology: An historical perspective (2nd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
D'Agostino, R. B. (1970). Transformation to normality of the null distribution of g1. Biometrika, 57, 679–681.
D'Agostino, R. B. (1986). Tests for the normal distribution. In D'Agostino, R. B. and Stephens, M. A. (Eds.), Goodness-of-fit techniques (pp. 367–419). New York: Marcel Dekker.
D'Agostino, R. B., Belanger, A., & D'Agostino Jr., R. B. (1990). A suggestion for using powerful and informative tests of normality. American Statistician, 44, 316–321.
Daniel, W.W. and Terrell, J. C. (1995). Business statistics for management and economics (7th ed.). Boston:
Houghton Mifflin.
Dukes, W. F. (1965). N = 1. Psychological Bulletin, 64, 74–79.
Everitt, B. S. (2001). Statistics for psychologists: An intermediate course. Mahwah, N.J.: Lawrence Erlbaum
Associates.
Fisher, N . I . (1993). Statistical analysis of circular data. Cambridge, UK: Cambridge University Press.
Fisher, N . I . , Lewis, T., & Embleton, B. J. (1987). Statistical analysis of spherical data. Cambridge, UK:
Cambridge University Press.
Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver & Boyd.
Fisher, R. A. (1955). Statistical methods and scientific induction. Journal of the Royal Statistical Society, Series B, 17, 69–78.
Fisher, R. A. (1956). Statistical methods and scientific inference. Edinburgh: Oliver & Boyd.
Folz, D. H. (1996). Survey research for public administration. Thousand Oaks, CA: Sage Publications.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gigerenzer, G. & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Grimm, L. G. & Yarnold, P. R. (1995). Introduction to multivariate statistics. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding multivariate statistics (pp. 1–18). Washington, DC: American Psychological Association.
Grissom, R. J. & K i m , J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ:
Lawrence Erlbaum Associates.
Gurland, J. & Tripathi, R. C. (1971). A simple approximation for unbiased estimation of the standard deviation. American Statistician, 25(4), 30–32.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.) (1997). What if there were no significance tests. Mahwah, NJ: Lawrence Erlbaum Associates.
Hays, W. L. & Winkler, R. L. (1971). Statistics: Probability, inference, and decision. New York: Holt, Rinehart & Winston.
Hersen, M . & Barlow, D. H. (1976). Single-case experimental designs: Strategies for studying behavior
change. New York: Pergamon Press.
Hoaglin, D. C., Mosteller, F. & Tukey, J. W. (Eds.) (1983). Understanding robust and exploratory data
analysis. New York: John Wiley & Sons.
Hoffman, P. (1998). The man who loved only numbers. New York: Hyperion.
Hogg, R. V . & Tanis, E. A . (1997). Probability and statistical inferences (5th ed.). Saddle River, NJ: Prentice
Hall.
Howell, D. C. (1997). Statistical methods for psychology (4th ed.). Belmont, CA: Duxbury Press.
Hunter, J. E. & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings (1st ed.). Newbury Park, CA: Sage Publications.
Hunter, J. E. & Schmidt, F. L . (2004). Methods of meta-analysis - Correcting error and bias in research
findings (2nd ed.). Thousand Oaks, CA: Sage Publications.
Jammalamadaka, S. R. & SenGupta, A. (2001). Topics in circular statistics. River Edge, NJ: World Scientific Publishing Company.
Johnson, N . L . & Kotz, S. (Eds.) (1997). Leading personalities in statistical sciences. New York: John Wiley &
Sons, Inc.
Kachigan S. K. (1986). Statistical analysis. New York: Radius Press.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Larsen, R. J. & Marx, M . L . (1985). An introduction to probability and its applications. Englewood Cliffs, NJ:
Prentice Hall.
Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88, 1242–1249.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Mardia, K. V. (1972). Statistics of directional data. New York: Academic Press.
McElory, E. E. (1979). Applied business statistics. San Francisco: Holden-Day, Inc.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Moore, M . (Ed.) (2001). Spatial statistics: Methodological aspects and applications. New York: Springer.
Moors, J. J. (1986). The meaning of kurtosis: Darlington revisited. American Statistician, 40, 283–284.
Moors, J. J. (1988). A quantile alternative for kurtosis. Statistician, 37, 25–52.
Mosteller, F. & Tukey, J. W. (1977). Data analysis and regression: A second course in statistics. Reading, MA:
Addison-Wesley.
Neyman, J. (1941). Fiducial argument and the theory of confidence intervals. Biometrika, 32, 128–150.
Neyman, J. (1950). First course in probability and statistics. New York: Henry Holt & Company.
Neyman, J. & Pearson, E. S. (1928). On the use and interpretation of certain test criteria. Biometrika, 20, 175–240.
Oakes, M . (1986). Statistical inference: A commentary for the behavioral and social sciences. Chichester:
John Wiley & Sons.
O'Sullivan, E., Rassel, G., & Berner, M. (2002). Research methods for public administration (4th ed.). Boston: Longman.
Pearson, E. S. (1962). Some thoughts on statistical inference. Annals of Mathematical Statistics, 3, 394–403.
Ripley, B. D. (2004). Spatial statistics. Hoboken, NJ: Wiley-Interscience.
Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet methodological revolution. American Psychologist, 65, 1–12.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
Scheaffer, R. L., Mendenhall, W., & Ott, L. (1996). Elementary survey sampling (5th ed.). Belmont, CA: Duxbury Press.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for
generalized causal inference. Boston: Houghton Mifflin Company.
Sheskin, D. J. (1984). Statistical tests and experimental design: A guidebook. New York: Gardner Press.
Smith, A. F. & Prentice, D. A. (1993). Exploratory data analysis. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Statistical issues (pp. 349–390). Hillsdale, NJ: Lawrence Erlbaum Associates.
Smithson, M . (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.
Stigler, S. M . (1999). Statistics on the table: The history of statistical concepts and methods. Cambridge, M A :
Harvard University Press.
Tabachnick, B. G. & Fidell, L . S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Tankard, J. Jr. (1984). The statistical pioneers. Cambridge, MA: Schenkman.
Thompson, B. (1993). Statistical significance testing in contemporary practice: Some proposed alternatives with
comments from journal editors (Special issue). Journal of Special Education, 61 (4).
Thompson, B. (1999). Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review, 11, 157–169.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25–32.
Tolman, H. (1971). A simple method for obtaining unbiased estimates of population standard deviations. American Statistician, 25(1), 60.
Trochim, W. M . (2005). Research methods: The concise knowledge base. Cincinnati: Atomic Dog Publishing
Company.
Tufte, E. R. (1983). The visual display of quantitative information. Chesire, CT: Graphics Press.
Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Upton, G. J. & Fingleton, B. (1985). Spatial data analysis by example: Point pattern and quantitative data. Volume 1. New York: John Wiley & Sons.
Upton, G. J. & Fingleton, B. (1989). Spatial data analysis by example: Categorical and directional data. Volume 2. New York: John Wiley & Sons.
Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, 212–213.
Watson, G. S. (1983). Statistics on spheres. New York: John Wiley & Sons.
Wilcox, R. R. (1987). New statistical procedures for the social sciences. Hillsdale, NJ: Erlbaum.
Wilcox, R. R. (1996). Statistics for the social sciences. San Diego, CA: Academic Press.
Wilcox, R. R. (1997). Introduction to robust estimation and hypothesis testing. San Diego, CA: Academic
Press.
Wilcox, R. R., (2001). Fundamentals of modern statistical methods: Substantially increasing power and
accuracy. New York: Springer.
Wilcox, R. R., (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.
Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill Publishing
Company.
Wuensch, K. L. (2005). Kurtosis. In Everitt, B. S. and Howell, D. C. (Eds.). Encyclopedia of statistics in behavioral science (pp. 1028–1029). Chichester, UK: John Wiley & Sons.
Wuensch, K. L. (2005). Skewness. In Everitt, B. S. and Howell, D. C. (Eds.). Encyclopedia of statistics in behavioral science (pp. 1855–1856). Chichester, UK: John Wiley & Sons.
Wuensch, K. (2011). Statistics lessons. Website: http://core.ecu.edu/psyc/wuenschk/SPSS/Statlessons.htm.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Freund, J. E. (1984). Modern elementary statistics (6th ed.). Englewood Cliffs, NJ: Prentice Hall, Inc.
Altman, D. G., Machin, D., Bryant, T. N . , & Gardner, M . J. (2002). Statistics with confidence: Confidence
intervals with statistical guidelines (2nd ed.). London: British Medical Journal Books.
Anderson, D. R., Sweeney, D. J. & Williams, T. A. (2002). Statistics for business and economics (8th ed.).
Cincinnati: South-Western/Thomson Learning.
Anderson, N . N . (2001). Empirical direction in design and analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Benneyan, J. C. (1998). Use and interpretation of statistical quality control charts. Journal for Quality in Health Care, 10, 69–73.
Chandra, J. M . (2001). Statistical quality control. Boca Raton, FL: CRC Press.
Chou, Y. (1989). Statistical analysis for business and economics. New York: Elsevier.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.) (1997). What if there were no significance tests. Mahwah, NJ: Lawrence Erlbaum Associates.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury.
Hunter, J. E. & Schmidt, F. L . (2004). Methods of meta-analysis: Correcting error and bias in research findings
(2nd ed.). Thousand Oaks, CA: Sage Publications.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Rosner, B. (1995). Fundamentals of biostatistics (4th ed.). Belmont, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416–428.
Shewhart, W.A. (1931). The economic control of quality of manufactured product. New York: Van Nostrand and
Company.
Smithson, M . (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Thompson, B. (1993). Statistical significance testing in contemporary practice: Some proposed alternatives with
comments from journal editors (Special issue). Journal of Special Education, 61 (4).
Thompson, B. (1999). Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review, 11, 157–169.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25–32.
van Belle, G. (2002). Statistical rules of thumb. New York: John Wiley & Sons.
Western Electric Company (1956). Statistical quality control handbook. Indianapolis, IN: AT&T Technologies.
Wilkinson, L. & APA Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Crisman, R. (1975). Shortest confidence interval for the standard deviation of a normal distribution. Journal of
Undergraduate Mathematics, 7, 57.
Guenther, W. C. (1965). Concepts of statistical inference. New York: McGraw-Hill Book Company.
Hogg, R. V. & Tanis, E. A. (1988). Probability and statistical inference (3rd ed.). New York: Macmillan
Publishing Company.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA.: Duxbury.
Smithson, M . (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Anscombe, F. J. & Glynn, W. W. (1983). Distributions of the kurtosis statistic. Biometrika, 70, 227–234.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
D'Agostino, R. B. (1970). Transformation to normality of the null distribution of g1. Biometrika, 57, 679–681.
D'Agostino, R. B. (1986). Tests for the normal distribution. In D'Agostino, R. B. & Stephens, M. A. (Eds.), Goodness-of-fit techniques (pp. 367–419). New York: Marcel Dekker.
D'Agostino, R. B. & Stephens, M. A. (Eds.) (1986). Goodness-of-fit techniques. New York: Marcel Dekker.
D'Agostino, R. B., Belanger, A., & D'Agostino, R. B., Jr. (1990). A suggestion for using powerful and informative tests of normality. American Statistician, 44, 316–321.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Anscombe, F. J. & Glynn, W. W. (1983). Distributions of the kurtosis statistic. Biometrika, 70, 227–234.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
D'Agostino, R. B. (1986). Tests for the normal distribution. In D'Agostino, R. B. & Stephens, M. A. (Eds.), Goodness-of-fit techniques (pp. 367–419). New York: Marcel Dekker.
D'Agostino, R. B. & Pearson, E. S. (1973). Tests of departure from normality. Empirical results for the distribution of b2 and √b1. Biometrika, 60, 613–622.
D'Agostino, R. B. & Stephens, M. A. (Eds.) (1986). Goodness-of-fit techniques. New York: Marcel Dekker.
D'Agostino, R. B., Belanger, A., & D'Agostino, R. B., Jr. (1990). A suggestion for using powerful and informative tests of normality. American Statistician, 44, 316–321.
Jarque, C. M. & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity, and serial independence of regression residuals. Economic Letters, 6, 255–259.
Jarque, C. M. & Bera, A. K. (1981). Efficient tests for normality, homoscedasticity, and serial independence of regression residuals: Monte Carlo evidence. Economic Letters, 7, 313–318.
Jarque, C. M. & Bera, A. K. (1987). A test of normality of observations and regression residuals. International Statistical Review, 55, 163–172.
Hollander, M . & Wolfe, D.A. (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Marascuilo, L . A. & McSweeney, M . (1977). Nonparametric and distribution-free method for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Shapiro, S. S. & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591–611.
Shapiro, S. S. & Wilk, M. B. (1968). Approximations for the null distribution of the W statistic. Technometrics, 10, 861–866.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Thode, H . C. (2002). Testing for normality. New York: Marcel Dekker.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Glass, G. V. & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.) Boston: Allyn
& Bacon.
Higgins, J. J. (2004). An introduction to modern nonparametric statistics. Belmont, CA: Duxbury Press.
Marascuilo, L . A. & McSweeney, M . (1977). Nonparametric and distribution-free method for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Pitman, E. J. G. (1948). Lecture notes on nonparametric statistical inference. Columbia University.
Snedecor, G. W. & Cochran, W. G. (1980). Statistical methods (8th ed.). Ames, IA: Iowa State University Press.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80–83.
Wilcoxon, F. (1949). Some rapid approximate statistical procedures. Stamford, CT: Stamford Research
Laboratories, American Cyanamid Corporation.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Cramér, H. (1928). On the composition of elementary errors. Skandinavisk Aktuarietidskrift, 11, 13–74, 141–180.
D'Agostino, R. B. & Stephens, M. A. (Eds.) (1986). Goodness-of-fit techniques. New York: Marcel Dekker.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
David, F. N. (1950). Two combinatorial tests of whether a sample has come from a given population. Biometrika, 37, 97–110.
Harter, H. L., Khamis, H. J., & Lamb, R. E. (1984). Modified Kolmogorov–Smirnov tests for goodness-of-fit. Communic. Statist. – Simula. Computa., 13, 293–323.
Hollander, M . & Wolfe, D.A. (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Khamis, H. J. (1990). The δ corrected Kolmogorov–Smirnov test for goodness-of-fit. Journal of Statistical Plan. Infer., 24, 317–355.
Khamis, H. J. (2000). The two-stage delta-corrected Kolmogorov–Smirnov test. Journal of Applied Statistics, 27, 439–450.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn dell'Inst Ital. degli. Att., 4, 89–91.
Lilliefors, H. W. (1967). On the Kolmogorov–Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399–402.
Lilliefors, H. W. (1969). On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association, 64, 387–389.
Lilliefors, H. W. (1973). The Kolmogorov–Smirnov and other distance tests for the gamma distribution and for the extreme-value distribution when parameters must be estimated. Department of Statistics, George Washington University, unpublished manuscript.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Mason, A. L. & Bell, C. B. (1986). New Lilliefors and Srinivasan Tables with applications. Communic. Statis. – Simul., 15(2), 457–459.
Massey, F. J., Jr. (1951). The Kolmogorov-Smirnov test for goodness-of-fit. Journal of the American Statistical Association, 46, 68–78.
Miller, L. H. (1956). Table of percentage points of Kolmogorov statistics. Journal of the American Statistical Association, 51, pp. 111–121.
Shapiro, S. S. & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591–611.
Shapiro, S. S. & Wilk, M. B. (1968). Approximations for the null distribution of the W statistic. Technometrics, 10, 861–866.
Smirnov, N. V. (1936). Sur la distribution de W² (criterium de M. R. v. Mises). Comptes Rendus (Paris), 202, 449–452.
Smirnov, N. V. (1939). Estimate of deviation between empirical distribution functions in two independent samples (Russian), Bull Moscow Univ., 2, 3–16.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Stevens, J. P. (2002). Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Thode, H . C. (2002). Testing for normality. New York: Marcel Dekker.
von Mises, R. (1931). Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Leipzig: Deuticke.
Wilk, H. B., Shapiro, S. S., & Chen, H. J. (1965). A comparative study of various tests of normality. Journal of the American Statistical Association, 63, 1343–1372.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Cochran, W. G. (1952). The chi-square goodness-of-fit test. Annals of Mathematical Statistics, 23, 315–345.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Cramér, H. (1928). On the composition of elementary errors. Skandinavisk Aktuarietidskrift, 11, 13–74, 141–180.
D'Agostino, R. B. & Stephens, M. A. (Eds.) (1986). Goodness-of-fit techniques. New York: Marcel Dekker.
Dahiya, R. C. & Gurland, J. (1973). How many classes in the Pearson chi-square test? Journal of the American Statistical Association, 68, 707–712.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
David, F. N. (1950). Two combinatorial tests of whether a sample has come from a given population. Biometrika, 37, 97–110.
Everitt, B. S. (1977). The analysis of contingency tables. New York: Chapman & Hall.
Everitt, B. S. (1992). The analysis of contingency tables (2nd ed.). New York: Chapman & Hall.
Feller, W. (1968). An introduction to probability theory and its applications (Volume I ) (3rd ed.). New York:
John Wiley & Sons.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley & Sons.
Fleiss, J. L . , Levin, B., & Paik, M . C. (2003). Statistical methods for rates and proportions (3rd ed.). New
York: John Wiley & Sons.
Folz, D. H. (1996). Survey research for public administration. Thousand Oaks, CA: Sage Publications.
Ghosh, B. K. (1979). A comparison of some approximate confidence intervals for the binomial parameter. Journal of the American Statistical Association, 74, 894–900.
Glass, G. V. & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.) Boston: Allyn
& Bacon.
Haberman, S. J. (1973). The analysis of residuals in cross-classified tables. Biometrics, 29, 205–220.
Howell, D. C. (1992). Statistical methods for psychology (3rd ed.). Boston: PWS-Kent Publishing Company.
Keppel, G. & Saufley, W. H., Jr. (1992). Introduction to design and analysis: A student's handbook (2nd ed.). New York: W. H. Freeman & Company.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn dell'Inst. Ital. degli. Att., 4, 89–91.
Lilliefors, H. W. (1967). On the Kolmogorov–Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399–402.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Miller, I. & Miller, M. (1999). John E. Freund's mathematical statistics (6th ed.). Upper Saddle River, NJ: Prentice Hall.
O'Sullivan, E., Rassel, G., & Berner, M. (2002). Research methods for public administration (4th ed.). Boston: Longman.
Rosner, B. (1995). Fundamentals of biostatistics (4th ed.). Belmont, CA: Duxbury Press.
Scheaffer, R. L . , Mendenhall, W., & Ott, L. (1996). Elementary survey sampling (5th ed.). Belmont, CA:
Duxbury Press.
Shapiro, S. S. & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591–611.
Shapiro, S. S. & Wilk, M. B. (1968). Approximations for the null distribution of the W statistic. Technometrics, 10, 861–866.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill Book Company.
Smirnov, N. V. (1936). Sur la distribution de W² (criterium de M. R. v. Mises). Comptes Rendus (Paris), 202, 449–452.
Smithson, M . (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Stirling, J. (1730). Methodus differentialis.
Thode, H. C. (2002). Testing for normality. New York: Marcel Dekker.
von Mises, R. (1931). Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Leipzig: Deuticke.
Wallis, W. A. & Roberts, H. V. (1956). Statistics: A new approach. Glencoe, IL: Free Press.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Alcock, J. E. (1981). Parapsychology: Science or magic? A psychological perspective. Oxford: Pergamon
Press.
Anscombe, F. J. (1948). The transformation of Poisson, binomial, and negative binomial data. Biometrika, 35, 246–254.
Berry, D. A. (1996). Statistics: A Bayesian perspective. Belmont, CA: Duxbury Press.
Best, D. J. (1975). The difference between two Poisson expectations. Australian Journal of Statistics, 17, 29–33.
Broughton, R. S. (1991). Parapsychology: The controversial science. New York: Ballantine Books.
Burdick, D. S. & Kelly, E. F. (1977). Statistical methods in parapsychological research. In Wolman, B. B. (Ed.), Handbook of parapsychology (pp. 81–129). New York: Van Nostrand & Reinhold.
Canavos, G. C. & Miller, D. M . (1995). Modern business statistics. Belmont, CA: Duxbury.
Chou, Y. (1989). Statistical analysis for business and economics. New York: Elsevier.
Christensen, R. (1990). Log-linear models. New York: Springer-Verlag.
Christensen, R. (1997). Log-linear models and logistic regression (2nd ed.). New York: Springer-Verlag.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cowles, M . (1989). Statistics in psychology: A n historical perspective. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Daniel, W. W. & Terrell, J. C. (1995). Business statistics for management and economics (7th ed.). Boston:
Houghton Mifflin Company.
Detre, J. & White, C. (1970). The comparison of two Poisson distributed observations. Biometrics, 26, 851–854.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.
Falk, R. W. & Greenbaum, C. W. (1995). Significance tests die hard. Theory and Psychology, 5, 75–98.
Feller, W. (1968). An introduction to probability theory and its applications (Volume I ) (3rd ed.). New York:
John Wiley & Sons.
Fleiss, J. L . , Levin, B., & Paik, M . C. (2003). Statistical methods for rates and proportions (3rd ed.). New
York: John Wiley & Sons.
Freund, J. E. (1984). Modern elementary statistics (6th ed.). Englewood Cliffs, NJ: Prentice Hall.
Gelman, A., Carlin, J. B., Stern, H . S., & Rubin, D. R. (2004). Bayesian data analysis (2nd ed.). Boca Raton, FL:
Chapman & Hall.
Gigerenzer, G. (1993). The superego, the ego and the id in statistical reasoning. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Lawrence Erlbaum Associates.
Guenther, W. C. (1968). Concepts of probability. New York: McGraw-Hill Book Company.
Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15–24.
Hansel, C. E. M. (1989). The search for psychic power: ESP and parapsychology revisited. Buffalo: Prometheus Books.
Harlow, L . L., Mulaik, S. A., & Steiger, J. H . (Eds.) (1997). What if there were no significance tests? Mahwah,
NJ: Lawrence Erlbaum Associates.
Hays, W. L. & Winkler, R. L. (1971). Statistics: Probability, inference, and decision. New York: Holt, Rinehart, & Winston.
Hines, T. (2002). Pseudoscience and the paranormal: A critical examination of the evidence (2nd ed.).
Buffalo: Prometheus Books.
Hogg, R. V. & Tanis, E. A . (1997). Probability and statistical inference (5th ed.). Upper Saddle River, NJ:
Prentice Hall.
Irwin, H. J. (1999). An introduction to parapsychology (3rd ed.). Jefferson, NC: McFarland & Company.
Krueger, J. (2001). Null hypothesis significance testing. American Psychologist, 56, 16–26.
Larsen, R. J. & Marx, M . L. (1985). An introduction to probability and its applications. Englewood Cliffs, NJ:
Prentice Hall.
Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Meehl, P. & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194–216.
Meehl, P. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Miller, I. & Miller, M. (1999). John E. Freund's mathematical statistics (6th ed.). Upper Saddle River, NJ: Prentice Hall.
Morrison, D. E. & Henkel, R. E. (Eds.) (1970). The significance test controversy: A reader. Chicago: Aldine.
Olofsson, P. (2007). Probabilities: The little numbers that rule our lives. Hoboken, NJ: John Wiley & Sons.
Pagano, M . & Gauvreau, K. (1993). Principles of biostatistics. Belmont, CA: Duxbury Press.
Parker, R. E. (1979). Introductory statistics for biology (2nd ed.). Cambridge: Cambridge University Press.
Phillips, L. D. (1973). Bayesian statistics for social scientists. New York: Thomas Y. Crowell Company.
Pitman, J. (1993). Probability. New York: Springer-Verlag.
Przyborowski, J. & Wilenski, H. (1940). Homogeneity of results in testing samples from Poisson series. Biometrika, 31, 313–323.
Ramsey, P. H. & Ramsey, P. P. (1988). Evaluating the normal approximation to the binomial test. Journal of Educational Statistics, 13, 173–182.
Rosner, B. (1995). Fundamentals of biostatistics (4th ed.). Belmont, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Savage, L. J. (1962). The foundations of statistical inference. London: Methuen.
Selvin, S. (1995). Practical biostatistical methods. Belmont, CA: Duxbury.
Serlin, R. A. & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73–83.
Serlin, R. A. & Lapsley, D. K. (1993). Rational appraisal of psychological research and the good-enough principle. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 199–228). Hillsdale, NJ: Lawrence Erlbaum Associates.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Sincich, T. (1996). Business statistics by example. Upper Saddle River, NJ: Prentice Hall.
van Belle, G. (2002). Statistical rules of thumb. New York: John Wiley & Sons.
Wiggins, J. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-
Wesley Publishing Company.
Williamson, E. & Bretherton, M . (1963). Tables of the negative binomial probability distribution. New York:
John Wiley & Sons.
Winkler, R. L . (1972). Introduction to Bayesian inference and decision. New York: Holt, Rinehart, & Winston.
Winkler, R. L. (1993). Bayesian statistics: An overview. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Statistical issues (pp. 201–232). Hillsdale, NJ: Lawrence Erlbaum Associates.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Bang, J. W., Schumacker, R. E., & Schlieve, P. L. (1998). Random-number generator validity in simulation studies: An investigation of normality. Educational and Psychological Measurement, 58, 430–450.
Banks, J. & Carson, J.S. (1984). Discrete-event system simulation. Englewood Cliffs, NJ: Prentice Hall.
Bellinson, H. R., von Neumann, J., Kent, R. H., & Hart, B. I. (1941). The mean square successive difference. Annals of Mathematical Statistics, 12, 153–162.
Bennett, C. A. & Franklin, N . L . (1954). Statistical analysis in chemistry and the chemical industry. New
York: John Wiley & Sons, Inc.
Bennett, D. J. (1998). Randomness. Cambridge, MA: Harvard University Press.
Beyer, W. H . (1968). Handbook of tables for probability and statistics (2nd ed.). Cleveland, OH: The Chemical
Rubber Company.
Blom, G., Holst, L., & Sandell, D. (1994). Problems and snapshots from the world of probability. New York: Springer-Verlag.
Box, G. E. & Muller, M. E. (1958). A note on the generation of random normal deviates. Annals of Mathematical Statistics, 29, 610–611.
Chou, Y. (1989). Statistical analysis for business and economics. New York: Elsevier.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Cox, D. R. & Stuart, A. (1955). Some quick tests for trend in location and dispersion. Biometrika, 42, 80–95.
Daniel, W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Dowdy, S. (1986). Statistical experiments using BASIC. Boston: Duxbury Press.
Durbin, J. & Watson, G. S. (1950). Testing for serial correlation in least squares regression I. Biometrika, 37, 409–438.
Durbin, J. & Watson, G. S. (1951). Testing for serial correlation in least squares regression II. Biometrika, 38, 159–178.
Durbin, J. & Watson, G. S. (1971). Testing for serial correlation in least squares regression III. Biometrika, 58, 1–19.
Emshoff, J. R. & Sisson, R. L. (1970). Design and use of computer simulation models. New York: Macmillan
Publishing Company.
Feller, W. (1968). An introduction to probability theory and its applications (Volume I ) (3rd ed.). New York:
John Wiley & Sons.
Gentle, J. E. (1998). Random number generation and Monte Carlo methods. New York: Springer-Verlag.
Gruenberger, F. & Jaffray, G. (1965). Problems for computer solution. New York: John Wiley & Sons.
Hart, B. I. (1942). Significance levels for the ratio of the mean square successive difference to the variance. Annals of Mathematical Statistics, 13, 445–447.
Hoel, P. G. & Jessen, R. J. (1982). Basic statistics for business and economics (3rd ed.). New York: John Wiley
& Sons.
Hogg, R. V. & Tanis, E. A. (1988). Probability and statistical inference (3rd ed.). New York: Macmillan
Publishing Company.
James, F. (1990). A review of pseudorandom number generators. Computer Physics Communications, 60, 329–344.
Knuth, D. (1969). Semi-numerical algorithms: The art of computer programming (Vol. 2). Reading, MA: Addison-Wesley.
Knuth, D. (1981). Semi-numerical algorithms: The art of computer programming (Vol. 2) (2nd ed.). Reading, MA: Addison-Wesley.
Knuth, D. (1997). Semi-numerical algorithms: The art of computer programming (Vol. 2) (3rd ed.). Reading,
MA: Addison-Wesley.
Larsen, R. J. & Marx, M . L. (1985). An introduction to probability and its applications. Englewood Cliffs, NJ:
Prentice Hall.
Levene, H. (1952). On the power function of tests of randomness based on runs up and down. Annals of Mathematical Statistics, 23, 34–56.
Marsaglia, G. (1985). A current view of random number generators. In L. Billard (Ed.), Computer science and statistics: 16th symposium on the interface (pp. 3–10). Amsterdam: North-Holland.
Marsaglia, G. (1995). The Marsaglia random number generator CDROM, including the DIEHARD battery of tests of randomness. Department of Statistics, Florida State University, Tallahassee, FL.
Marsaglia, G. & Zaman, A. (1994). Some portable very-long period random number generators. Computers in Physics, 8, 117–121.
Montgomery, D. C. & Peck, E. A . (1992). Introduction to linear regression analysis (2nd ed.). New York: John
Wiley & Sons, Inc.
Mosteller, F. (1965). Fifty challenging problems in probability with solutions. New York: Dover Publications,
Inc.
Neter, J., Wasserman, W., & Kutner, M. H. (1983). Applied linear regression models (3rd ed.). Homewood, IL: Richard D. Irwin, Inc.
O'Brien, P. C. (1976). A test for randomness. Biometrics, 32, 391–401.
O'Brien, P. C. & Dyck, P. J. (1985). A runs test based on runs lengths. Biometrics, 41, 237–244.
Olofsson, P. (2007). Probabilities: The little numbers that rule our lives. Hoboken, NJ: John Wiley & Sons.
Peterson, I . (1998). The jungles of randomness. New York: John Wiley & Sons.
Phillips, D. T., Ravindran, A., & Solberg, J. T. (1976). Operations research: Principles and practice. New
York: John Wiley & Sons.
Schmidt, J. W. & Taylor, R. E. (1970). Simulation and analysis of industrial systems. Homewood, IL: Richard
D. Irwin.
Siegel, S. (1956). Nonparametric statistics for the behavioral sciences (1st ed.). New York: McGraw-Hill Book Company.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill Book Company.
Stirzaker, D. (1999). Probability and random variables: A beginner's guide. New York: Cambridge University Press.
Swed, F. S. & Eisenhart, C. (1943). Tables for testing randomness of grouping in a sequence of alternatives. Annals of Mathematical Statistics, 14, 66–87.
von Neumann, J. (1941). Distribution of the ratio of the mean square successive difference to the variance. Annals of Mathematical Statistics, 12, 307–395.
Wald, A. & Wolfowitz, J. (1940). On a test whether two samples are from the same population. Annals of Mathematical Statistics, 11, 147–162.
Wallis, W. A. & Moore, G. H. (1941). A significance test for time series analysis. Journal of the American Statistical Association, 36, 401–409.
Wallis, W. A. & Roberts, H. V. (1956). Statistics: A new approach. Glencoe, IL: Free Press.
Young, L. C. (1941). Randomness in ordered sequences. Annals of Mathematical Statistics, 12, 293–300.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Anderson, N . N . (2001). Empirical direction in design and analysis. Mahwah, NJ: Lawrence Erlbaum
Associates.
Anderson, S. & Hauck, W. W. (1983). A new procedure for testing equivalence in comparative bioavailability and other clinical trials. Communication in Statistics – Theory and Methods, 12, 2263–2692.
Armitage, P., Berry, G., & Matthews, J. (2002). Statistical methods in medical research. Malden, M A :
Blackwell Science.
Atherton Skaff, P. J. & Sloan, J. A. (2004). Design and analysis of equivalence clinical trials via the SAS
system. Website: http://www2.sas.com/proceedings/sugi23/Stats/p218.pdf.
Barnett, V. & Lewis, T. (1994). Outliers in statistical data (3rd ed.). Chichester: John Wiley & Sons.
Bartlett, M. S. (1947). The use of transformations. Biometrics, 3, 39–52.
Bhoj, D. S. (1978). Testing equality of means of correlated variates with missing data on both responses. Biometrika, 65, 225–228.
Chou, Y. (1989). Statistical analysis for business and economics. New York: Elsevier.
Chow, S. C. & Liu, J. P. (2000). Design and analysis of clinical trials (2nd ed.). New York: Wiley-Interscience.
Chow, S. C. & Liu, J. P. (2004). Design and analysis of bioavailability and bioequivalence studies (2nd ed.,
revised and expanded). New York: Dekker.
Cochran, W. G. & Cox, G. M . (1957). Experimental designs (2nd ed.). New York: John Wiley & Sons.
Cohen, B. H . (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Conover, W. J. & Iman, R. L. (1981). Rank transformation as a bridge between parametric and nonparametric statistics. The American Statistician, 35, 124–129.
Cribbie, R. A., Gruman, J. A., & Arpin-Cribbie, C. A. (2004). Recommendations for applying tests of equivalence. Journal of Clinical Psychology, 60, 1–10.
Dunnett, C. W. & Gent, M. (1977). Significance testing to establish equivalence between treatments with special reference to data in the form of 2 × 2 tables. Biometrics, 33, 593–602.
Everitt, B. S. (2002). Cambridge dictionary of statistics. London: Cambridge University Press.
Fleiss, J. L . , Levin, B., & Paik, M . C. (2003). Statistical methods for rates and proportions (3rd ed.). New
York: John Wiley & Sons.
Feinstein, A.R. (2002). Principles of medical statistics. Boca Raton, FL: Chapman & Hall/CRC.
Fisher, R. A., Corbet, A. S., & Williams, C. B. (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology, 12, 42–57.
Fisher, R. A . & Yates, F. (1953). Statistical tables for biological, agricultural, and medical research (4th ed.).
Edinburgh: Oliver & Boyd.
Games, P. (1983). Curvilinear transformation of the dependent variable. Psychological Bulletin, 93, 382–387.
Games, P. (1984). Data transformations, power, and skew: A rebuttal to Levine and Dunlap. Psychological Bulletin, 95, 345–347.
Glass, G. (1976). Primary, secondary and meta-analysis of research. Educational Research, 5, 3–8.
Goldstein, H. & Healy, M. J. R. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society, 158A, Part 1, 175–177.
Good, P. (1994). Permutation tests: A practical guide to resampling methods for testing hypotheses. New
York: Springer.
Grissom, R. J. & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Guenther, W. C. (1965). Concepts of statistical inference. New York: McGraw-Hill Book Company.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis with readings
(4th ed.). Upper Saddle River, NJ: Prentice Hall, Inc.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1998). Multivariate data analysis (5th ed.). Upper
Saddle River, NJ: Prentice Hall, Inc.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Hartley, H. O. (1940). Testing the homogeneity of a set of variances. Biometrika, 31, 249–255.
Hartley, H. O. (1950). The maximum F-ratio as a shortcut test for heterogeneity of variance. Biometrika, 37, 308–312.
Hatch, J. P. (2005). Equivalence trials. In Everitt, B. S. and Howell, D. C. (Eds.). Encyclopedia of statistics in behavioral science (pp. 546–547). Chichester, UK: John Wiley & Sons.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107–128.
Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490–499.
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hotelling, H. (1931). The generalization of Student's ratio. Annals of Mathematical Statistics, 2, 361–378.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury.
Huber, P. J. (1981). Robust statistics. New York: John Wiley & Sons.
Hunt, M . (1997). How science takes stock. New York: Russell Sage Foundation.
Jennison, C. & Turnbull, B. W. (2000). Group sequential methods with applications to clinical trials. Boca
Raton, FL: CRC Press.
Kaplan, E. L. & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.
Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Keppel, G., Saufley, W. H. Jr., & Tokunaga, H. (1992). Introduction to design and analysis: A student's handbook. New York: W. H. Freeman & Company.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Belmont, CA:
Brooks/Cole Publishing Company.
Kirk, R. (1995). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole
Publishing Company.
Kline, R. (2005). Principles and practices of structural equation modeling (2nd ed.). New York: Guilford.
Kruskal, W. H. (1960). Some remarks on wild observations. Technometrics, 2, 1–3.
Lee, H. & Fung, K. Y. (1983). Robust procedures for multi-sample location problems with unequal group variances. Journal of Statistical Computation and Simulation, 18, 125–143.
Lesaffre, E. & Verbeke, G. (2005). Clinical trials and intervention studies. In Everitt, B. S. and Howell, D. C. (Eds.). Encyclopedia of statistics in behavioral science (pp. 301–305). Chichester, UK: John Wiley & Sons.
Levine, D. W. & Dunlap, W. P. (1982). Power of the F test with skewed data: Should one transform or not? Psychological Bulletin, 92, 272–280.
Levine, D. W. & Dunlap, W. P. (1983). Data transformation, power, and skew: A rejoinder to Games. Psychological Bulletin, 93, 596–599.
Little, R. J. A . & Rubin, D. R. (1987). Statistical analysis with missing data. New York: John Wiley & Sons.
Little, R. J. A. & Rubin, D. R. (2002). Statistical analysis with missing data. (2nd ed.) Hoboken, NJ: John Wiley
& Sons.
Maxwell, S. E. & Delaney, H . D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Miller, I. & Miller, M. (1999). John E. Freund's mathematical statistics (6th ed.). Upper Saddle River, NJ: Prentice Hall.
Myers, J. L. & Well, A . D. (1995). Research design and statistical analysis. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Rao, P. V. (1998). Statistical research methods in the life sciences. Pacific Grove, CA: Duxbury Press.
Rocke, D. M., Downs, G. W., & Rocke, A. J. (1982). Are robust estimators really necessary? Technometrics, 24, 95–110.
Rodgers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553–565.
Rosenthal, R., Rosnow, R. L . , & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A
correlational approach. Cambridge, UK: Cambridge University Press.
Rosner, B. (1983). Percentage points for a generalized ESD many-outlier procedure. Technometrics, 25(2), 165–172.
Rosner, B. (1995). Fundamentals of biostatistics (4th ed.). Belmont, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation. Psychological Science, 11, 446–453.
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110–114.
Schenker, N. & Gentleman, J. F. (2001). On judging the significance of difference by examining the overlap between confidence intervals of means. The American Statistician, 55, 182–186.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15, 657–680.
Seaman, M. A. & Serlin, R. C. (1998). Equivalence confidence intervals for two-group comparisons of means. Psychological Methods, 3, 403–411.
Shiffler, R. (1988). Maximum z scores and outliers. American Statistician, 42, 79–80.
Smith, M. L., Glass, G. V., & Miller, T. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press.
Smithson, M . (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Sprent, P. (1998). Data driven statistical methods. London: Chapman & Hall.
Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B.G. & Fidell, L. S. (1989). Using multivariate statistics (2nd ed.). New York: HarperCollins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: HarperCollins
College Publishers.
Thöni, H. (1967). Transformation of variables used in the analysis of experimental and observational data. A review. Technical Report Number 7, Statistical Laboratory, Iowa State University, Ames, Iowa, 61 pp.
Tietjen, G. L. (1986). The analysis and detection of outliers. In R. B. D'Agostino & M. A. Stephens (Eds.), Goodness-of-fit techniques. New York: Marcel Dekker, Inc.
Tryon, W. W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests. Psychological Methods, 6, 371–386.
van Belle, G. (2002). Statistical rules of thumb. New York: John Wiley & Sons.
Welch, B. L. (1947). The generalization of Student's problem when several different population variances are involved. Biometrika, 34, 28–35.
Wellek, S. (2003). Testing statistical hypotheses of equivalence. Boca Raton, FL: Chapman & Hall/CRC.
Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics, 32, 741–744.
Westlake, W. J. (1981). Response to T. B. L. Kirkwood: Bioequivalence testing – a need to rethink. Biometrics, 37, 589–594.
Westlake, W. J. (1988). Bioavailability and bioequivalence of pharmaceutical formulations. In K. E. Peace (Ed.), Biopharmaceutical statistics for drug development (pp. 329–352). New York: Marcel Dekker.
Wilcox, R. R. (1987). New statistical procedures for the social sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Wilcox, R. R. (1996). Statistics for the social sciences. San Diego, CA: Academic Press.
Wilcox, R. R. (1997). Introduction to robust estimation and hypothesis testing. San Diego, CA: Academic
Press.
Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.
Wilcox, R. R. (2005). M estimators of location. In Everitt, B. S. and Howell, D. C. (Eds.), Encyclopedia of statistics in behavioral science (pp. 1109–1110). Chichester, UK: John Wiley & Sons.
Winer, B. J., Brown, D., & Michels, K. (1991). Statistical principles in experimental design (3rd ed.). New York: McGraw-Hill Publishing Company.
Yuen, K. K. (1974). The two sample trimmed t for unequal population variances. Biometrika, 61, 165–170.
Zar, J. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Zimmerman, D. W. & Zumbo, B. D. (1993). The relative power of parametric and non-parametric statistical methods. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bell, C. B. & Doksum, K. A. (1965). Some new distribution-free statistics. Annals of Mathematical Statistics, 36, 203–214.
Bergmann, R., Ludbrook, J., & Spooren, W. P. (2000). Different outcomes of the Wilcoxon–Mann–Whitney test from different statistical packages. The American Statistician, 54, 72–77.
Chernick, M. R. (1999). Bootstrap methods: A practitioner's guide. New York: John Wiley & Sons.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Conover, W. J. & Iman, R. L. (1981). Rank transformations as a bridge between parametric and nonparametric statistics. The American Statistician, 35, 124–129.
Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, 26b, 103–110.
Cox, D. R. & Oakes, D. (1984). Analysis of survival data. New York: Chapman & Hall.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Davison, A. C. & Hinkley, D. V. (1997). Bootstrap methods and their applications. Cambridge, England: Cambridge University Press.
Delaney, H. D. & Vargha, A. (2002). Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples. Psychological Methods, 7, 485–503.
Deshpande, J. V., Gore, A. P., & Shanubhogue, A . (1995). Statistical analysis of nonnormal data. New Delhi:
New Age International Publishers Limited/Wiley Eastern Limited.
Desu, M . M . & Raghavarao, D. (2003). Nonparametric statistical methods for complete and censored data.
Boca Raton, FL: Chapman & Hall/CRC.
Devore, J. & Farnum, N . (1999). Applied statistics for engineers and scientists. Pacific Grove, CA: Duxbury
Press.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26.
Efron, B. & Tibshirani R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Fahoome, G. (2002). Twenty nonparametric statistics and their large-sample approximations. Journal of Modern Applied Statistical Methods, 2, 248–268.
Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver & Boyd.
Fligner, M. A. & Policello II, G. E. (1981). Robust rank procedures for the Behrens-Fisher problem. Journal of the American Statistical Association, 76, 162–174.
Gehan, E. A. (1965a). A generalized Wilcoxon test for comparing arbitrarily singly censored samples. Biometrika, 52, 203–223.
Gehan, E. A. (1965b). A generalized two-sample Wilcoxon test for doubly censored data. Biometrika, 52, 650–653.
Good, P. (1994). Permutation tests: A practical guide to resampling methods for testing hypotheses. New
York: Springer.
Grimm, L.G. & Yarnold, P.R. (Eds.) (2000). Reading and understanding more multivariate statistics.
Washington, DC: American Psychological Association.
Grissom, R. J. (1994). Probability of the superior outcome of one treatment over another. Journal of Applied Psychology, 79, 314–316.
Grissom, R. J. & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Higgins, J. J. (2004). An introduction to modern nonparametric statistics. Belmont, CA: Duxbury Press.
Hollander, M . & Wolfe, D.A. (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Huber, P. J. (1981). Robust statistics. New York: John Wiley & Sons.
Jennison, C. & Turnbull, B. (2000). Group sequential methods with applications to clinical trials. Boca Raton,
FL: CRC Press.
Kaplan, E. L. & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457–481.
Keller-McNulty, S. & Higgins, J. (1987). Effect of tail weight and outliers on power and Type-I error of robust permutation test for location. Communication in Statistics: Simulations and Computations, 16(1), 17–36.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Kruskal, W. H. (1957). Historical notes on the Wilcoxon unpaired two-sample test. Journal of the American Statistical Association, 52, 356–360.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn dell'Inst. Ital. degli. Att., 4, 89–91.
Ludbrook, J. & Dudley, H. (1998). Why permutation tests are superior to the t and F tests in biomedical research. The American Statistician, 52, 127–132.
Manly, B. F. J. (1997). Randomization, bootstrap and Monte Carlo methods in biology (2nd ed.). London:
Chapman & Hall.
Mann, H. & Whitney, D. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50–60.
Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Report, 50, 113–170.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Maxwell, S. E. & Delaney, H . (1990). Designing experiments and analyzing data. Monterey, CA: Wadsworth
Publishing Company.
Mooney, C. Z. & Duval, D. (1993). Bootstrapping: A nonparametric approach to statistical inference. Newbury Park, CA: Sage Publications.
Mosteller, F. & Tukey, J. (1977). Data analysis and regression. Reading, MA: Addison-Wesley.
Norušis, N. J. (2004). SPSS 13.0 advanced statistical procedures companion. Upper Saddle River, NJ: Prentice Hall.
Pagano, M . & Gauvreau, K. (1993). Principles of biostatistics. Belmont, CA: Duxbury Press.
Pitman, E. J. G. (1937a). Significance tests that may be applied to samples from any population. Journal of the Royal Statistical Society: Supplement, 4, 119–130.
Pitman, E. J. G. (1937b). Significance tests that may be applied to samples from any population, II. The correlation coefficient test. Journal of the Royal Statistical Society: Supplement, 4, 225–232.
Pitman, E. J. G. (1938). Significance tests that may be applied to samples from any population, III. The analysis of variance test. Biometrika, 29, 322–335.
Pyke, D. A. & Thompson, H. (1986). Statistical analysis of survival and removal rate experiments. Ecology, 67, 240–245.
Quenouille, M. H. (1949). Approximate tests of correlation in time series. Journal of the Royal Statistical Society, B, 11, 18–84.
Rodgers, J. L. (2005). Jackknife. In Everitt, B. S. and Howell, D. C. (Eds.). Encyclopedia of statistics in behavioral science (pp. 1005–1007). Chichester, UK: John Wiley & Sons.
Rosner, B. (1995). Fundamentals of biostatistics (4th ed.). Belmont, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Selvin, S. (1995). Practical biostatistical methods. Belmont, CA: Duxbury.
Sheskin, D. J. (1984). Statistical tests and experimental design: A guidebook. New York: Gardner Press.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill Book Company.
Simon, J. L . (1969). Basic research methods in social science. New York: Random House.
Smirnov, N. V. (1939). On the estimation of the discrepancy between empirical curves of distributions for two independent samples. Bulletin University of Moscow, 2, 3–14.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Sprent, P. (1998). Data driven statistical methods. London: Chapman & Hall.
Sprent, P. & Smeeton, N . C. (2000). Applied nonparametric statistical methods (3rd ed.). London: Chapman &
Hall.
Staudte R. G. & Sheather S. J. (1990). Robust estimation and testing. New York: John Wiley & Sons.
Terry, M. E. (1952). Some rank-order tests, which are most powerful against specific parametric alternatives. Annals of Mathematical Statistics, 23, 346–366.
Tukey, J. W. (1958). Bias and confidence in not quite large samples (Abstract). Annals of Mathematical
Statistics, 29, 614.
Tukey, J. W. (1959). A quick, compact, two-sample test to Duckworth's specifications. Technometrics, 1, 31–48.
van der Waerden, B. L. (1952/1953). Order tests for the two-sample problem and their power. Proceedings Koninklijke Nederlandse Akademie van Wetenshappen (A), 55 (Indagationes Mathematicae 14), 453–458, and 56 (Indagationes Mathematicae, 15), 303–316 (corrections appear in Vol. 56, p. 80).
Wald, A. & Wolfowitz, J. (1940). On a test whether two samples are from the same population. Annals of Mathematical Statistics, 11, 147–162.
Wilcox, R. R. (1987). New statistical procedures for the social sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Wilcox, R. R. (1996). Statistics for the social sciences. San Diego, CA: Academic Press.
Wilcox, R. R. (1997). Introduction to robust estimation and hypothesis testing. San Diego, CA: Academic
Press.
Wilcox, R. R. (2001). Fundamentals of modern statistical methods: Substantially increasing power and
accuracy. New York: Springer.
Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.
Wilcox, R. R. (2005). Robust estimation and hypothesis testing (2nd ed.) Amsterdam: Elsevier Academic Press.
Wilcoxon, F. (1949). Some rapid approximate statistical procedures. Stamford, CT: Stamford Research
Laboratories, American Cyanamid Corporation.
Wilks, S. S. (1961). A combinatorial test for the problems of two samples from continuous distributions. In J. Neyman (Ed.), Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley & Los Angeles: University of California Press, Vol. I, 707–717.
Wright, R. E. (2000). Survival analysis. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding more multivariate statistics (pp. 363–407). Washington, DC: American Psychological Association.
Yuen, K. K. (1974). The two-sample trimmed t for unequal population variances. Biometrika, 61, 165–170.
Zimmerman, D. W. & Zumbo, B. D. (1993a). The relative power of parametric and non-parametric statistical methods. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues. Hillsdale, NJ: Lawrence Erlbaum Associates.
Zimmerman, D. W. & Zumbo, B. D. (1993b). Rank transformations and the power of the Student t test and Welch t′ test. Canadian Journal of Experimental Psychology, 47, 523–529.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Drion, E. F. (1952). Some distribution-free tests for the difference between two empirical cumulative distribution functions. Annals of Mathematical Statistics, 23, 563–574.
Goodman, L. A. (1954). Kolmogorov–Smirnov tests for psychological research. Psychological Bulletin, 51, 160–168.
Hodges, J. L., Jr. (1958). The significance probability of the Smirnov two-sample test. Ark. Mat., 3, 469–486.
Hollander, M . & Wolfe, D. A . (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn dell'Inst. Ital. degli. Att., 4, 89–91.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Massey, F. J., Jr. (1952). Distribution tables for the deviation between two sample cumulatives. Annals of Mathematical Statistics, 23, 435–441.
Noether, G. E. (1963). Note on the Kolmogorov statistic in the discrete case. Metrika, 7, 115–116.
Noether, G. E. (1967). Elements of nonparametric statistics. New York: John Wiley & Sons.
Quade, D. (1973). The pair chart. Statistica Neerlandica, 27, 29–45.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill Book Company.
Smirnov, N. V. (1936). Sur la distribution de W² (critérium de M. R. v. Mises). Comptes Rendus (Paris), 202, 449–452.
Smirnov, N. V. (1939). Estimate of deviation between empirical distribution functions in two independent samples (Russian). Bull Moscow Univ., 2, 3–16.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Ansari, A. R. & Bradley, R. A. (1960). Rank-sum tests for dispersions. Annals of Mathematical Statistics, 31, 1174–1189.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Conover, W. J. & Iman, R. L. (1978). Some exact tables for the squared ranks test. Communication in Statistics: Simulation and Computation, B7 (5), 491–513.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Freund, J. E. & Ansari, A. R. (1957). Two-way rank-sum test for variances. Technical Report Number 34, Virginia Polytechnic Institute and State University, Blacksburg, VA.
Hollander, M. (1963). A nonparametric test for the two-sample problem. Psychometrika, 28, 395–403.
Hollander, M . & Wolfe, D. A . (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Klotz, J. (1962). Nonparametric tests for scale. Annals of Mathematical Statistics, 33, 498–512.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Mood, A. M. (1954). On the asymptotic efficiency of certain nonparametric two-sample tests. Annals of Mathematical Statistics, 25, 514–522.
Moses, L. E. (1952). A two-sample test. Psychometrika, 17, 234–247.
Moses, L. E. (1963). Rank tests of dispersion. Annals of Mathematical Statistics, 34, 973–983.
Sheskin, D. J. (1984). Statistical tests and experimental design. New York: Gardner Press.
Siegel, S. (1956). Nonparametric statistics. New York: McGraw-Hill Book Company.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill Book Company.
Siegel, S. & Tukey, J. W. (1960). A nonparametric sum of ranks procedure for relative spread in unpaired samples. Journal of the American Statistical Association, 55, 429–445.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Ansari, A. R. & Bradley, R. A. (1960). Rank-sum tests for dispersions. Annals of Mathematical Statistics, 31, 1174–1189.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Conover, W. J. & Iman, R. L. (1978). Some exact tables for the squared ranks test. Communication in Statistics: Simulation and Computation, B7 (5), 491–513.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Freund, J. E. & Ansari, A. R. (1957). Two-way rank-sum test for variances. Technical Report Number 34,
Virginia Polytechnic Institute and State University, Blacksburg, V A .
Hollander, M. (1963). A nonparametric test for the two-sample problem. Psychometrika, 28, 395–403.
Hollander, M. & Wolfe, D. A. (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Klotz, J. (1962). Nonparametric tests for scale. Annals of Mathematical Statistics, 33, 498–512.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Mood, A. M. (1954). On the asymptotic efficiency of certain nonparametric two-sample tests. Annals of Mathematical Statistics, 25, 514–522.
Moses, L. E. (1952). A two-sample test. Psychometrika, 17, 234–247.
Moses, L. E. (1963). Rank tests of dispersion. Annals of Mathematical Statistics, 34, 973–983.
Sheskin, D. J. (1984). Statistical tests and experimental design. New York: Gardner Press.
Shorack, G. R. (1969). Testing and estimating ratios of scale parameters. Journal of the American Statistical Association, 64, 999–1013.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill Book Company.
Siegel, S. & Tukey, J. W. (1960). A nonparametric sum of ranks procedure for relative spread in unpaired samples. Journal of the American Statistical Association, 55, 429–445.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Agresti, A. (1984). Analysis of ordinal categorical data. New York: John Wiley & Sons.
Agresti, A. (1990). Categorical data analysis. New York: John Wiley & Sons.
Atherton Skaff, P. J. & Sloan, J. A . (2004). Design and analysis of equivalence clinical trials via the SAS system.
Website: http://www2.sas.com/proceedings/sugi23/Stats/p218.pdf.
Beyer, W. H . (Ed.) (1968). CRC handbook of tables for probability and statistics (2nd ed.). Boca Raton, FL: CRC
Press.
Bresnahan, J. L. & Shapiro, M. M. (1966). A general equation and technique for the exact partitioning of chi-square contingency tables. Psychological Bulletin, 66, 252–262.
Carroll, J. B. (1961). The nature of data, or how to choose a correlation coefficient. Psychometrika, 26, 347–372.
Castellan, N. J., Jr. (1965). On the partitioning of contingency tables. Psychological Bulletin, 64, 330–338.
Chen, J. J., Tsong, Y., & Kang, S. H. (2000). Tests for equivalence or noninferiority between two proportions. Drug Information Journal, 34, 569–578.
Christensen, R. (1990). Log-linear models. New York: Springer-Verlag.
Christensen, R. (1999). Log-linear models (2nd ed.). New York: Springer-Verlag.
Cochran, W. G. (1952). The chi-square goodness-of-fit test. Annals of Mathematical Statistics, 23, 315–345.
Cochran, W. G. (1954). Some methods for strengthening the common chi-square tests. Biometrics, 10, 417–451.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Cornfield, J. (1951). A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast and cervix. Journal of the National Cancer Institute, 11, 1229–1275.
Cramér, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Everitt, B. S. (1977). The analysis of contingency tables. New York: Chapman & Hall.
Everitt, B. S. (1992). The analysis of contingency tables (2nd ed.). New York: Chapman & Hall.
Everitt, B. S. (2001). Statistics for psychologists: An intermediate course. Mahwah, NJ: Lawrence Erlbaum
Associates.
Feinstein, A.R. (2002). Principles of medical statistics. Boca Raton, FL: Chapman & Hall/CRC.
Field, A . (2005). Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
Fienberg, S. E. (1980). The analysis of cross-classified categorical data (2nd ed.). Cambridge, MA: MIT Press.
Fisher, R. A. (1934). Statistical methods for research workers (5th ed.). Edinburgh: Oliver & Boyd.
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, Series A, 98, 39–54.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley & Sons.
Fleiss, J. L . , Levin, B. & Paik, M . C. (2003). Statistical methods for rates and proportions (3rd ed.). New York:
John Wiley & Sons.
Garson, D. G. (2006). Statistics: Topics in multivariate analysis. Website:
http://www2.chas.ncsu.edu/garson/pa765/statnote.htm.
Goodman, L. A. (1970). The multivariate analysis of qualitative data. Journal of the American Statistical Association, 65, 226–256.
Grissom, R. J. & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Guilford, J. P. (1965). Fundamental statistics in psychology and education (4th ed.). New York: McGraw-Hill
Book Company.
Haber, M. (1980). A comparison of some continuity corrections for the chi-squared test on 2 × 2 tables. Journal of the American Statistical Association, 75, 510–515.
Haber, M. (1982). The continuity correction and statistical testing. International Statistical Review, 50, 135–144.
Haberman, S. J. (1973). The analysis of residuals in cross-classified tables. Biometrics, 29, 205–220.
Haberman, S. J. (1974). The analysis of frequency data. Chicago: University of Chicago Press.
Hollander, M . & Wolfe, D. A. (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Howell, D. C. (1992). Statistical methods for psychology (3rd ed.). Boston: PWS-Kent Publishing Company.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury.
Irwin, J. O. (1935). Tests of significance for differences between percentages based on small numbers. Metron, 12, 83–94.
Kennedy, J. J. (1983). Analyzing quantitative data: Introductory loglinear analysis for behavioral research. New York: Praeger.
Keppel, G. & Saufley, W. H., Jr. (1980). Introduction to design and analysis: A student's handbook. San Francisco: W. H. Freeman & Company.
Keppel, G., Saufley, W. H. Jr., & Tokunaga, H. (1992). Introduction to design and analysis: A student's handbook (2nd ed.). New York: W. H. Freeman & Company.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Mantel, N. & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
Marascuilo, L. A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social sciences.
Belmont, CA: Brooks/Cole Publishing Company.
Marascuilo, L . A . & Serlin, R. C. (1988). Statistical methods for the social and behavioral sciences. New York:
W.H. Freeman & Company.
Norušis, N. J. (2004). SPSS 13.0 advanced statistical procedures companion. Upper Saddle River, NJ: Prentice Hall.
Ott, R. L., Larson, R., Rexroat, C., & Mendenhall, W. (1992). Statistics: A tool for the social sciences (5th ed.). Boston: PWS-Kent Publishing Company.
Owen, D. B. (1962). Handbook of statistical tables. Reading, MA: Addison-Wesley.
Pagano, M . & Gauvreau, K. (1993). Principles of biostatistics. Belmont, CA: Duxbury Press.
Reynolds, H. T. (1977a). The analysis of cross-classifications. New York: The Free Press.
Reynolds, H. T. (1977b). Analysis of nominal data. Beverly Hills, CA: Sage Publications.
Rodgers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553–565.
Rosner, B. (1995). Fundamentals of biostatistics (4th ed.). Belmont, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15, 657–680.
Selvin, S. (1995). Practical biostatistical methods. Belmont, CA: Duxbury.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York:
McGraw-Hill Book Company.
Smithson, M . (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Tabachnick, B. G. & Fidell, L . S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
van Belle, G. (2002). Statistical rules of thumb. New York: John Wiley & Sons.
Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics, 32, 741–744.
Westlake, W. J. (1981). Response to T. B. L. Kirkwood: Bioequivalence testing – a need to rethink. Biometrics, 37, 589–594.
Westlake, W. J. (1988). Bioavailability and bioequivalence of pharmaceutical formulations. In K. E. Peace (Ed.), Biopharmaceutical statistics for drug development (pp. 329–352). New York: Marcel Dekker.
Wickens, T. (1989). Multiway contingency table analysis for the social sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Wilks, S. S. (1935). The likelihood test of independence in a contingency table. Annals of Mathematical Statistics, 6, 190–196.
Williams, K. (1976). The failure of Pearson's goodness of fit statistic. Statistician, 25, 49.
Yates, F. (1934). Contingency tables involving small numbers and the chi-square test. Journal of the Royal Statistical Society, 1, 217–235.
Yule, G. (1900). On the association of the attributes in statistics: With illustrations from the material of the childhood society, &c. Philosophical Transactions of the Royal Society, Series A, 194, 257–319.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Anderson, N . N . (2001). Empirical direction in design and analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Cohen, B. H. (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Fisher, R. A. (1935). The design of experiments (7th ed.). Edinburgh-London: Oliver & Boyd.
Huck, S. W. & McLean, R. A. (1975). Using a repeated measures ANOVA to analyze the data from a pretest-posttest design: A potentially confusing task. Psychological Bulletin, 82, 511–518.
Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd. ed.). Belmont, CA:
Brooks/Cole Publishing Company.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA:
Brooks/Cole Publishing Company.
Rodgers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553–565.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Sandler, J. (1955). A test of the significance of difference between the means of correlated measures based on a simplification of Student's t. British Journal of Psychology, 46, 225–226.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15, 657–680.
Tryon, W. W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests. Psychological Methods, 6, 371–386.
van Belle, G. (2002). Statistical rules of thumb. New York: John Wiley & Sons.
Weinfurt, K. P. (2000). Repeated measures analysis: ANOVA, MANOVA and HLM. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding more multivariate statistics (pp. 317–361). Washington, DC: American Psychological Association.
Wellek, S. (2003). Testing statistical hypotheses of equivalence. Boca Raton, FL: Chapman & Hall/CRC.
Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics, 32, 741–744.
Westlake, W. J. (1981). Response to T. B. L. Kirkwood: Bioequivalence testing – a need to rethink. Biometrics, 37, 589–594.
Westlake, W. J. (1988). Bioavailability and bioequivalence of pharmaceutical formulations. In K. E. Peace (Ed.), Biopharmaceutical statistics for drug development (pp. 329–352). New York: Marcel Dekker.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Bell, C. B. & Doksum, K. A. (1965). Some new distribution-free statistics. Annals of Mathematical Statistics, 36, 203–214.
Conover, W. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Fisher, R. A. (1935). The design of experiments (7th ed.). Edinburgh-London: Oliver & Boyd.
Glass, G. V. & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.) Boston: Allyn &
Bacon.
Grissom, R. J. & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Higgins, J. J. (2004). An introduction to modern nonparametric statistics. Belmont, CA: Duxbury Press.
Hollander, M . & Wolfe, D. A . (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Marascuilo, L . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social sciences.
Monterey, CA: Brooks/Cole Publishing Company.
Sheskin, D. J. (1984). Statistical tests and experimental design: A guidebook. New York: Gardner Press.
Siegel, S. & Castellan, N . , Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York:
McGraw-Hill Book Company.
van der Waerden, B. L. (1952/1953). Order tests for the two-sample problem and their power. Proceedings Koninklijke Nederlandse Akademie van Wetenshappen (A), 55 (Indagationes Mathematicae 14), 453–458, and 56 (Indagationes Mathematicae, 15), 303–316 (corrections appear in Vol. 56, p. 80).
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80–83.
Wilcoxon, F. (1949). Some rapid approximate statistical procedures. Stamford, CT: Stamford Research
Laboratories, American Cyanamid Corporation.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Marascuilo, L. A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social sciences.
Monterey, CA: Brooks/Cole Publishing Company.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York:
McGraw-Hill Book Company.
Bowker, A. H. (1948). A test for symmetry in contingency tables. Journal of the American Statistical Association,
43, 572–574.
Chow, S. C. & Liu, J. P. (2004). Design and analysis of clinical trials: Concepts and methodologies (2nd ed.).
Hoboken, NJ: John Wiley & Sons.
Connett, J. E., Smith, J. A., & McHugh, R. B. (1987). Sample size and power for pair-matched case-control
studies. Statistics in Medicine, 6, 53–59.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Dunnett, C. W. & Gent, M. (1977). Significance testing to establish equivalence between treatments with special
reference to data in the form of 2 × 2 tables. Biometrics, 33, 593–602.
Edwards, A. L. (1948). Note on the “correction for continuity” in testing the significance of the difference
between correlated proportions. Psychometrika, 13, 185–187.
Everitt, B. S. (1977). The analysis of contingency tables. London: Chapman & Hall.
Everitt, B. S. (1992). The analysis of contingency tables (2nd ed.). New York: Chapman & Hall.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley & Sons.
Fleiss, J. L. & Everitt, B. S. (1971). Comparing the marginal totals of square contingency tables. British Journal of
Mathematical and Statistical Psychology, 24, 117–123.
Fleiss, J. L . , Levin, B. & Paik, M . C. (2003). Statistical methods for rates and proportions (3rd ed.). New York:
John Wiley & Sons.
Gart, J. J. (1969). An exact test for comparing matched proportions in crossover designs. Biometrika, 56, 75–80.
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (1998). Applied statistics for the behavioral sciences (4th ed.). Boston:
Houghton Mifflin Company.
Leach, C. (1979). Introduction to statistics: A nonparametric approach for the social sciences. Chichester,
England: John Wiley & Sons.
Liu, J. P., Hsueh, H. M., Hsieh, E., & Chen, J. J. (2002). Tests for equivalence or non-inferiority for paired binary
data. Statistics in Medicine, 21, 231–245.
Liu, K. J. & Cumberland, W. G. (2001). A test procedure of equivalence in ordinal data with matched-pairs.
Biometrical Journal, 43, 977–983.
Lu, Y. & Bean, J. A. (1995). On the sample size for one-sided equivalence of sensitivities based upon
McNemar's test. Statistics in Medicine, 14, 1831–1839.
Marascuilo, L. A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social sciences.
Monterey, CA: Brooks/Cole Publishing Company.
Marascuilo, L . A . & Serlin, R. C. (1988). Statistical methods for the social and behavioral sciences. New York:
W. H . Freeman & Company.
Maxwell, A. E. (1970). Comparing the classification of subjects by two independent judges. British Journal of
Psychiatry, 116, 651–655.
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages.
Psychometrika, 12, 153–157.
Morikawa, T. & Yanagawa, T. (1995). Equivalence testing for paired dichotomous data. Proceedings of Annual
Conference of Biometric Society of Japan, 123–126.
Nam, J. (1997). Establishing equivalence of two treatments and sample size requirements in matched-pairs
designs. Biometrics, 53, 1422–1430.
Rodgers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between
two experimental groups. Psychological Bulletin, 113, 553–565.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for
assessing equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics, 15,
657–680.
Selvin, S. (1995). Practical biostatistical methods. Belmont, CA: Duxbury.
Selvin, S. (2004). Statistical analysis of epidemiological data (3rd ed.). Oxford: Oxford University Press.
Selwyn, M. R., Dempster, A. P., & Hall, N. R. (1981). A Bayesian approach to bioequivalence for the 2 × 2
changeover design. Biometrics, 37, 11–21.
Selwyn, M. R. & Hall, N. R. (1984). On Bayesian methods for bioequivalence. Biometrics, 40, 1103–1108.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Stuart, A. A. (1955). A test for homogeneity of the marginal distributions in a two way classification. Biometrika,
42, 412–416.
Stuart, A. A. (1957). The comparison of frequencies in matched samples. British Journal of Statistical
Psychology, 10, 29–32.
Tango, T. (1998). Equivalence test and confidence interval for the difference in proportions for the paired-sample
design. Statistics in Medicine, 17, 891–908.
Tango, T. (1999). Improved confidence intervals for the difference between binomial proportions based on paired
data. Statistics in Medicine, 18, 3511–3513.
Wellek, S. (2003). Testing statistical hypotheses of equivalence. Boca Raton, FL: Chapman & Hall/CRC.
Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics, 32, 741–744.
Westlake, W. J. (1981). Response to T. B. L. Kirkwood: Bioequivalence testing – a need to rethink. Biometrics,
37, 589–594.
Westlake, W. J. (1988). Bioavailability and bioequivalence of pharmaceutical formulations. In K. E. Peace (Ed.),
Biopharmaceutical statistics for drug development (pp. 329–352). New York: Marcel Dekker.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Anderson, N . N . (2001). Empirical direction in design and analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Bartlett, M. S. (1937). Some examples of statistical methods of research in agriculture and applied biology.
Journal of the Royal Statistical Society Supplement, 4, 137–170.
Beyer, W. H. (1968). Handbook of tables for probability and statistics (2nd ed.). Cleveland, OH: CRC Press.
Box, G. E. (1953). Non-normality and tests on variance. Biometrika, 40, 318–335.
Brown, M. B. & Forsythe, A. B. (1974a). Robust tests for equality of variances. Journal of the American
Statistical Association, 69, 364–367.
Brown, M. B. & Forsythe, A. B. (1974b). The small sample behavior of some statistics which test the equality of
several means. Technometrics, 16, 129–132.
Brown, M. B. & Forsythe, A. B. (1974c). The ANOVA and multiple comparisons for data with heterogeneous
variances. Biometrics, 30, 719–724.
Cochran, W. G. (1941). The distribution of the largest of a set of estimated variances as a fraction of their total.
Annals of Eugenics, 11, 47–52.
Cochran, W. G. & Cox, G. M . (1957). Experimental designs (2nd ed.). New York: John Wiley & Sons.
Cohen, B. H. (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd
ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Conover, W. J., Johnson, M . E., & Johnson, M . M . (1981). A comparative study of tests of homogeneity of
variance with applications to the outer continental shelf bidding data. Technometrics, 23, 351a€"361.
Cortina, J. M . & Nouri, H . (2000). Effect sizes for ANOVA designs. Thousand Oaks, CA: Sage Publications.
Darlington, R. B. & Carlson, P. M . (1987). Behavioral statistics: Logic and methods. New York: The Free Press.
Dayton, C. M . (2003). Information criteria for pairwise comparisons. Psychological Methods, 8, 61a€"71.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56,
52a€"64.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. Journal
of the American Statistical Association, 50, 1096a€"1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics, 20, 482a€"491.
Field, A. (2005). Discovering statistics using SPSS (2nd ed.). Sage Publications: London.
Fisher, R. A. (1932). Statistical methods for research workers. Edinburgh: Oliver & Boyd.
Fisher, R. A. (1935). The design of experiments. Edinburgh & London: Oliver & Boyd.
Grissom, R. J. & K i m , J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence
Erlbaum Associates.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Hartley, H. O. (1940). Testing the homogeneity of a set of variances. Biometrika, 31, 249a€"255.
Hartley, H. O. (1950). The maximum F ratio as a shortcut test for heterogeneity o f variance. Biometrika, 37,
308a€"312.
Hayter, A. J. (1986). The maximum familywise error rate of Fisher's least significant difference test. Journal
of the American Statistical Association, 81, 1000–1004.
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (1998). Applied statistics for the behavioral sciences. (4th ed.). Boston:
Houghton Mifflin Company.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6,
65–70.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury Press.
Hsu, J. C. (1996). Multiple comparisons: Theory and methods. New York: Chapman & Hall.
Huitema, B. (1980). The analysis of covariance and alternatives. New York: John Wiley & Sons.
Hunter, J. E. & Schmidt, F. L . (1990). Methods of meta-analysis: Correcting error and bias in research findings
(1st ed.). Newbury Park, CA: Sage Publications.
Hunter, J. E. & Schmidt, F. L . (2004). Methods of meta-analysis: Correcting error and bias in research findings
(2nd ed.). Thousand Oaks, CA: Sage Publications.
James, G. S. (1951). The comparison of several groups of observations when the ratios of the population variances
are unknown. Biometrika, 38, 324–329.
Kachigan, S. K. (1986). Statistical analysis: An interdisciplinary introduction to univariate and multivariate
methods. New York: Radius Press.
Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs, NJ: Prentice
Hall.
Keppel, G., Saufley, W. H., & Tokunaga, H. (1992). Introduction to design and analysis: A student's
handbook (2nd ed.). New York: W. H. Freeman & Company.
Keppel, G. & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed.). Upper Saddle
River, NJ: Pearson/Prentice Hall.
Keppel, G. & Zedeck, S. (1989). Data analysis for research designs. New York: W. H. Freeman & Company.
Keuls, M . (1952). The use of studentized range in connection with an analysis of variance. Euphytica, 1,
112a€"122.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Belmont, CA:
Brooks/Cole Publishing Company.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA:
Brooks/Cole Publishing Company.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Levene, H. (1960). Robust tests for the equality of variance. In I. Olkin (Ed.), Contributions to probability and
statistics (pp. 278–292). Palo Alto, CA: Stanford University Press.
Licht, M. H. (1995). Multiple regression and correlation. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding multivariate statistics (pp. 100–136). Washington, DC: American Psychological Association.
Lord, F. M . (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304a€"305.
Lord, F. M . (1969). Statistical adjustments when comparing pre-existing groups. Psychological Bulletin, 72,
336a€"337.
Marascuilo, L . A. & Levin, J. R. (1983). Multivariate statistics in the social sciences. Monterey, CA: Brooks/Cole
Publishing Company.
Marascuilo, L . A. & Serlin, R. C. (1988). Statistical methods for the social and behavioral sciences. New York:
W. H. Freeman & Company.
Maxwell, S. E. & Delaney, H. D. (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth
Publishing Company.
Maxwell, S. E. & Delaney, H. D. (2000). Designing experiments and analyzing data. Mahwah, NJ: Lawrence
Erlbaum Associates.
Maxwell, S. E. & Delaney, H . D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
McFatter, R. M . & Gollob, H. F. (1986). The power o f hypothesis tests for comparisons. Educational and
Psychological Measurement, 46, 883a€"886.
Myers, J. L. & Well, A. D. (1995). Research design and statistical analysis. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Myers, J. L . & Well, A. D. (2003). Research design and statistical analysis (2nd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Newman, D. (1939). The distribution of the range in samples from a normal population, expressed in terms of an
independent estimate of standard deviation. Biometrika, 31, 20–30.
O'Brien, R. G. (1981). A simple test for variance effects in experimental designs. Psychological Bulletin, 89,
570–574.
Pearson, E. S. & Hartley, H. O. (1951). Charts o f the power function for analysis o f variance, derived from the
non-central F distribution. Biometrika, 38, 112a€"130.
Rosenthal, R., Rosnow, R. L . , & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A
correlational approach. Cambridge, UK: Cambridge University Press.
Rosnow, R. L . , Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation.
Psychological Science, 11, 446a€"453.
Ryan, T. A. (1960). Significance tests for multiple comparisons of proportions, variances, and other statistics.
Psychological Bulletin, 57, 318–328.
Scheffé, H. A. (1953). A method for judging all possible contrasts in the analysis of variance. Biometrika, 40,
87–104.
Scheffé, H. A. (1959). The analysis of variance. New York: John Wiley & Sons.
Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of
the American Statistical Association, 62, 626–633.
Smithson, M . (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Stevens, J. (1996). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L. S. (1989). Using multivariate statistics (2nd ed.). New York: Harper Collins.
Tabachnick, B. G. & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins.
Tatsuoka, M . M . (1975). The general linear model: A new trend in analysis of variance. Champaign, IL: Institute
for Personality & Ability Testing.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for
effect sizes. Educational Researcher, 31, 25a€"32.
Tiku, M . L. (1967). Tables o f the power o f the F test. Journal of the American Statistical Association, 62,
525a€"539.
Toothaker, L. E. (1991). Multiple comparisons for researchers. Newbury Park, CA: Sage.
Tukey, J. W. (1953). The problem of multiple comparisons. Unpublished paper, Princeton University, Princeton,
NJ.
Weinfurt, K. P. (2000). Repeated measures analysis: ANOVA, MANOVA and HLM. In Grimm, L. G. & Yarnold,
P. R. (Eds.), Reading and understanding more multivariate statistics (pp. 317–361). Washington, DC:
American Psychological Association.
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38,
330–336.
Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.
Winer, B. J., Brown, D. R., & Michels, K. M . (1991). Statistical principles in experimental design (3rd ed.). New
York: McGraw-Hill Publishing Company.
Bell, C. B. and Doksum, K. A . (1965). Some new distribution-free statistics. Annals of Mathematical Statistics,
36, 203a€"214.
Beyer, W. H. (1968). Handbook of tables for probability and statistics (2nd ed.). Cleveland, OH: The Chemical
Rubber Company.
Bradley, J. V. (1968). Distribution-free statistical tests. Englewood Cliffs, NJ: Prentice Hall.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Delaney, H. D. & Vargha, A . (2002). Comparing several robust tests o f stochastic equality with ordinally scaled
variables and small to moderate sized samples. Psychological Methods, 7, 485a€"503.
Dunn, O. J. (1964). Multiple comparisons using rank sums. Technometrics, 6, 241a€"252.
Gibbons, J. D. (1997). Nonparametric methods for quantitative analysis (3rd ed.). Columbus, OH: American
Sciences Press.
Hollander, M . & Wolfe, D. A . (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Jonckheere, A. R. (1954). A distribution-free k sample test against ordered alternatives. Biometrika, 41,
133a€"145.
Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giorn. dell'Inst. Ital.
degli Att., 4, 89–91.
Kruskal, W. H. (1952). A nonparametric test for the several sample problem. Annals of Mathematical Statistics,
23, 525a€"540.
Kruskal, W. H. & Wallis, W. A. (1952). Use o f ranks in one-criterion variance analysis. Journal of the American
Statistical Association, 47, 583a€"621.
Leach, C. (1979). Introduction to statistics: A nonparametric approach for the social sciences. Chichester: John
Wiley & Sons.
Lehman, E. L . (1975). Nonparametric statistical methods based on ranks. San Francisco: Holden-Day.
Marascuilo, L. A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social sciences.
Monterey, CA: Brooks/Cole Publishing Company.
Marascuilo, L . A . & Serlin, R. C. (1988). Statistical methods for the social and behavioral sciences. New York:
W. H. Freeman and Company.
Maxwell, S. & Delaney, H. (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth
Publishing Company.
Maxwell, S. E. & Delaney, H . D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Odeh, R. E. (1971). Jonckheere's k-sample test against ordered alternatives. Technometrics, 13, 912–918.
Sheskin, D. J. (1984). Statistical tests and experimental design: A guidebook. New York: Gardner Press.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York:
McGraw-Hill Book Company.
Smirnov, N . V. (1939). On the estimation of the discrepancy between empirical curves of distributions for two
independent samples. Bulletin University of Moscow, 2, 3a€"14.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Terpstra, T. J. (1952). The asymptotic normality and consistency of Kendall's test against trend, when ties are
present in one ranking. Indagationes Mathematicae, 14, 327–333.
Terry, M . E. (1952). Some rank-order tests, which are most powerful against specific parametric alternatives.
Annals of Mathematical Statistics, 23, 346a€"366.
van der Waerden, B. L. (1952/1953). Order tests for the two-sample problem and their power. Proceedings
Koninklijke Nederlandse Akademie van Wetenschappen (A), 55 (Indagationes Mathematicae 14), 453–458, &
56 (Indagationes Mathematicae, 15), 303–316 (corrections appear in Vol. 56, p. 80).
Wike, E. L. (1978). A Monte Carlo investigation o f four nonparametric multiple-comparison tests for k
independent samples. Bulletin of the Psychonomic Society, 11, 25a€"28.
Wike, E. L . (1985). Numbers: A primer of data analysis. Columbus, OH: Charles E. Merrill Publishing Company.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Zimmerman, D. W. & Zumbo, B. D. (1993). The relative power o f parametric & nonparametric statistical
methods. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences:
Methodological issues (pp. 481a€"517). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bell, C. B. & Doksum, K. A . (1965). Some new distribution-free statistics. Annals of Mathematical Statistics, 36,
203a€"214.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Marascuilo, L. A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social sciences.
Monterey, CA: Brooks/Cole Publishing Company.
Sheskin, D. J. (1984). Statistical tests and experimental design: A guidebook. New York: Gardner Press.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York:
McGraw-Hill Book Company.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Terry, M. E. (1952). Some rank-order tests, which are most powerful against specific parametric alternatives.
Annals of Mathematical Statistics, 23, 346–366.
van der Waerden, B. L. (1952/1953). Order tests for the two-sample problem and their power. Proceedings
Koninklijke Nederlandse Akademie van Wetenschappen (A), 55 (Indagationes Mathematicae 14), 453–458, &
56 (Indagationes Mathematicae, 15), 303–316 (corrections appear in Vol. 56, p. 80).
Anderson, N . N . (2001). Empirical direction in design and analysis. Mahwah, NJ: Lawrence Erlbaum
Associates.
Bartko, J. J. (1976). On various intraclass correlation reliability coefficients. Psychological Bulletin, 83,
762a€"765.
Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems, II.
Effect of inequality of variances and correlation between error in two-way classification. Annals of
Mathematical Statistics, 25, 484–498.
Cohen, B. H. (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Cortina, J. M . & Nouri, H . (2000). Effect sizes for ANOVA designs. Thousand Oaks, CA: Sage Publications.
Everitt, B. S. (2001). Statistics for psychologists: An intermediate course. Mahwah, NJ: Lawrence Erlbaum
Associates.
Geisser, S. & Greenhouse, S. W. (1958). An extension of Box's results to the use of the F distribution in
multivariate analysis. Annals of Mathematical Statistics, 29, 885–891.
Greenhouse, S. W. & Geisser, S. (1959). On the methods in the analysis of profile data. Psychometrika, 24,
95–112.
Grissom, R. J. & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ:
Lawrence Erlbaum Associates.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury Press.
Hsu, J. C. (1996). Multiple comparisons: Theory and methods. New York: Chapman & Hall.
Huynh, H. & Feldt, L. S. (1976). Estimates of the correction for degrees of freedom for sample data in randomized
block and split-plot designs. Journal of Educational Statistics, 1, 69–82.
Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs, NJ: Prentice
Hall.
Keppel, G. & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed.). Upper
Saddle River, NJ: Pearson/Prentice Hall.
Keppel, G. & Zedeck, S. (1989). Data analysis for research designs. New York: W. H . Freeman & Company.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Belmont, CA:
Brooks/Cole Publishing Company.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA:
Brooks/Cole Publishing Company.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. Annals of
Mathematical Statistics, 11, 204–209.
Maxwell, S. E. & Delaney, H . D. (1990). Designing experiments and analyzing data. Mahwah, NJ: Lawrence
Erlbaum Associates.
Maxwell, S. E. & Delaney, H . D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
McCulloch, C. E. (2005). Repeated measures ANOVA, RIP? Chance, 18, 29–33.
McFatter, R. M. & Gollob, H. F. (1986). The power of hypothesis tests for comparisons. Educational and
Psychological Measurement, 46, 883–886.
Myers, J. L. & Well, A . D. (1995). Research design and statistical analysis. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Myers, J. L. & Well, A. D. (2003). Research design and statistical analysis (2nd ed.) Mahwah, NJ: Lawrence
Erlbaum Associates.
Rosenthal, R., Rosnow, R. L . , & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A
correlational approach. Cambridge, UK: Cambridge University Press.
Rosnow, R. L . , Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation.
Psychological Science, 11, 446a€"453.
Shrout, P. E. & Fleiss, J. L. (1979). Intraclass correlations: Use in assessing rater reliability. Psychological
Bulletin, 86, 420a€"428.
Stevens, J. P. (2002). Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L. S. (1989). Using multivariate statistics (2nd ed.). New York: HarperCollins
Publishers.
Weinfurt, K. P. (2000). Repeated measures analysis: ANOVA, MANOVA and HLM. In Grimm, L. G. & Yarnold, P.
R. (Eds.), Reading and understanding more multivariate statistics (pp. 317–361). Washington, DC:
American Psychological Association.
Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.
Winer, B. J., Brown, D. R., & Michels, K. M . (1991). Statistical principles in experimental design (3rd ed.).
New York: McGraw-Hill Publishing Company.
Bell, C. B. & Doksum, K. A. (1965). Some new distribution-free statistics. Annals of Mathematical Statistics,
36, 203a€"214.
Church, J. D. & Wike, E. L. (1979). A Monte Carlo study o f nonparametric multiple-comparison tests for a two-
way layout. Bulletin of the Psychonomic Society, 14, 95a€"98.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance.
Journal of the American Statistical Association, 32, 675–701.
Hollander, M . & Wolfe, D. A . (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Iman, R. L. & Davenport, J. M. (1980). Approximations of the critical region of the Friedman statistic.
Communications in Statistics – Theory and Methods, 9, 571–595.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Marascuilo, L. A. & Serlin, R. C. (1988). Statistical methods for the social and behavioral sciences. New York:
W. H. Freeman & Company.
Noether, G. E. (1967). Elements of nonparametric statistics. New York: John Wiley & Sons.
Page, E. B. (1963). Ordered hypotheses for multiple treatments: A significance test for linear ranks. Journal of
the American Statistical Association, 58, 216a€"230.
Sheskin, D. J. (1984). Statistical tests and experimental design: A guidebook. New York: Gardner Press.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Sprent, P. & Smeeton, N . C. (2000). Applied nonparametric statistical methods (3rd ed.). London: Chapman &
Hall.
van der Waerden, B. L. (1952/1953). Order tests for the two-sample problem and their power. Proceedings
Koninklijke Nederlandse Akademie van Wetenschappen (A), 55 (Indagationes Mathematicae 14),
453–458, & 56 (Indagationes Mathematicae, 15), 303–316 (corrections appear in Vol. 56, p. 80).
Wike, E. L. (1985). Numbers: A primer of data analysis. Columbus, OH: Charles E. Merrill Publishing Company.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Bennett, B. M. (1967). Tests of hypotheses concerning matched samples. Journal of the Royal Statistical
Society, Ser. B., 29, 468–474.
Bennett, B. M. (1968). Notes on χ² tests for matched samples. Journal of the Royal Statistical Society, Ser. B.,
30, 368–370.
Chou, Y. (1989). Statistical analysis for business and economics. New York: Elsevier.
Cochran, W. G. (1950). The comparison of percentages in matched samples. Biometrika, 37, 256a€"266.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: John Wiley & Sons.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Patil, K. D. (1975). Cochran's Q test: Exact distribution. Journal of the American Statistical Association,
70, 186–189.
Shah, A. K. & Claypool, P. L. (1985). Analysis of binary data in the randomized complete block design.
Communications in Statistics – Theory and Methods, 14, 1175–1179.
Sheskin, D. J. (1984). Statistical tests and experimental design: A guidebook. New York: Gardner Press.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Winer, B. J., Brown, D. R., & Michels, K. M . (1991). Statistical principles in experimental design (3rd ed.).
New York: McGraw-Hill, Inc.
Anderson, N . N . (2001). Empirical direction in design and analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Box, G. E. P., Hunter, W. G., & Hunter, J. S. (1978). Statistics for experimenters. New York: John Wiley & Sons.
Cohen, B. H. (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Cortina, J. M . & Nouri, H . (2000). Effect sizes for ANOVA designs. Thousand Oaks, CA: Sage Publications.
Edwards, A. L . (1985). Experimental design in psychological research (5th ed.). New York: Harper & Row.
Fleiss, J. L. (1986). The design and analysis of clinical experiments. New York: John Wiley & Sons.
Grissom, R. J. & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence
Erlbaum Associates.
Honeck, R. P., Kibler, C. T., & Sugar, J. (1983). Experimental design and analysis: A systematic approach.
Lanham, MD: University Press of America.
Howell, D. C. (1992). Statistical methods for psychology (3rd ed.). Boston: PWS-Kent Publishing Company.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA.: Duxbury.
Hsu, J. C. (1996). Multiple comparisons: Theory and methods. New York: Chapman & Hall.
Keppel, G. (1973). Design and analysis: A researcher's handbook. Englewood Cliffs, NJ: Prentice Hall.
Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.). Englewood Cliffs, NJ: Prentice
Hall.
Keppel, G. & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed.). Upper Saddle
River, NJ: Pearson/Prentice Hall.
Keppel, G., Saufley, W. H., & Tokunaga, H. (1992). Introduction to design and analysis: A student's
handbook (2nd ed.). New York: W. H. Freeman & Company.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Belmont, CA:
Brooks/Cole Publishing Company.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA:
Brooks/Cole Publishing Company.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Maxwell, S. E. & Delaney, H . D. (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth
Publishing Company.
Maxwell, S. E. & Delaney, H. D. (2000). Designing experiments and analyzing data. Mahwah, NJ: Lawrence
Erlbaum Associates.
Maxwell, S. E. & Delaney, H . D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Miller, I. & Freund, J. E. (1965). Probability and statistics for engineers. Englewood Cliffs, NJ: Prentice-Hall.
Montgomery, D. C. (2000). Design and analysis of experiments (5th ed.). New York: John Wiley & Sons.
Myers, J. L. & Well, A. D. (1991). Research design and statistical analysis. New York: Harper Collins.
Myers, J. L. & Well, A. D. (1995). Research design and statistical analysis. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Myers, J. L . & Well, A. D. (2003). Research design and statistical analysis (2nd ed.) Mahwah, NJ: Lawrence
Erlbaum Associates.
Olejnik, S. & Algina, J. (2000). Measures o f effect size for comparative studies: Applications, interpretations, and
limitations. Contemporary Educational Psychology, 25, 241a€"286.
Plackett, R. L. & Burman, J. P. (1946). The design o f optimal multifactorial experiments. Biometrika, 33,
305a€"325.
Rosenthal, R., Rosnow, R. L . , & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A
correlational approach. Cambridge, UK: Cambridge University Press.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation.
Psychological Science, 11, 446–453.
Tabachnick, B. G. & Fidell, L . S. (2001). Computer-assisted research design and analysis. Boston: Allyn &
Bacon.
Taguchi, G. (1986). Introduction to quality engineering. White Plains, NY: Asian Productivity Organization,
UNIPUB.
Taguchi, G. (1993). Taguchi methods: Design of experiments. Dearborn, M I : ASI Press.
Winer, B. J., Brown, D. R., & Michels, K. M . (1991). Statistical principles in experimental design (3rd ed.). New
York: McGraw-Hill Publishing Company.
Anderson, D. R., Sweeney, D. J. & Williams, T. A. (2002). Statistics for business and economics (8th ed.).
Cincinnati: South-Western/Thomson Learning.
Anderson, D. R., Sweeney, D. J. & Williams, T. A. (2011). Statistics for business and economics (11th ed.).
Mason, OH: South-Western/Thomson Learning.
Anderson, R. L. (1942). Distribution o f the serial correlation coefficient. Annals of Mathematical Statistics, 13,
1a€"13.
Beatty, M . J. (2002). Do we know a vector from a scalar? Why measures o f association (not their squares) are
appropriate indices of effect. Human Communication Research, 28, 605a€"611.
Bechtold, B. & Johnson, R. H. (1989). Statistics for business and economics. Boston: PWS-Kent Publishing
Company.
Bennett, C. A. & Franklin, N . L. (1954). Statistical analysis in chemistry and the chemical industry. New
York: John Wiley & Sons.
Berenson M . L . & Levine, D.M. (1996). Basic business statistics: Concepts and applications. Englewood Cliffs,
NJ: Prentice Hall.
Berk, R. A. & Rauma, D. (1983). Capitalizing on nonrandom assignment in treatment: A regression discontinuity
evaluation of a crime control program. Journal of the American Statistical Association, 78, 21a€"27.
Black, K. (2001). Business statistics: Contemporary decision making. Cincinnati, OH: South-Western College
Publishing.
Box, G. P. & Jenkins, G. M . (1970). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
Box, G. P. & Pierce, D. A . (1970). Distribution of residual autocorrelations in autoregressive integrated moving
average time series models. Journal of the American Statistical Association, 65, 1509a€"1526.
Campbell, D. T. (1963). From description to experimentation: Interpreting trends as quasi-experiments. In Harris,
C. W. (Ed.), Problems in measuring change (pp. 212–243). Madison, WI: University of Wisconsin Press.
Campbell, D. T. & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Chicago:
Rand-McNally.
Canavos, G. C. & Miller, D. M. (1995). Modern business statistics. Belmont, CA: Duxbury.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245a€"266.
Chou, Y. (1989). Statistical analysis for business and economics. New York: Elsevier.
Cochran, W. G. (1954). The combination of estimates from different experiments. Biometrics, 10, 101a€"129.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates, Publishers.
Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Cohen, R. J., Swerdlick, M . E., & Smith, D. K. (1992). Psychological testing and assessment. Mountain View,
CA: Mayfield Publishing Company.
Cook, T. D. & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings.
Boston: Houghton Mifflin Company.
Cook, R. D. & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.
Daniel, W. W. and Terrell, J. C. (1995). Business statistics for management and economics (7th ed.). Boston:
Houghton Mifflin.
David, F. N . (1938). Tables of the ordinates and probability integral of the distribution of the correlation
coefficient in small samples. Cambridge: University Press.
Doane, D. P. & Seward, L. E. (2011). Applied statistics in business and economics (3rd ed.). New York:
McGraw-Hill/Irwin.
Durbin, J. & Watson, G. S. (1950). Testing for serial correlation in least squares regression I. Biometrika, 37,
409–438.
Durbin, J. & Watson, G. S. (1951). Testing for serial correlation in least squares regression II. Biometrika, 38,
159–178.
Durbin, J. & Watson, G. S. (1971). Testing for serial correlation in least squares regression III. Biometrika, 58,
1–19.
Edwards, A . L. (1984). An introduction to linear regression and correlation (2nd ed.). New York: W. H.
Freeman & Company.
Eron, L . D., Huesman, L . R., Lefkowitz, M . M . , & Walder, L. O. (1972). Does television violence cause
aggression? American Psychologist, 27, 253a€"263.
Fernandez, G. (2003). Data mining using SAS applications. Boca Raton, FL: Chapman & Hall/CRC Press.
Field, A . (2005). Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
Fisher, R. A. (1921). On the “probable error” of a coefficient of correlation deduced from a small sample.
Metron, 1, Part 4, 3–32.
Freedman, D. A. (2001). Ecological inference and the ecological fallacy. In Smelser, N . J. & Baltes, P. T. (Eds.),
International encyclopedia of the social and behavioral sciences, pp. 4027a€"4030. Amsterdam: Elsevier.
Goodwin, C. J. (2002). Research methods in psychology: Methods and design. (3rd ed.). New York: John
Wiley & Sons.
Grissom, R. J. & K i m , J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ:
Lawrence Erlbaum Associates.
Groebner, D. F., Shannon, P. W., Fry, P. C., & Smith, K. D. (2011). Business statistics: A decision making
approach (8th ed.). Saddle River, NJ: Prentice Hall.
Guenther, W. C. (1965). Concepts of statistical inference. New York: McGraw-Hill Book Company.
Guilford, J. P. (1965). Fundamental statistics in psychology and education (4th ed.). New York: McGraw-Hill
Book Company.
Hand, D. J. (2005). Data mining. In Everitt, B. S. and Howell, D. C. (Eds.), Encyclopedia of statistics in
behavioral science (pp. 461a€"465). Chichester, UK: John Wiley & Sons.
Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of data mining. Cambridge, MA: M I T Press.
Hedges, L. V. & Olkin, I . (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hershberger, S. L. (2005). Polychoric correlation. In Everitt, B. S. and Howell, D. C. (Eds.), Encyclopedia of
statistics in behavioral science (pp. 1553a€"1555). Chichester, UK: John Wiley & Sons.
Hildebrand, D. K. (1986). Statistical thinking for behavioral scientists. Boston, MA: Duxbury.
Hoaglin, D. C. & Welsch, R. E. (1978). The hat matrix in regression and ANOVA. The American Statistician, 32,
17–22.
Holt, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages. Office of
Naval Research, Memorandum No. 52.
Hotelling, H. (1940). The selection o f variates for use in prediction with some comments on the general problem
of nuisance parameters. Annals of Mathematical Statistics, 11, 271a€"283.
Hotelling, H. (1953). New light on the correlation coefficient and its transforms. Journal of the Royal Statistical
Society, Series B, 15, 193a€"232.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury Press.
Howell, D. C. (2010). Statistical methods for psychology (7th ed.). Belmont, CA: Wadsworth.
Hunter, J. E. & Schmidt, F. L. (1987). Error in the meta-analysis of correlations: the mean correlation.
Unpublished manuscript, Department o f Psychology, Michigan State University.
Hunter, J. E. & Schmidt, F. L . (2004). Methods of meta-analysis: Correcting error and bias in research
findings (2nd ed.). Thousand Oaks, CA: Sage Publications.
Hunter, J. E., Schmidt, F. L . , & Coggin, T. D. (1996). Meta-analysis of correlation: Bias in the correlation
coefficient and the Fisher z transformation. Unpublished manuscript, University o f Iowa.
Kachigan, S. K. (1986). Statistical analysis. New York: Radius Press.
Kenny, D. A. (1973). Cross-lagged and synchronous common factors in panel data. In Goldenberger, A . S. &
Duncan, O. D. (Eds.), Structural equation models in the social sciences. New York: Seminar Press.
Kenny, D. A. (1975). Cross-lagged panel correlations: A test for spuriousness. Psychological Bulletin, 82,
887a€"903.
Kline, R. (2005). Principles and practices of structural equation modeling (2nd ed.). New York: Guilford.
Lazarsfeld, P. F. (1947). The mutual effects of statistical variables. Unpublished manuscript, Columbia
University, Bureau of Applied Social Research.
Lazarsfeld, P. F. (1948). The use o f panels in social research. Proceedings of the American Philosophical
Society, 92, 405a€"410.
Licht, M . H. (1995). Multiple regression and correlation. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding multivariate statistics (pp. 100a€"136). Washington, D.C.: American Psychological
Association.
Lin, C. C. & Mudholkar, G. S. (1980). A simple test for normality against asymmetric alternatives. Biometrika, 67,
455a€"461.
Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to bivariate and multivariate analysis.
Glenview, IL: Scott, Foresman & Company.
Ljung, G. M . & Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65,
297a€"303.
Marascuilo, L. A. & Serlin, R. C. (1988). Statistical methods for the social and behavioral sciences. New York:
W. H. Freeman & Company.
McNemar, Q. (1969). Psychological statistics (4th ed.). New York: John Wiley & Sons.
Montgomery, D. C. & Peck, E. A . (1992). Introduction to linear regression analysis (2nd ed.). New York: John
Wiley & Sons.
Moore, D. S. & McCabe, G. P. (1993). Introduction to the practice of statistics (2nd ed.). New York: W. H.
Freeman & Company.
Moore, D. S., McCabe, G. P., Duckworth, W. M., & Alwan, L. C. (2009). The practice of business statistics
(2nd ed.). New York: W. H. Freeman and Company.
Mosteller, F. (1990). Improving research methodology: An overview. In Sechrest, L., Perrin, E., & Bunker, J.
(Eds.), Research methodology: Strengthening causal interpretation of nonexperimental data (pp.
221–230). Rockville, MD: U.S. Public Health Service Agency for Health Care Policy & Research.
Myers, J. L. & Well, A. D. (2003). Research design and statistical analysis (2nd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Neale, J. M . & Liebert, R. M . (1980). Science and behavior: A n introduction to methods of research.
Englewood Cliffs, NJ: Prentice-Hall, Inc.
Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). Applied linear statistical models (4th ed.).
Boston: WCB McGraw-Hill.
Neter, J., Wasserman, W., & Kutner, M . H . (1983). Applied linear regression models (3rd ed.). Homewood, IL:
Richard D. Irwin.
Neter, J., Wasserman, W., & Kutner, M . H . (1990). Applied linear statistical models (3rd ed.). Homewood, I L :
Richard D. Irwin.
Newbold, P. & Bos, T. (1990). Introductory business forecasting. Cincinnati: South-Western Publishing Co.
Ozer, D. J. (1985). Correlation and the coefficient of determination. Psychological Bulletin, 97, 307a€"315.
Palumbo, D. J. (1977). Statistics in political and social science (Revised ed.). New York: Columbia University
Press.
Pearson, K. (1896). Mathematical contributions to the theory of evolution – III. Regression, heredity and
panmixia. Philosophical Transactions of the Royal Society of London, Series A, 187, 253–318.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated
system of variables is such that it can reasonably be supposed to have arisen in a random sampling.
Philosophical Magazine, 5, 157–175.
Pearson, K. (1901). On the correlation of characters not quantitatively measured. Philosophical Transactions of
the Royal Society, Series A , 195, 1a€"47.
Pelz, D. C. & Andrews, F. M . (1964). Detecting causal priorities in panel study data. American Sociological
Review, 29, 836a€"848.
Ramsey, F. L. & Schafer, D. W. (2002). The statistical sleuth: Course in methods of data analysis. Pacific
Grove, CA: Duxbury.
Rauma, D. & Berk, R. A . (1987). Remuneration and recidivism: The long-term impact o f unemployment
compensation on ex-offenders. Journal of Quantitative Criminology, 3, 3a€"27.
Robinson, W. S. (1950). Ecological correlations and the behavior o f individuals. American Sociological Review,
15, 351a€"357.
Rogosa, D. (1980). A critique of the cross-lagged correlation. Psychological Bulletin, 88, 245a€"258.
Rogosa, D. (1987). Causal models do not support scientific conclusions: A comment in support o f Freedman.
Journal of Educational Statistics, 12, 185a€"195.
Rozelle, R. M . & Campbell, D. T. (1969). More plausible rival hypotheses in the cross-lagged panel correlation
technique. Psychological Bulletin, 71, 74a€"80.
Ryan, T. P. (1997). Modern regression methods. New York: John Wiley & Sons.
Schmidt, J. W. & Taylor, R. E. (1970). Simulation and analysis of industrial systems. Homewood, IL: Richard
D. Irwin, Inc.
Schulze, R. (2004). Meta-analysis - A comparison of approaches. Toronto: Hogrefe & Huber.
Serlin, R. A . & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle.
American Psychologist, 40, 73a€"83.
Serlin, R. A . & Lapsley, D. K. (1993). Rational appraisal o f psychological research and the good-enough
principle. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences:
Methodological issues (pp. 199a€"228). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for
generalized causal inference. Boston: Houghton Mifflin Company.
Shiffler, R. E. & Adams, A . J. (1995). Introductory business statistics with computer applications. Belmont,
CA: Duxbury.
Sinich, T. (1996). Business statistics by example. Upper Saddle River, NJ: Prentice Hall.
Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage Publications.
Spata, A. V. (2003). Research methods: Science and diversity. New York: John Wiley & Sons.
StatSoft electronic statistics textbook (2010). Statsoft.com website.
Steiger, J. H . (1980). Tests for comparing elements o f a correlation matrix. Psychological Bulletin, 87,
245a€"251.
Stine, R. & Foster, D. (2011). Statistics for business decision making and analysis. Boston: Addison-Wesley.
Tabachnick, B. G. & Fidell, L. S. (1989). Using multivariate statistics. (2nd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L . S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Thuraisingham, B. (2001). Managing and mining multimedia databases. Boca Raton, FL: Chapman &
Hall/CRC Press.
Trochim, W. M . (2005). Research methods: The concise knowledge base. Cincinnati: Atomic Dog Publishing
Company.
van Belle, G. (2002). Statistical rules of thumb. New York: John Wiley & Sons.
Webster, A. L . (1995). Applied statistics for business and economics (2nd ed.). Chicago: Irwin.
White, H. L. (1980). A heteroscedasticity-consistent covariance matrix estimator and a direct test for
heteroscedasticity. Econometrica, 48, 817–838.
Wilcox, R. R. (2003). Applying contemporary statistical techniques. San Diego, CA: Academic Press.
Wilson, J. H . & Keating, B. (1990). Business forecasting. Homewood, IL: Irwin.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6,
324a€"342.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Brown, G. M. & Mood, A. M. (1951). On median tests for linear hypotheses. In Neyman, J. (Ed.), Proceedings of
the Second Berkeley Symposium on Mathematical Statistics and Probability (pp. 159–166). Berkeley &
Los Angeles: The University of California Press.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Franklin, L. A. (1996). Exact tables for Spearman's rank correlation coefficient for n = 19 and n = 20.
Unpublished paper presented at the joint meetings, Aug 4–8, American Statistical Association, Chicago.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury.
Iman, R. L. & Conover, W. J. (1985). A measure of top-down correlation. Technical Report SAND85-0601,
Sandia National Laboratories, Albuquerque, New Mexico, 44 pp.
Iman, R. L. & Conover, W. J. (1987). A measure of top-down correlation. Technometrics, 29, 351–357.
Correction: Technometrics, 1989, 31, 133.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Mood, A. M . (1950). Introduction to the theory of statistics. New York: McGraw-Hill Book Company.
Olds, E. G. (1938). Distribution of sum of squares of rank differences for small numbers of individuals. Annals of
Mathematical Statistics, 9, 133–148.
Olds, E. G. (1949). The 5% significance levels of sums of squares of rank differences and a correction. Annals of
Mathematical Statistics, 20, 117–119.
Quade, D. & Salama, I. (1992). A survey of weighted rank correlation. In Sen, P. K. & Salama, I. (Eds.), Order
statistics and nonparametric theory and applications (pp. 213–224). New York: Elsevier.
Ramsey, P. H. (1989). Critical values for Spearman's rank order correlation. Journal of Educational
Statistics, 14, 245–253.
Salama, I. & Quade, D. (1981). A nonparametric comparison of two multiple regressions by means of a weighted
measure of correlation. Communic. Statist. – Theor. Meth., A11, 1185–1195.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Spearman, C. (1904). The proof and measurement o f association between two things. American Journal of
Psychology, 15, 72a€"101.
Sprent, P. (1989). Applied nonparametric statistical methods. London: Chapman & Hall.
Sprent, P. (1993). Applied nonparametric statistical methods (2nd ed.). London: Chapman & Hall.
Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis III. Nederl. Akad.
Wetensch. Proc., Series A, 53, 1397–1412.
Zar, J. H . (1972). Significance testing o f Spearman rank correlation coefficient. Journal of the American
Statistical Association, 67, 578a€"580.
Zar, J. H . (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Hollander, M . & Wolfe, D. A . (1999). Nonparametric statistical methods. New York: John Wiley & Sons.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury Press.
Kendall, M . G. (1938). A new measure of rank correlation. Biometrika, 30, 81a€"93.
Kendall, M . G. (1952). The advanced theory of statistics (Vol. 1). London: Charles Griffin & Co. Ltd.
Kendall, M . G. (1970). Rank correlation methods (4th ed.). London: Charles Griffin & Co. Ltd.
Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to bivariate and multivariate analysis.
Glenview, IL: Scott, Foresman & Company.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Noether, G. E. (1967). Elements of nonparametric statistics. New York: John Wiley & Sons.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Sprent, P. (1989). Applied nonparametric statistics. London: Chapman & Hall.
Sprent, P. (1993). Applied nonparametric statistics (2nd ed.). London: Chapman & Hall.
Cohen, B. H. (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1980). Practical nonparametric statistics (2nd ed.). New York: John Wiley & Sons.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Friedman, M . (1940). A comparison of alternative tests of significance for the problem of m rankings. Annals of
Mathematical Statistics, 11, 86a€"92.
Kendall, M . G. (1970). Rank correlation methods (4th ed.). London: Charles Griffin & Co. Ltd.
Kendall, M . G. & Babington-Smith, B. (1939). The problem of m rankings. Annals of Mathematical Statistics,
10, 275a€"287.
Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to bivariate and multivariate analysis.
Glenview, IL: Scott, Foresman & Company.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Sprent, P. (1989). Applied nonparametric statistics. London: Chapman & Hall.
Sprent, P. (1993). Applied nonparametric statistics (2nd ed.). London: Chapman & Hall.
Wallis, W. A. (1939). The correlation ratio for ranked data. Journal of the American Statistical Association, 34,
533a€"538.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Daniel, W. W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent Publishing Company.
Davis, J. A. (1967). A partial coefficient for Goodman and Kruskal's gamma. Journal of the American
Statistical Association, 62, 189–193.
Goodman, L. A. & Kruskal, W. H. (1954). Measures of association for cross-classification. Journal of the
American Statistical Association, 49, 732–764.
Goodman, L. A. & Kruskal, W. H. (1959). Measures of association for cross-classification II: Further discussion
and references. Journal of the American Statistical Association, 54, 123–163.
Goodman, L. A. & Kruskal, W. H. (1963). Measures of association for cross-classification III: Approximate
sample theory. Journal of the American Statistical Association, 58, 310–364.
Goodman, L. A. & Kruskal, W. H. (1972). Measures of association for cross-classification IV: Simplification for
asymptotic variances. Journal of the American Statistical Association, 67, 415–421.
Grissom, R. J. & K i m , J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ:
Lawrence Erlbaum Associates.
Marascuilo, L . A . & McSweeney, M . (1977). Nonparametric and distribution-free methods for the social
sciences. Monterey, CA: Brooks/Cole Publishing Company.
Siegel, S. & Castellan, N . J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New
York: McGraw-Hill Book Company.
Somers, R. H. (1962). A new asymmetric measure of association for ordinal variables. American Sociological
Review, 27, 799–811.
Field, A . (2005). Discovering statistics using SPSS (2nd ed.). Sage Publications: London.
Grimm, L . G. & Yarnold, P. R. (1995). Introduction to multivariate statistics. I n Grimm, L. G. & Yarnold, P. R.
(Eds.), Reading and understanding multivariate statistics (pp. 1a€"18). Washington, DC: American
Psychological Association.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Marascuilo, L . A . & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researchers guide.
Monterey, CA: Brooks/Cole Publishing Company.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L . S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Thompson, B. T. (2000). Canonical correlation analysis. In Grimm, L . G. & Yarnold, P. R. (Eds.), Reading and
understanding more multivariate statistics (pp. 285a€"316). Washington, DC: American Psychological
Association.
Aiken, L. S. & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA:
Sage Publications.
Allen, M . P. (1997). Understanding regression analysis. New York: Plenum Press.
Barnett, V. & Lewis, T. (1994). Outliers in statistical data (3rd ed.). Chichester: John Wiley & Sons.
Belsey, D. A., Kuh, E. & Welsch, R. (1980). Regression diagnostics: Identifying influential data and sources of
collinearity. New York: John Wiley & Sons.
Berenson, M . L . & Levine, D.M. (1996). Basic business statistics: Concepts and applications. Englewood Cliffs,
NJ: Prentice Hall.
Bernstein, I . H., Garbin, C. P. & Teng, G. K. (1988). Applied multivariate analysis. New York: Springer-Verlag.
Bowerman, B. L. & O'Connell, R. T. (1990). Linear statistical models: An applied approach (2nd ed.).
Belmont, CA: Duxbury.
Browne, M. W. (1975). Predictive validity of a linear regression equation. British Journal of Mathematical and
Statistical Psychology, 28, 79–87.
Cattin, P. (1980). Note on the estimation of the squared cross-validated multiple correlation of a regression model.
Psychological Bulletin, 87, 63–65.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426a€"443.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155a€"159.
Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd
ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the
behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Cook, R. D. & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.
Darlington, R. B. (1990). Regression and linear models. New York: McGraw-Hill.
Diekhoff, G. (1992). Statistics for the social and behavioral sciences: Univariate, bivariate, multivariate.
Dubuque, IA: Wm. C. Brown Publishers.
Field, A . (2005). Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
Garson, D. G. (2006). Statistics: Topics in multivariate analysis. Website:
http://www2.chas.ncsu.edu/garson/pa765/statnote.htm.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F. & Black, W. C. (2000). Cluster analysis. I n Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding more multivariate statistics (pp. 147a€"205). Washington, DC: American Psychological
Association.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Hays, W. L. (1994). Statistics (5th ed.). Fort Worth: Harcourt Brace College Publishers.
Hoaglin, C. C. & Welsch, R. (1978). The hat matrix in regression and ANOVA. The American Statistician, 32,
17–22.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury Press.
Kachigan, S. K. (1986). Statistical analysis. New York: Radius Press.
Kachigan, S. K. (1991). Multivariate statistical analysis (2nd ed.). New York: Radius Press.
Kline, R. (2005). Principles and practices of structural equation modeling (2nd ed.). New York: Guilford.
Licht, M . H. (1995). Multiple regression and correlation. I n Grimm, L . G. & Yarnold, P. R. (Eds.), Reading and
understanding multivariate statistics (pp. 100a€"136). Washington, D.C.: American Psychological
Association.
Lindeman, R. H., Merenda, P. F., & Gold, R. Z. (1980). Introduction to bivariate and multivariate analysis.
Glenview, IL: Scott, Foresman & Company.
Lord, R. & Novick, M . (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Marascuilo, L . A . & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researchers guide.
Monterey, CA: Brooks/Cole Publishing Company.
Marascuilo, L . A . & Serlin, R. C. (1988). Statistical methods for the social and behavioral sciences. New York:
W. H. Freeman & Company.
Menard, S. W. (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications.
Mertler, C. A . & Vannatta, R. A . (2005). Advanced and multivariate statistical methods (3rd ed.). Los Angeles:
Pyrczak Publications.
Miles, J. & Shevlin, M . (2001). Applying regression and correlation: A guide for students and researchers.
London: Sage Publications.
Montgomery, D. C. & Peck, E. A . (1992). Introduction to linear regression analysis (2nd ed.). New York: John
Wiley & Sons.
Myers, J. L . & Well, A. D. (2003). Research design and statistical analysis (2nd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Neter, J., Kutner, M . H., Nachtscheim, C. J. & Wasserman, W. (1996). Applied linear statistical models (4th ed.).
Boston: WCB McGraw-Hill.
Neter, J., Wasserman, W., & Kutner, M . H. (1983). Applied linear regression models (2nd ed.). Homewood, I L :
Richard D. Irwin, Inc.
Neter, J., Wasserman, W., & Kutner, M . H. (1990). Applied linear statistical models (3rd ed.). Homewood, I L :
Richard D. Irwin, Inc.
Norušis, N. J. (2004). SPSS 13.0 advanced statistical procedures companion. Upper Saddle River, NJ: Prentice
Hall.
Pedhazur, E. J. (1982). Multiple regression in behavioral research: Explanation and prediction (2nd ed.). New
York: Holt, Rinehart & Winston.
Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L. S. (1989). Using multivariate statistics. (2nd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L . S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Tatsuoka, M . M . (1975). The general linear model: A new trend in analysis of variance. Champaign, IL: Institute
for Personality & Ability Testing.
Davis, D. S. (2002). Statistical methods for the analysis of repeated measurements. New York: Springer.
Geisser, S. & Greenhouse, S. W. (1958). An extension of Box's results to the use of the F distribution in
multivariate analysis. Annals of Mathematical Statistics, 29, 885–891.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Hotelling, H. (1931). The generalization of Student's ratio. Annals of Mathematical Statistics, 2, 361–378.
Huynh, H. & Feldt, L. S. (1976). Estimates of the correction for degrees of freedom for sample data in randomized
block and split-plot designs. Journal of Educational Statistics, 1, 69–82.
Kachigan, S. K. (1986). Statistical analysis. New York: Radius Press.
Marascuilo, L . A . & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researchers guide.
Monterey, CA: Brooks/Cole Publishing Company.
Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. Annals of Mathematical
Statistics, 11, 204a€"209.
Maxwell, S. E. & Delaney, H . D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Myers, J. L . & Well, A. D. (2003). Research design and statistical analysis (2nd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L . S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Zar, J. H. (1999). Biostatistical analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Baringhaus, L., Danschke, R., & Henze, N. (1989). Recent and classical tests for normality – a comparative
study. Communication in Statistics – Simulation, 18, 363–379.
Bartlett, M . S. (1939). A note on tests of significance in multivariate analysis. Proceedings of the Cambridge
Philosophical Society, 35, 180a€"185.
Bruning, J. L . & Kintz, B. L . (1997). Computational handbook of statistics (4th ed.). New York: Addison-
Wesley Longman.
Cole, D. A., Maxwell, S. E., Arvey, R. & Salas, E. (1994). How the power of the M A N O V A can both increase and
decrease as a function o f the intercorrelations among the dependent variables. Psychological Bulletin, 115,
465a€"474.
Field, A . (2005). Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
George, D. & Mallery, P. (2001). SPSS for Windows: Step by Step (3rd ed.). Boston: Allyn & Bacon.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Hotelling, H. (1931). The generalization of Student's ratio. Annals of Mathematical Statistics, 2, 361–378.
Koziol, J. A. (1986). Assessing multivariate normality: A compendium. Communication in Statistics –
Theory and Methods, 15, 2763–2783.
Lawley, D. N. (1938). A generalization of Fisher's z test. Biometrika, 30, 180–187.
Marascuilo, L . A . & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researchers guide.
Monterey, CA: Brooks/Cole Publishing Company.
Maxwell, S. E. & Delaney, H . D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Mertler, C. A . & Vannatta, R. A. (2005). Advanced and multivariate statistical methods (3rd ed.). Los Angeles:
Pyrczak Publications.
Pillai, K. C. S. (1955). Some new test criteria in multivariate analysis. Annals of Mathematical Statistics, 26,
117a€"121.
Roy, S. N. (1945). The individual sampling distribution of the maximum, minimum, and any intermediates of the
p-statistics on the null-hypothesis. Sankhyā, 7, 133–158.
Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Annals of
Mathematical Statistics, 24, 220–238.
Silva, A. P. D. & Stam, A. (1995). Discriminant analysis. In Grimm, L . G. & Yarnold, P. R. (Eds.), Reading and
understanding multivariate statistics (pp. 277a€"318). Washington, DC: American Psychological
Association.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Weinfurt, K. P. (1995). Multivariate analysis o f variance. In Grimm, L . G. & Yarnold, P. R. (Eds.), Reading and
understanding multivariate statistics (pp. 245a€"276). Washington, DC: American Psychological
Association.
Wilks, S. S. (1932). Certain generalizations in the analysis o f variance. Biometrika, 24, 471a€"494.
Grimm, L . G. & Yarnold, P. R. (Eds.) (1995). Reading and understanding multivariate statistics. Washington,
DC: American Psychological Association.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Huitema, B. (1980). The analysis of covariance and alternatives. New York: John Wiley & Sons.
Mertler, C. A . & Vannatta, R. A . (2005). Advanced and multivariate statistical methods (3rd ed.). Los Angeles:
Pyrczak Publications.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Weinfurt, K. P. (1995). Multivariate analysis o f variance. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding more multivariate statistics (pp. 245a€"276). Washington, DC: American Psychological
Association.
Diekhoff, G. (1992). Statistics for the social and behavioral sciences: Univariate, bivariate, and multivariate.
Dubuque, IA: Wm. C. Brown Publishers.
Field, A . (2005). Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
Grimm, L . G. & Yarnold, P. R. (Eds.) (1995). Reading and understanding multivariate statistics. Washington,
DC: American Psychological Association.
Hair, J. F., Anderson, R. E., Tatham, R. L,. & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Huberty, C. J. (1994). Applied discriminant analysis. New York: John Wiley & Sons.
Marascuilo, L . A . & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researchers guide.
Monterey, CA: Brooks/Cole Publishing Company.
Silva, A. P. D. & Stam, A. (1995). Discriminant analysis. In Grimm, L . G. & Yarnold, P. R. (Eds.), Reading and
understanding multivariate statistics (pp. 277a€"318). Washington, DC: American Psychological
Association.
Spicer, J. N . (2005). Making sense of multivariate data analysis. Thousand Oaks, CA: Sage Publications.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Tatsuoka, M . M . (1973). Multivariate analysis in behavioral research. I n Kerlinger, F. (Ed.). Review of research
in education. Itasca, IL: Peacock.
Campbell, K. T. & Taylor, D. L. (1996). Canonical correlational analysis as a general linear model: A heuristic
lesson for teachers and students. Journal of Experimental Education, 64, 157a€"171.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426a€"443.
Cramer, E. & Nicewander, W. A . (1979). Some symmetric, invariant measures o f multivariate association.
Psychometrika, 44, 43a€"54.
Garson, D. G. (2005). Statistics: Topics in multivariate analysis. Website:
http://www2.chas.ncsu.edu/garson/pa765/statnote.htm.
Grimm, L. G. & Yarnold, P. R. (Eds.) (2000). Reading and understanding more multivariate statistics.
Washington, DC: American Psychological Association.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Horst, P. (1961). Generalized canonical correlations and their applications to experimental data. Journal of
Clinical Psychology, 26, 331a€"347.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury Press.
Kachigan, S. K. (1986). Statistical analysis. New York: Radius Press.
Knapp, T. R. (1978). Canonical correlational analysis: A general parametric significance testing system.
Psychological Bulletin, 85, 410a€"416.
Marascuilo, L . A . & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researchers guide.
Monterey, CA: Brooks/Cole Publishing Company.
Miller, J. K. (1975). The sampling distribution and a test o f significance o f the bimultivariate redundancy statistic:
A Monte Carlo study. Multivariate Behavior Research, 10, 233a€"244.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Stewart, D. K. & Love, W. A . (1968). A general canonical correlation index. Psychological Bulletin, 70,
160a€"163.
Tabachnick, B. G. & Fidell, L . S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Tacq, J. (2004). Canonical correlational analysis. In Lewis-Beck, M . S., Bryman, A . & Liao, T. F. (Eds.), The
Sage encyclopedia of social science research methods (Vol 1) (pp. 83a€"86). Thousand Oaks, CA: Sage
Publications.
Thompson, B. T. (1984). Canonical correlational analysis: Uses and interpretation. Newbury Park, CA: Sage
Publications.
Thompson, B. T. (2000). Canonical correlational analysis. I n Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding more multivariate statistics (pp. 285a€"316). Washington, DC: American Psychological
Association.
Trochim, W. M . (2005). Research methods: The concise knowledge base. Cincinnati: Atomic Dog Publishing
Company.
Webster's new collegiate dictionary (1981). Springfield, MA: G. & C. Merriam Co.
Wuensch, K. (2005). Statistics lessons. Website: http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Mv.htm.
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons.
Aldrich, J. H. & Nelson, F. D. (1984). Linear probability, logit, and probit models. Beverly Hills, CA: Sage
Publications.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155a€"159.
Field, A. (2005). Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
Garson, D. G. (2006). Statistics: Topics in multivariate analysis. Website:
http://www2.chas.ncsu.edu/garson/pa765/statnote.htm.
George, D. and Mallery, P. (2005). SPSS for Windows step by step: A simple guide and reference, 12.0 update (5th
ed.). Boston: Pearson Education.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Hauck, W. W. & Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. Journal of the
American Statistical Association, 72, 851–853.
Hosmer, D. W. & Lemeshow, S. (1989). Applied logistic regression. New York: John Wiley & Sons.
Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: John Wiley & Sons.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA.: Duxbury.
Lipsey, M . W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.
Menard, S. W. (2002). Applied logistic regression analysis (2nd ed.). Thousand Oaks, CA: Sage Publications.
Mertler, C. A . & Vannatta, R. A . (2005). Advanced and multivariate statistical methods (3rd ed.). Los Angeles:
Pyrczak Publications.
Nagelkerke, N . J. (1991). A note on a general definition o f the coefficient o f determination. Biometrika, 78,
691a€"692.
Norušis, N. J. (2004). SPSS 13.0 advanced statistical procedures companion. Upper Saddle River, NJ: Prentice
Hall.
Pagano, M . & Gauvreau, K. (1993). Principles of biostatistics. Belmont, CA: Duxbury Press.
Peduzzi, P. N . , Concato, J., Kemper, E., Holford, T. R., & Feinstein, A. (1996). A simulation study of the number
of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 99, 1373a€"1379.
Rosner, B. (1995). Fundamentals of biostatistics (4th ed.). Belmont, CA: Duxbury Press.
Rosner, B. (2000). Fundamentals of biostatistics (5th ed.). Pacific Grove, CA: Duxbury Press.
Selvin, S. (1995). Practical biostatistical methods. Belmont, CA: Duxbury.
Spicer, J. (2005). Making sense of multivariate data analysis. Thousand Oaks, CA: Sage Publications.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
van Belle, G. (2002). Statistical rules of thumb. New York: John Wiley & Sons.
Wright, R. E. (1995). Logistic regression. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding
multivariate statistics (pp. 245a€"276). Washington, DC: American Psychological Association.
Wuensch, K. (2005). Statistics lessons. Website: http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Mv.htm.
Bartlett, M . S. (1954). A note on the multiplying factors for various chi-square approximations. Journal of the
Royal Statistical Society, 16 (Series B), 296a€"298.
Blashfield, R. K. & Aldenderfer, M . S. (1984). Cluster analysis (4th ed.). Beverly Hills, CA: Sage Publications.
Borg, I . & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications (2nd ed.). New
York: Springer.
Bryant, F. B. & Yarnold, P. R. (1995). Principal-components analysis and exploratory and confirmatory factor
analysis. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding multivariate statistics (pp.
100a€"136). Washington, DC: American Psychological Association.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245a€"266.
Cliff, N . & Hamburger, C. D. (1967). The study o f sampling errors in factor analysis by means o f artificial
experiments. Psychological Bulletin, 68, 430a€"445.
Cox, T. F. & Cox, M . A . (2001). Multidimensional scaling (2nd ed.) Boca Raton, FL: Chapman & Hall.
Diekhoff, G. (1992). Statistics for the social and behavioral sciences: Univariate, bivariate, and multivariate.
Dubuque, IA: Wm. C. Brown Publishers.
Field, A . (2005). Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
Garson, D. G. (2006). Statistics: Topics in multivariate analysis. Website:
http://www2.chas.ncsu.edu/garson/pa765/statnote.htm.
George, D. & Mallery, P. (2001). SPSS for Windows: Step by Step (3rd ed.). Boston: Allyn & Bacon.
Grimm, L. G. & Yarnold, P. R. (Eds.) (1995). Reading and understanding multivariate statistics. Washington,
DC: American Psychological Association.
Grimm, L. G. & Yarnold, P. R. (Eds.) (2000). Reading and understanding more multivariate statistics.
Washington, DC: American Psychological Association.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L. & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F. & Black, W. C. (2000). Cluster analysis. I n Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding more multivariate statistics (pp. 147a€"205). Washington, DC: American Psychological
Association.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Jolliffe, I . T. (1972). Discarding variables in a principal components analysis, I : artificial data. Applied Statistics,
21, 160a€"163.
Jolliffe, I . T. (1986). Principal components analysis. New York: Springer-Verlag.
Kachigan, S. K. (1986). Statistical analysis. New York: Radius Press.
Kachigan, S. K. (1991). Multivariate statistical analysis (2nd ed.). New York: Radius Press.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological
Measurement, 20, 140a€"151.
Kaiser, H . F. (1970). A second generation Little Jiffy. Psychometrika, 35, 401a€"415.
Kaiser, H . F. (1974). A n index of factorial simplicity. Psychometrika, 39, 31a€"36.
Klem, L . (1995). Path analysis. I n Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding multivariate
statistics (pp. 65a€"97). Washington, DC: American Psycho-logical Association.
Kruskal, J. B. & Wish, M . (1978). Multidimensional scaling. Beverly Hills, CA: Sage Publications.
Marascuilo, L . A . & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researchers guide.
Monterey, CA: Brooks/Cole Publishing Company.
Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. Annals of Mathematical
Statistics, 11, 204a€"209.
Mertler, C. A . & Vannatta, R. A . (2005). Advanced and multivariate statistical methods (3rd ed.). Los Angeles:
Pyrczak Publications.
Norušis, N. J. (2004). SPSS 13.0 advanced statistical procedures companion. Upper Saddle River, NJ: Prentice
Hall.
Pedhazur, E. & Schmelkin, L. (1991). Measurement design and analysis. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Spicer, J. N . (2005). Making sense of multivariate data analysis. Thousand Oaks, CA: Sage Publications.
Stalans, L. J. (1995). Multidimensional scaling. In Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding multivariate statistics (pp. 137–168). Washington, DC: American Psychological Association.
Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Tabachnick, B. G. & Fidell, L . S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins
Publishers.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Thompson, B. (2000). Q-technique factor analysis: One variation on the two-mode factor analysis of variables. I n
Grimm, L. G. & Yarnold, P. R. (Eds.). Reading and understanding more multivariate statistics (pp.
207a€"226). Washington, DC: American Psychological Association.
Bentler, P. M . (1990). Comparative fit indexes in structural equation models. Psychological Bulletin, 107,
238a€"246.
Bentler, P. M. (2004). EQS structural equations program manual. Encino, CA: Multivariate Software Inc.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley and Sons.
Byrne, B. M . (2009). Structural equation modeling with AMOS: Basic concepts, applications and programming
(2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Fabrigar, L. R. and Wegener, D. T. (2009). Structural equation modeling. In Stevens, J. P., Applied multivariate
statistics for the social sciences (5th ed.) (pp. 537–582). New York: Routledge.
Freedman, D. A . (2005). Statistical models: Theory and practice. New York: Cambridge University Press.
Grimm, L. G. & Yarnold, P. R. (Eds.) (2000). Reading and understanding more multivariate statistics.
Washington, DC: American Psychological Association.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Jöreskog, K. G. and Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command
language. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kenny, D. A. (1979). Correlation and causality. New York: Wiley-Interscience.
Klem, L . (1995). Path analysis. I n Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding multivariate
statistics (pp. 65a€"97). Washington, DC: American Psycho-logical Association.
Kline, R. (2005). Principles and practices of structural equation modeling (2nd ed.). New York: Guilford.
Knoke, D. (1985). A path analysis primer. In S. B. Smith (Ed.). A handbook of social science methods (Vol. 3, pp.
390a€"407). New York: Praeger.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural analysis (4th ed.).
Mahwah, NJ: Lawrence Erlbaum Associates.
Pedhazur, E. J. (1982). Multiple regression in behavioral research (2nd ed.). New York: Holt, Rinehart &
Winston.
Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Fort Worth: Harcourt Brace. College
Publishers.
Raykov, T. & Marcoulides, G. A . (2000). A first course in structural equation modeling. Mahwah, NJ: Lawrence
Erlbaum Associates.
Raykov, T. & Marcoulides, G. A . (2006). A first course in structural equation modeling (2nd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
Rigdon. E. E. (2005) Structural equation modeling: Software. I n Everitt, B. S. and Howell, D. C. (Eds.) (pp.
1947a€"1951), Encyclopedia of statistics in behavioral science. New York: John Wiley and Sons.
Rodgers, J. L . (2010). The epistemology o f mathematical and statistical modeling: A quiet methodological
revolution. American Psychologist, 65, 1a€"12.
Romney, D. M . , Jenkins, C. D., & Byner, J. M . (1992). A structural analysis of health-related quality of life
dimensions. Human Relations, 45, 165a€"176.
Schumacker, R. E. & Lomax, R. G. (1996). A beginner's guide to structural equation modeling. Mahwah, NJ:
Lawrence Erlbaum Associates.
Specht, D. (1975). On the evaluation of causal model. Social Science Research, 4, 113a€"133.
Streiner, D. L. (2005). Finding our way: A n introduction to path analysis. Canadian Journal of Psychiatry, 50,
115a€"122.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Ullman, J. B. (2001). Structural equation modeling. In Tabachnick, B. G. & Fidell, L. S., Using multivariate
statistics (4th ed.) (pp. 653a€"671). Boston: Allyn & Bacon.
Warner, R. M. (2008). Applied statistics: From bivariate through multivariate techniques. Los Angeles: Sage
Publications.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557a€"585.
Wuensch, K. (2011). Statistics: Topics in multivariate analysis. Website:
http://core.edu/psyc/wuenschk/Statlessons.htm
Adorno, T. W., Frenkel-Brunswick, E., Levinson, D. J., & Sanford, R. N . (1950). The authoritarian personality.
New York: Harper and Row.
Bentler, P. M . (1990). Comparative fit indexes in structural equation models. Psychological Bulletin, 107,
238a€"246.
Bentler, P. M. (2004). EQS structural equations program manual. Encino, CA: Multivariate Software Inc.
Bentler, P. M. and Bonnet, D. (1980). Significance tests and goodness of fit in the analysis of covariance
structures. Psychological Bulletin, 88, 588–606.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley and Sons.
Bollen, K. A. and Long, J. S. (Eds.) (1993). Testing structural equation models. Newbury Park, CA: Sage.
Byrne, B. M . (2009). Structural equation modeling with AMOS: Basic concepts, applications and programming
(2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Chou, C. P. and Bentler, P. M . (1993). Invariant standardized estimated parameter change for model modification
in covariance structure analysis. Multivariate Behavioral Research, 28, 97a€"110.
Fabrigar, L. R. and Wegener, D. T. (2009). Structural equation modeling. In Stevens, J. P.. Applied multivariate
statistics for the social sciences (5th ed) (pp. 537a€"582). New York: Routledge.
Freedman, D. A . (2005). Statistical models: Theory and practice. New York: Cambridge University Press.
Garson, D. G. (2011). Statistics: Topics in multivariate analysis. Website:
http://www2.chas.ncsu.edu/garson/pa765/statnote.htm.
Grimm, L. G. & Yarnold, P. R. (Eds.) (2000). Reading and understanding more multivariate statistics.
Washington, DC: American Psychological Association.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (1995). Multivariate data analysis (4th ed.). Upper
Saddle River, NJ: Prentice Hall.
Hair, J. F., Anderson, R. E., Tatham, R. L . , & Black, W. C. (2004). Multivariate data analysis (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Harlow, L. L. (2005). The essence of multivariate thinking. Mahwah, NJ: Lawrence Erlbaum Associates.
Ho, R. (2006). Handbook of univariate and multivariate data analysis and interpretation with SPSS. Boca Raton,
FL: Chapman and Hall/CRC.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis.
Psychometrika, 34, 183–220.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.). Testing
structural equation models (pp. 294–316). Newbury Park, CA: Sage.
Jöreskog, K. G. and Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command
language. Hillsdale, NJ: Lawrence Erlbaum Associates.
Klem, L . (1995). Path analysis. I n Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and understanding multivariate
statistics (pp. 65a€"97). Washington, DC: American Psychological Association.
Klem, L . (2000). Structural equation modeling. I n Grimm, L. G. & Yarnold, P. R. (Eds.), Reading and
understanding more multivariate statistics (pp. 227a€"260). Washington, DC: American Psychological
Association.
Kline, R. (2005). Principles and practices of structural equation modeling (2nd ed.). New York: Guilford.
Knoke, D., Bohrnstedt, G. W., & Potter-Mee, A. (2002). Statistics for social data analysis (4th ed.). Belmont, CA:
Thomson/Wadsworth.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural analysis (4th ed.).
Mahwah, NJ: Lawrence Erlbaum Associates.
Pedhazur, E. J. (1982). Multiple regression in behavioral research (2nd ed.). New York: Holt, Rinehart &
Winston.
Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Fort Worth: Harcourt Brace College
Publishers.
Raykov, T. & Marcoulides, G. A . (2000). A first course in structural equation modeling. Mahwah, NJ: Lawrence
Erlbaum Associates.
Raykov, T. & Marcoulides, G. A . (2006). A first course in structural equation modeling (2nd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
Rigdon. E. E. (2005). Structural equation modeling: Software. I n Everitt, B. S. and Howell, D. C. (Eds.),
Encyclopedia of statistics in behavioral science (pp. 1947a€"1951). New York: John Wiley and Sons.
Rodgers, J. L . (2010). The epistemology o f mathematical and statistical modeling: A quiet methodological
revolution. American Psychologist, 65, 1a€"12.
Schreiber, J. B., Stage, F. K., King, J., Nora, A., & Barlow, E. A. (2006). Reporting structural equation modeling
and confirmatory factor analysis results: A review. The Journal of Educational Research, 99, 323a€"337.
Schumacker, R. E. (2005). Structural equation modeling: Overview. In Everitt, B. S. and Howell, D. C. (Eds.),
Encyclopedia of statistics in behavioral science (pp. 1941a€"1947). New York: John Wiley and Sons.
Schumacker, R. E. & Lomax, R. G. (1996). A beginner's guide to structural equation modeling. Mahwah, NJ:
Lawrence Erlbaum Associates.
Schumacker, R. E. & Lomax, R. G. (2004). A beginner's guide to structural equation modeling (2nd ed.).
Mahwah, NJ: Lawrence Erlbaum Associates.
Streiner, D. L. (2005). Building a better model: A n introduction to structural equation modeling. Canadian
Journal of Psychiatry, 51, 317a€"324.
Tabachnick, B. G. & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston: Allyn & Bacon.
Tenko, R. and Marcoulides, G. A . (2006). A first course in structural equation modeling (2nd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
Thompson, B. (2000). Ten commandments o f structural equation modeling. In Grimm, L . G. & Yarnold, P. R.
(Eds.), Reading and understanding more multivariate statistics (pp. 261a€"283). Washington, DC: American
Psychological Association.
Ullman, J. B. (1996). Structural equation modeling. In Tabachnick, B. G. & Fidell, L. S., Using multivariate
statistics (3ed.) (pp. 709a€"811). New York: Harper Collins.
Ullman, J. B. (2001). Structural equation modeling. In Tabachnick, B. G. & Fidell, L. S.. Using multivariate
statistics (4th ed.) (pp. 653a€"671). Boston: Allyn & Bacon.
Warner, R. M . (2008). Applied statistics: From bivariate through multivariate techniques. Los Angeles: Sage
Publications.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557a€"585.
Wuensch, K. (2011) Statistics: Topics in multivariate analysis. Website:
Http://core.edu/psyc/wuenschk/Statlessons.htm.
Aaron, B., Kromrey, J. D., & Ferron, J. (1998, November). Equating r-based and d-based effect size indices:
Problems with a commonly recommended formula. Paper presented at the annual meeting of the Florida
Educational Research Association, Orlando, FL (ERIC Document Reproduction Service No. ED 433353).
Antman, E. M., Lau, J., Kupelnick, B., Mosteller, F., & Chalmers, T. C. (1992). A comparison of the results of
meta-analysis of randomized control trials and recommendations of clinical experts. Journal of the American
Medical Association, 268, 240–248.
Birnbaum, A. (1954). Combining independent tests of significance. Journal of the American Statistical
Association, 49, 554–574.
Borenstein, M . , Hedges, L . V., Higgins, J. T., & Rothstein, H. R. (2009). Introduction to Meta-analysis. New
York: John Wiley & Sons.
Chalmers, T. C., Berrier, J., Sack, H. S., Levin, H., Reitman, D., & Nagalingam, R. (1987). Meta-analysis of
clinical trials as a scientific discipline. Statistics in Medicine, 6, 733a€"744.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997a€"1003.
Cohen, B. H. (2001). Explaining psychological statistics (2nd ed.). New York: John Wiley & Sons.
Cochran, W. G. (1954). The combination of estimates from different experiments. Biometrics, 10, 101a€"129.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: John Wiley & Sons.
Everitt, B. S. (1977). The analysis of contingency tables. New York: Chapman & Hall.
Everitt, B. S. (1992). The analysis of contingency tables (2nd ed.). New York: Chapman & Hall.
Falk, R. W. & Greenbaum, C. W. (1995). Significance tests die hard. Theory and Psychology, 5, 75a€"98.
Field, A . (2001). Meta-analysis o f correlation coefficients: A Monte Carlo comparison o f fixed- and random-
effects methods. Psychological Methods, 6, 161a€"180.
Gigerenzer, G. (1993). The superego, the ego and the id in statistical reasoning. In Keren, G. and Lewis, C. (Eds.).
A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311a€"339).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Glass, G. (1976). Primary, secondary and meta-analysis of research. Educational Researcher, 5, 3a€"8.
Glass, G. (1977). Integrating findings: The meta-analysis o f research. Review of Research in Education, 5,
351a€"379.
Glass, G., McGaw, B., & Smith, M . L . (1981). Meta-analysis in Social Research. Beverly Hills, CA: Sage
Publications.
Grissom, R. J. & K i m , J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ:
Lawrence Erlbaum Associates.
Hagen, R. L . (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15a€"24.
Harlow, L . L., Mulaik, S. A., & Steiger, J. H . (Eds.) (1997). What if there were no significance tests? Mahwah,
NJ: Lawrence Erlbaum Associates.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal
of Educational Statistics, 6, 107–128.
Hedges, L. V. (1982). Estimation o f effect size from a series o f independent experiments. Psychological Bulletin,
92, 490a€"499.
Hedges, L. V. & Olkin, I . (1980). Vote counting methods in research synthesis. Psychological Bulletin, 88,
359a€"369.
Hedges, L. V. & Olkin, I . (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hedges, L. V. & Vevea, J. L . (1998). Fixed- and random-effects models in meta-analysis. Psychological
Methods, 3, 486a€"504.
Higgins, J., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analysis.
BMJ, 327, 557–560.
Hsu, L. M . (2004). Biases o f success rate differences shown in binomial effect size displays. Psychological
Bulletin, 9, 183a€"197.
Huedo-Medina, T. B., Sánchez-Meca, J., & Marin-Martinez, F. (2006). Assessing heterogeneity in meta-
analysis: Q statistic or I² index? Psychological Methods, 11, 193–206.
Hunt, M . (1997). How science takes stock. New York: Russell Sage Foundation.
Hunter, J. E. & Schmidt, F. L . (1987). Error in the meta-analysis of correlations: the mean correlation.
Unpublished manuscript, Department o f Psychology, Michigan State University.
Hunter, J. E. & Schmidt, F. L . (1990). Methods of meta-analysis: Correcting error and bias in research
findings (1st ed.). Newbury Park, CA: Sage Publications.
Hunter, J. E. and Schmidt, F. L. (2000). Fixed effects vs. random effects meta-analysis models: Implications for
cumulative knowledge in psychology. International Journal of Selection and Assessment, 8, 275a€"292.
Hunter, J. E. & Schmidt, F. L . (2004). Methods of meta-analysis: Correcting error and bias in research
findings (2nd ed.). Thousand Oaks, CA: Sage Publications.
Hunter, J. E., Schmidt, F. L . , & Coggin, T. D. (1996). Meta-analysis of correlation: Bias in the correlation
coefficient and the Fisher z transformation. Unpublished manuscript, University of Iowa.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research.
Washington, DC: American Psychological Association.
Kraemer, H . C., Gardner, C., Brooks, J. O., & Yesavage, J. A. (1998). Advantages of excluding underpowered
studies in meta-analysis: Inclusionist versus exclusionist viewpoints. Psychological Methods, 3, 23a€"31.
Krueger, J. (2001). Null hypothesis significance testing. American Psychologist, 56, 16a€"26.
Light, R. J., Singer, J. D., & Willett, J. B. (1994). The visual presentation and interpretation of meta-analysis. In
Cooper, H . and Hedges, L . V. (Eds.). The handbook of research synthesis (pp. 439a€"453). New York:
Russell Sage Foundation.
Lipsey, M . W. & Wilson, D. B. (1993). The efficacy o f psychological, educational, and behavioral treatment:
Confirmation from meta-analysis. American Psychologist, 48, 1181a€"1209.
Lipsey, M . W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage Publications.
Meehl, P. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress o f soft
psychology. Journal of Consulting and Clinical Psychology, 46, 806a€"834.
Morrison, D. E. & Henkel, R. E. (Eds.) (1970). The significance test controversy: A reader. Chicago: Aldine.
Mosteller, F. M . & Bush, R. R. (1954). Selected quantitative techniques. I n Lindzey, G. (Ed.), Handbook of
social psychology: Volume 1. Theory and method (pp. 289a€"334). Cambridge, MA: Addison-Wesley.
Mullen, B. & Rosenthal, R. (1985). B A S I C meta-analysis: Procedures and programs. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Murphy, K. R. & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional
and modern hypothesis tests. Hillsdale, NJ: Lawrence Erlbaum Associates.
Murphy, K. R. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional
and modern hypothesis tests (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Orwin, R. G. (1983). A fail-safe N for effect sizes in meta-analysis. Journal of Educational Statistics, 8,
157a€"159.
Rosenthal, R. (1978). Combining results of independent studies. Psychological Bulletin, 85, 185a€"193.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86,
638a€"641.
Rosenthal, R. (1985). Basic meta-analysis: Procedures and programs. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Rosenthal, R. (1991). Meta-analytic procedures for social research. Newbury Park, CA: Sage Publications.
Rosenthal, R. (1993). Cumulating evidence. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in
the behavioral sciences: Methodological issues (pp. 519a€"559). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Rosenthal, R. (1994). Parametric measures o f effect size. I n Cooper, H . M . & Hedges, L. V. (Eds.), The
handbook of research synthesis (pp. 231a€"244). New York: Russell Sage Foundation.
Rosenthal, R., Rosnow, R. L . , & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A
correlational approach. Cambridge, UK: Cambridge University Press.
Rosenthal, R. & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. The Behavioral and
Brain Sciences, 3, 377a€"415.
Rosenthal, R. & Rubin, D. B. (1982). A simple general purpose display o f magnitude o f experimental effect.
Journal of Educational Psychology, 74, 166a€"169.
Rosnow, R. L. & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological
science. American Psychologist, 44, 1276a€"1284.
Rosnow, R. L . , Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation.
Psychological Science, 11, 446a€"453.
Schmidt, F. L. & Hunter, J. E. (1977). Development o f a general solution to the problem o f validity generalization.
Journal of Applied Psychology, 62, 529a€"540.
Schulze, R. (2004). Meta-analysis: A comparison of approaches. Toronto: Hogrefe & Huber.
Schulze, R., Holling, H., & Bohning, D. (Eds.) (2003). Meta-analysis: New developments and applications in
medical and social sciences. Toronto: Hogrefe & Huber.
Serlin, R. A . & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle.
American Psychologist, 40, 73a€"83.
Serlin, R. A . & Lapsley, D. K. (1993). Rational appraisal o f psychological research and the good-enough
principle. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences:
Methodological issues (pp. 199a€"228). Hillsdale, NJ: Lawrence Erlbaum Associates.
Smith, M . L. & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist,
32, 752a€"760.
Steering Committee of the Physicians' Health Study Research Group (1988). Preliminary report: Findings
from the aspirin component of the ongoing physicians' health study. The New England Journal of
Medicine, 318, 262–264.
Stokes, D. M . (2001). The shrinking file drawer: On the validity of statistical meta-analyses in parapsychology.
The Skeptical Inquirer, 25(3), May/June 2001, 22a€"25.
Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A., & Williams, Jr., R. M . (1949). The American
soldier: Adjustment during army life: Volume 1. Princeton, NJ: Princeton University Press.
Tatsuoka, M . (1993). Effect size. I n Keren, G. & Lewis, C. (Eds.). A handbook for data analysis in the
behavioral sciences: Methodological issues (pp. 461a€"479). Hillsdale, NJ: Lawrence Erlbaum Associates.
Van den Noortgate, W. & Onghena, P. (2005). Meta-analysis. In Everitt, B. S. and Howell, D. (Eds.),
Encyclopedia of statistics in behavioral science (pp. 1206a€"1217). Chichester. England: John Wiley &
Sons.
Wang, M . C. and Bushman, B. J. (1998). Integrating results through meta-analytic review using SAS
software. Cary, NC: SAS Institute Inc.
Wolf, F. M . (1986). Meta-analysis: Quantitative methods for research synthesis. Newbury Park, CA: Sage
Publications.
Notes
1 There may be situations where a researcher may need to compute an overall mean for a set of k sample means,
yet believes that some of the mean values should be accorded more importance than others (possibly because
he believes they are more representative of the overall population mean, or perhaps because he has reason to
believe that some of the means are less likely to have been influenced by potentially contaminating variables).
In the latter situation a researcher may be of the opinion that the weight assigned to a specific mean value
should not be proportional to the sample size employed to compute it. Under the latter circumstances Equation
I.50 is an alternative equation which can be employed to compute the weighted mean. In the latter equation
each mean value is assigned a weight (represented by the notation w_g) reflecting its contribution in computing
the overall mean. The sum of all the weights must equal 1 (i.e., Σw_g = w_1 + w_2 + … + w_k = 1). Obviously, in
such a situation the assignment of weights to the mean values is subjective and could be subject to challenge.
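A minimal numerical sketch of the weighted mean just described (the sample means and the subjectively assigned
weights below are hypothetical and are not taken from the text; Equation I.50 itself is not reproduced here):

    # Weighted mean of k = 3 hypothetical sample means, using subjectively
    # assigned weights that sum to 1 (rather than weights proportional to
    # sample size); all values below are illustrative only.
    means = [10.0, 12.0, 20.0]
    weights = [0.5, 0.3, 0.2]
    assert abs(sum(weights) - 1.0) < 1e-12          # the weights must sum to 1
    weighted_mean = sum(w * m for w, m in zip(weights, means))
    print(weighted_mean)                            # 0.5*10 + 0.3*12 + 0.2*20 = 12.6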
3 A percentage is converted into a proportion by moving the decimal point two places to the left (e.g. 87% is
equivalent to .87). Conversely, a proportion is converted into a percentage by moving the decimal point two
places to the right (e.g., .87 is equivalent to 87%).
4 Strictly speaking, s̃ is not an unbiased estimate of σ, although it is usually employed as such. In point of fact,
s̃ slightly underestimates σ, especially when the value of n is small. (The latter type of statistic is said to be
negatively biased.) Zar (1999) notes that although corrections for bias in estimating σ have been developed
by Gurland and Tripathi (1971) and Tolman (1971), they are rarely employed, since they generally have no
practical impact on the outcome of an analysis. Kline (2004, p. 26) notes that Equation I.55 (which approaches
the value of s̃ as the value of n increases) can provide a numerical approximation of an unbiased estimate of
σ (designated as ŝ).
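The negative bias noted above can be illustrated with a small simulation sketch (the sample size, number of trials,
and population values below are hypothetical; the sketch does not reproduce Equation I.55 or the corrections of
Gurland and Tripathi (1971)):

    import random
    import statistics

    # Draw many small samples from a normal population with sigma = 1 and
    # average the n - 1 based standard deviation estimates; the average falls
    # slightly below sigma (roughly 0.94 for n = 5), illustrating the
    # negative bias described above.
    random.seed(1)
    n, trials, sigma = 5, 20000, 1.0
    total = 0.0
    for _ in range(trials):
        sample = [random.gauss(0.0, sigma) for _ in range(n)]
        total += statistics.stdev(sample)           # n - 1 in the denominator
    print(total / trials)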
5 The inequality sign > means greater than. Some other inequality signs used throughout the book are <, which
means less than; ≥, which means greater than or equal to; and ≤, which means less than or equal to.
6 a) Since both the mean and standard deviation reflect how scores are dispersed about the mean, one might ask
why not use the average deviation about the mean as a measure of variability – i.e., compute the value
Σ(X − X̄)/n? The answer is that since the sum of the positive deviations about the mean will always equal
the sum of the negative deviations about the mean, the value of the average deviation will always equal zero.
The latter is demonstrated in Column 4 of Table I.2; b) It could also be asked why not employ the average of
the absolute values of the deviation scores about the mean as a measure of variability? (The absolute value of
a number is the magnitude of the number irrespective of the sign. The notation |x| is commonly employed to
represent the absolute value of a number designated as x. For example, |−1| = 1.) The average of the
absolute values of the deviation scores about the mean is referred to as the mean absolute deviation (MAD)
and can be computed with Equation I.56. For the data in Table I.2, MAD computes to (4.4 + 2.4 + 1.4 + 2.6 +
5.6)/5 = 16.4/5 = 3.28. Among others, Kachigan (1986) notes that a major reason why the mean absolute
deviation is not typically employed as a measure of variability is because it cannot be used for certain
mathematical operations that are necessary in the development of more advanced statistical procedures.
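A brief sketch of the two computations just described (the scores are hypothetical and are chosen only so that their
absolute deviations match the values 4.4, 2.4, 1.4, 2.6, and 5.6 quoted above; Equation I.56 itself is not reproduced
here):

    # Hypothetical scores whose deviations about their mean reproduce the
    # absolute deviations quoted above (4.4, 2.4, 1.4, 2.6, 5.6); the actual
    # Table I.2 scores are not reproduced here.
    scores = [14.4, 12.4, 11.4, 7.4, 4.4]
    mean = sum(scores) / len(scores)                        # 10.0
    deviations = [x - mean for x in scores]
    print(sum(deviations) / len(scores))                    # 0 (up to rounding): the average deviation
    mad = sum(abs(d) for d in deviations) / len(scores)
    print(mad)                                              # 3.28: the mean absolute deviation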
7 A full discussion of the concept of probability is presented in Section IX (the Addendum) of the binomial sign
test for a single sample under the discussion of Bayesian hypothesis testing.
8 a) Wuensch (2005, p. 1855; 2011) notes that in 1895 Pearson initially suggested measuring skewness by
standardizing the difference between the mean and the mode through use of the following equation:
skewness = (X̄ − mode)/s̃; b) McElroy (1979) describes the use of the equation skewness = (X̄ − M)/s̃ as an
alternative approximate measure of skewness; c) McElroy (1979) and Zar (1999) describe the following
measure of skewness, referred to as the Bowley coefficient of skewness (Bowley (1920)), which employs the
four quartiles of the distribution (where Q_i represents the ith quartile): Skewness = (Q_3 + Q_1 − 2Q_2)/(Q_3 − Q_1).
The latter index yields values in the range −1 for a maximally negatively skewed distribution to +1 for a
maximally positively skewed distribution.
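A short sketch of the three indices just described, computed for a hypothetical positively skewed sample (the data
and the quartile convention employed by statistics.quantiles are assumptions; other quartile conventions will yield
slightly different values for the Bowley coefficient):

    import statistics

    # Hypothetical right-skewed sample; the quartile convention used by
    # statistics.quantiles is an assumption (other conventions give slightly
    # different values for the Bowley coefficient).
    x = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
    mean, median, mode = statistics.mean(x), statistics.median(x), statistics.mode(x)
    s = statistics.stdev(x)                         # n - 1 based standard deviation

    pearson_mode = (mean - mode) / s                # mean-minus-mode index
    mean_median = (mean - median) / s               # mean-minus-median index
    q1, q2, q3 = statistics.quantiles(x, n=4)       # the three quartiles
    bowley = (q3 + q1 - 2 * q2) / (q3 - q1)         # bounded by -1 and +1
    print(pearson_mode, mean_median, bowley)        # all positive for this sample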
9 The author is indebted to Vladimir Britikov for clarifying the latter point in a personal communication.
10 Although the term bar graph can be employed in reference to Figure I.7, the vertical elements
perpendicular to the X-axis are probably best described as lines rather than bars (since the term bar connotes a
width for a vertical element beyond that of a simple line such as |). The use of noncontiguous
bars exhibiting width beyond that of a simple line is most likely to be employed when a nominal/categorical
variable is represented on the X-axis. For example, a bar graph can be employed to present the frequency of
people in each of four ethnic groups which comprise a population. The identities of each of the four ethnic
groups would be recorded on equidistant points along the X-axis. Above each ethnic group a vertical bar would
be constructed to a height which reflected the frequency of that group in the population. Thus, a researcher has
the option of employing vertical bars, as opposed to vertical lines, when the information recorded along the X-
axis represents a discrete and/or categorical variable.
11 a) It should be noted that sources are not consistent with respect to the protocol employed for computing
percentile values. In most cases any differences obtained between the various methods which might be
employed to identify a score at a specific percentile will be of little or no practical consequence. The
methodology employed in the main text of this book for computing percentiles is employed, among others, by
Rosner (2000). It should be noted that in the case of Distribution A, if the underlying variable upon which the
distribution is assumed to be based is discrete, with scores only allowed to assume an integer value, some
sources might elect to employ the scores in the 11th and 19th ordinal positions, respectively, as the scores at the
50th and 90th percentiles. The use of the scores in the 11th and 19th ordinal positions (i.e., 71 and 82) would be
based on the fact that they would represent the scores closest to but above the 50th and 90th percentiles (as
computed in the main part of the text). On the other hand, some sources might simply designate the score at the
k = (20)(.5) = 10th ordinal position as the median (i.e., the score of 67), and the score at the k = (20)(.9) = 18th
ordinal position as the score at the 90th percentile (i.e., the score of 76) (since 10/20 = .5 and 18/20 = .9); b) An
alternative more complicated methodology for computing percentiles is described in other sources. Although
this methodology can be employed with data that are not in the form of a grouped frequency distribution, it is
most commonly employed with the latter type of distributions. It should be noted that when the methodology to
be described below is employed with ungrouped data (which is the case in the computation of the values
employed in constructing the boxplot in Figure I.9) it will yield slightly different values than those obtained
for the Figure I.9 boxplot. The alternative method for computing a score that corresponds to a specific
percentile value for a distribution is summarized by Equation I.57 below.
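The fact that different protocols can identify different scores for the same percentile is easy to verify by computation. The Python sketch below contrasts the ordinal-position protocol described above with the linear-interpolation protocol that numpy uses by default; the twenty sorted scores are hypothetical and are not the scores of Distribution A.

import numpy as np

scores = [41, 44, 47, 50, 53, 55, 58, 60, 62, 64,
          66, 68, 70, 72, 74, 76, 78, 80, 83, 86]   # hypothetical sorted sample, n = 20

n, p = len(scores), 0.90
k = int(n * p)                     # k = (20)(.9) = 18
print(scores[k - 1])               # score at the 18th ordinal position
print(np.percentile(scores, 90))   # numpy's linear-interpolation result, usually slightly different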
12 Alternative criteria for classifying an outlier are discussed in Section VII of the t test for two independent
samples under the discussion of outliers. In addition, protocols for dealing with outliers are also discussed.
13 If the variable being measured is a discrete variable which can only assume an integer value, the values
computed in this section would be expressed as integer values. Values above the median are rounded off to the
nearest integer value above the last integer digit. In other words, the value 107.5 (or, for that matter, the value
107 with any decimal addition) for the upper outer fence is rounded off to 108. Values below the median are
rounded off to the integer value designated in the last integer digit. In other words, if the value 27.5 (or, for that
matter, the value 27 with any decimal addition) were derived for the lower outer fence, it is rounded off to 27.
14 Not all sources employ the one and one-half hingespread and three hingespread criteria in, respectively,
labeling outliers and severe outliers. For example, Hogg and Tanis (1997, p. 28) consider any score more than
one and one-half hingespreads from a hinge as a suspected outlier, and any score more than three
hingespreads from a hinge an outlier. Other sources (e.g., Cohen (2001, p. 78) and Howell (1997, p. 56))
define an outlier as any score that falls beyond an adjacent value.
15 The symbol π in Equations I.36 and I.37 represents the mathematical constant pi (which equals 3.14159…).
The numerical value of π represents the ratio of the circumference of a circle to its diameter. The value e in
Equations I.36 and I.37 equals 2.71828… . Like π, e is a fundamental mathematical constant. Specifically, e is
the base of the natural system of logarithms, which will be clarified shortly. Both π and e represent what are
referred to as irrational numbers. An irrational number has a decimal notation that goes on forever without
a repeating pattern of digits. In contrast, a rational number (derived from the word ratio) is either an integer
or a fraction (which is the ratio between whole/integer numbers), which when expressed as a decimal always
terminates at some point or assumes a repetitive pattern. Examples of rational numbers are 1/4 = .25, which has
a terminating decimal, or 1/3 = .33333…, which is characterized by an endless repeating pattern of digits
(Hoffman, 1998).
16 Another type of standardized score that is sometimes encountered in educational assessment is a stanine score
(the earliest use of stanine scores was by the U.S. Air Force in 1943). The latter type of scores are transformed
on a nine point scale. The original scores are generally assumed to be normally distributed or normalized
through use of a data transformation (which is discussed in Section VII of the t test for two independent
samples). The transformation to stanine scores results in a distribution with a mean of 5 and a standard
deviation of 1.96. Through use of the latter values tables can be constructed identifying percentile ranks
associated with a given stanine score (e.g., 1 = 4%; 2 = 11%; 3 = 23%; 4 = 40%; 5 = 60%; 6 = 77%; 7 = 89%;
8 = 96%; 9 = 100%) (Clark-Carter, 2005).
17 Previously, the term tail was defined as the lower or upper extremes of a distribution. Although the latter
definition is correct, I am taking some liberty here by employing the term tail in this context to refer more
generally to the left or right half of the distribution.
18 Although the values in Column 4 of Table A1 will not be employed in our example, a brief explanation of
what they represent follows. In the case of the standard normal distribution, when a value of X is substituted in
Equations I.36 or I.37, the value of X will correspond to a z score. When a z value is employed to represent X,
the value of Y computed with Equation I.36/I.37 will correspond to the value recorded for the ordinate in
Column 4 of Table A1. The value of the ordinate represents the height of the normal curve for that z value. To
illustrate, if the value z = 0 is employed to represent X, Equation I.36/I.37 reduces to Y = 1/√(2π), which equals
Y = 1/√((2)(3.1416)) = .3989. The resulting value .3989 is the value recorded in Column 4 of Table A1 for the
ordinate that corresponds to the z score z = 0.
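The ordinate value can be confirmed with a short Python sketch employing the standard normal density (a sketch only; scipy's norm.pdf implements the same density for the standard normal case):

from math import exp, pi, sqrt
from scipy.stats import norm

z = 0.0
print(1 / sqrt(2 * pi) * exp(-z**2 / 2))  # 0.3989..., the ordinate at z = 0
print(norm.pdf(z))                        # the same value computed with scipy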
19 Kline (2004, pp. 36–37) notes that the null hypothesis is a point hypothesis in that it specifies one
numerical value for a population parameter. He makes a distinction between a null hypothesis which
represents a nil hypothesis and a null hypothesis which represents a non-nil hypothesis.
20 In actuality, the values of the sample means do not have to be identical to support the null hypothesis. Due to
sampling error (which, as noted earlier in this chapter, is a discrepancy between the value of a statistic and the
parameter it estimates, even when two samples come from the same population) the value of the two sample
means will usually not be identical. The larger the sample size employed in a study, the less the influence of
sampling error, and consequently the closer one can expect two sample means to be to one another if, in fact,
they do represent the same population. With small sample sizes, however, a large difference between sample
means is not unusual, even when the samples come from the same population, and because of this a large
difference may not be grounds for rejecting the null hypothesis.
21 a) Some sources employ the notation p ≤ .05, indicating a probability of equal to or less than .05. The latter
notation will not be used unless the computed value of a test statistic is the exact value of the tabled critical
value; b) Most sources employ the term critical value to identify an extreme value of a test statistic which is
extremely unlikely to occur if the null hypothesis is true. Anderson (2001), however, suggests that it would be
better to refer to such values as criterial values, since in the final analysis they are little more than arbitrary
values which most members of the scientific community have agreed to employ as criteria within the
framework of hypothesis testing. However, in the final analysis, such values are not critical in any objective
sense, since other researchers may elect to employ alternative criteria which they consider more suitable for
reaching a correct decision in evaluating a hypothesis.
22 Inspection of Column 3 in Table A1 reveals that the proportion for z = 1.64 is .0505. This latter value is the
same distance from the proportion .05 as the value .0495 derived for z = 1.65. If Table A1 documented
proportions to five decimal places, it would turn out that z = 1.65 yields a value that is slightly closer to .05
than does z = 1.64. Some books, however, do employ z = 1.64 as the tabled critical one-tailed .05 z value.
23 The probability value obtained within the framework of the classical hypothesis testing model can be
interpreted as a conditional probability. A conditional probability (which is discussed in more detail later in
this chapter in the section on basic principles of probability) is a probability which is contingent upon certain
conditions having been met. More specifically, when a null hypothesis is evaluated within the framework of
the classical hypothesis testing model, the probability value associated with the result of an experiment
represents the following conditional probability: p(Outcome of experiment / H₀ true), which represents the
probability of obtaining the outcome for an experiment, given the fact that the null hypothesis is true.
Unfortunately, researchers often erroneously interpret the probability value associated with the result of an
experiment to represent the conditional probability p(H₀ true / Outcome of experiment), which represents
the probability the null hypothesis is true, given the result obtained for an experiment.
24 The author is in agreement with the following statement by Wainer (1999, p. 212): "To be perfectly honest,
I am a little at loss to fully understand the vehemence and vindictiveness that have recently greeted NHT (null
hypothesis significance testing). These criticisms seem to focus primarily on the misuse of NHT. The focus on
the technique rather than on those who misuse it seems to be misplaced."
25 Some sources such as Christensen (1997, pp. 477 and 483; 2000) make a distinction between the terms
external validity and ecological validity. The latter author defines external validity as the extent to which the
results of an experiment can be generalized across different persons, settings, and times. Ecological validity is
defined as the extent to which the results of an experiment can be generalized across settings or environmental
conditions – most specifically, the degree to which the results of a laboratory experiment can be generalized
to a real world setting.
26 a) Random assignment is a major prerequisite for insuring the internal validity of a study. In contrast, the use
of a random sample is intended to provide a researcher with a representative cross section of a population, and
by virtue of the latter allows the researcher to generalize one's results to the whole population. Although, in
practice, random samples are rarely employed in experiments, random assignment is a prerequisite of a true
experimental design; b) The true experiment described earlier in this chapter represents the simplest type of
true experimental design.
27 a) Among others, Cowles (1989), Larsen and Marx (1985) and Zar (1999) note that although addressed to some
extent earlier, the formal study of the concept of probability can be traced back to two French mathematicians,
Blaise Pascal (1623–1662) and Pierre de Fermat (1601–1665); b) A more detailed discussion of the rules
to be presented in this section can be found in most books that specialize in the subject of probability. The
format of the discussion to follow is based, in part, on an excellent presentation of the topic in Hays and
Winkler (1971, Ch. 2).
28 a) If A and B are mutually exclusive, P(A and B) = 0; b) In the case of three events A, B, or C which are not
mutually exclusive, the probability that an event is A or B or C is computed with the equation: P(A or B or C)
= P(A) + P(B) + P(C) − P(A and B) − P(A and C) − P(B and C) + P(A and B and C).
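The three-event addition rule can be verified by direct enumeration, as in the following Python sketch (the events are hypothetical and involve a single roll of a fair die):

# A = even number, B = number greater than 3, C = prime number
A, B, C = {2, 4, 6}, {4, 5, 6}, {2, 3, 5}

def P(event):
    return len(event) / 6          # six equally likely outcomes, 1 through 6

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
print(lhs, rhs)                    # both equal 5/6, since only the outcome 1 is excluded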
29 Venn diagrams are named after the British logician John Venn (1834–1923). Hays and Winkler (1971, p. 8)
note that such diagrams are also referred to as Euler diagrams, after the Swiss mathematician Leonhard Euler
(1707–1783).
30 Larsen and Marx (1985) note that although in 1812 the French mathematician Pierre Simon, the Marquis de
Laplace (1749–1827) made the first explicit statement of Bayes' theorem, the theorem was initially
presented in 1763 by Bayes in a posthumously published paper. Cowles (1989, p. 73), however, notes there is a
lack of agreement among historians with regard to whether the ideas in Bayes' paper were entirely his own
work, or also contain contributions by his friend Richard Price, who submitted the paper for publication two
years after Bayes' death.
31 The accuracy of parametric tests is most likely to be compromised when the assumption of normality of the
underlying distribution(s) being evaluated is violated. Among others, Wilcox (2003) contends that researchers
should employ computer-intensive procedures, which are discussed in Section IX (the Addendum) of the
Mann-Whitney U test (Test 12), as an alternative to many of the procedures described in this book when
there is reason to believe that one or more of the assumptions underlying a parametric or nonparametric test
have been saliently violated.
32 Since interval and ratio data are viewed the same within the decision making process with respect to test
selection, the expression interval/ratio will be used throughout the book to indicate that either type of data is
appropriate for use with a specific test.
33 Use of computer-intensive procedures referred to in Endnote 31 can often help clarify inconsistent results
between parametric versus nonparametric tests employed with the same data when there is reason to believe
that one or more assumptions underlying one or more of the tests in question have been saliently violated.
34 One area of statistical analysis which will not be covered in this book is circular statistics. The term circular
statistics (referred to as angular statistics or radial statistics in some sources) is commonly employed for
methods (as well as the associated theory) that are relevant in the analysis of circular data – i.e., data based
on a circular scale of measurement. In the discussion of levels of measurement in this chapter, it is noted that
an interval scale of measurement does not have a true zero point. A circular scale of measurement is a special
case of an interval scale, which not only lacks a true zero point, but is also arbitrary with respect to the
magnitude of the values which constitute the scale (i.e., the designation of what constitutes high versus low
values on such a scale is arbitrary). Another characteristic of a circular scale is that the maximum and
minimum values on the scale intersect/coincide. A common example of a circular scale is the measurement of
direction with a compass, which is represented in Figure I.28. The latter scale divides a circle into 360 equal
intervals which are referred to as degrees (which is represented by the notation °). The designation of 0°
(which is equivalent to 360°) as north and 180° as south is arbitrary. Note that a direction of 10° is much
closer to 350° than it is to 40°, and thus, in the final analysis in an absolute sense it is meaningless to say
that 10° is less than (or for that matter greater than) 350°. Zar (1999, p. 592) notes that although circular
data are typically measured in angles, such data can also be measured in radians (which are discussed in
Section VII of the t test for two independent samples under the discussion of outliers and data
transformation). Zar (1999, p. 594) also notes that on occasion angular data (i.e., data expressed in an angular
format) may be arranged on other than a circular cycle – for example, data could be arranged on a semi-
circular cycle consisting of 180°.
1 The exact probability value recorded for z = 1.67 in Column 3 of Table A1 is .0475 (which is equivalent to
4.75%). This indicates that the proportion of cases which falls above the value z = 1.67 is .0475, and the
proportion of cases which falls below the value z = −1.67 is .0475. Since this indicates that in the left tail of
the distribution there is less than a 5% chance of obtaining a z value equal to or less than z = −1.67, we can
reject the null hypothesis at the .05 level if we employ the directional alternative hypothesis H₁: μ < 8,
with α = .05.
2 Equation 1.2 is employed to compute the standard error of the population mean when the size of the underlying
population is infinite. In practice, it is employed when the size of the underlying population is large and the
size of the sample is believed to constitute less than 5% of the population. However, among others, Freund
(1984) notes that in a finite population, if the size of a sample constitutes more than 5% of the population, a
correction factor is introduced into Equation 1.2. The computation of the standard error of the mean with the
finite population correction factor is noted below:
3 Inspection of Table A1 reveals the exact percentage of cases in a normal distribution which falls within three
standard deviations above or below the mean is 99.74%.
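The tabled figure can be checked against the normal cumulative distribution function, as in this brief Python sketch:

from scipy.stats import norm

# Proportion of a normal distribution lying within three standard deviations of the mean
print(norm.cdf(3) - norm.cdf(-3))   # 0.9973..., which the values tabled in Table A1 round to 99.74%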
1 van Belle (2002, pp. 26–28) notes that for small samples (i.e., n ≤ 15) Equation 2.27 will generally provide
a good estimate of the standard error of the mean. In the case of Example 2.1, since Range = 15 − 0 = 15,
the latter equation yields the value sX̄ = 1.5.
2 In order to be solvable, Equation 2.3 requires that there be variability in the sample. If all of the subjects in a
sample have the same score, the computed value of ŝ will equal zero. When ŝ = 0, the value of sX̄ will
always equal zero. When sX̄ = 0, Equation 2.3 becomes unsolvable, thus making it impossible to compute a
value for t. It is also the case that when the sample size is n = 1, Equation 2.1 becomes unsolvable, thus making
it impossible to employ Equation 2.3 to solve for t.
3 In the event that σ is known (and the researcher is confident that the latter value is, in fact, the standard
deviation of the population in question) and n < 25, and the researcher elects to employ the single-sample t
test, the value of σ should be used in computing the test statistic. Given the fact that the value of σ is known,
it would be foolish to employ ŝ as an estimate of it.
4 The t distribution was derived by William Gosset (1876–1937), a British statistician who published under the
pseudonym of Student.
5 It is worth noting that if the value of the population standard deviation in Example 2.1 is known to be σ = 4.25,
the data can be evaluated with the single-sample z test. When employed it yields the value z = 1.94, which is
identical to the value obtained with Equation 2.3. Specifically, since σ = ŝ = 4.25, σX̄ = sX̄ = 4.25/√10 = 1.34.
Employing Equation 1.3 yields z = (7.6 − 5)/1.34 = 1.94. As is the case for the single-sample t test, the latter
value only supports the directional alternative hypothesis H₁: μ > 5 at the .05 level. This is the case since
z = 1.94 is greater than the tabled critical one-tailed value z.05 = 1.65 in Table A1. The value z = 1.94, which is
less than the tabled critical two-tailed value z.05 = 1.96, falls just short of supporting the nondirectional
alternative hypothesis H₁: μ ≠ 5 at the .05 level.
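The computation just described can be reproduced with the short Python sketch below (a sketch only, using the Example 2.1 values cited in this endnote):

from math import sqrt
from scipy.stats import norm

mean_x, mu, sigma, n = 7.6, 5.0, 4.25, 10

se = sigma / sqrt(n)        # standard error of the mean, approximately 1.34
z = (mean_x - mu) / se      # approximately 1.93; the value 1.94 above results from rounding se to 1.34
print(se, z)
print(norm.ppf(0.95), norm.ppf(0.975))   # 1.645 and 1.960, the one- and two-tailed .05 critical values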
6 A sampling distribution of means for the t distribution when employed in the context of the single-sample t test
is interpreted in the same manner as the sampling distribution of means for the single-sample z test as depicted
in Figure 1.1.
7 In the event the researcher is evaluating the power of the test in reference to a value of μ₁ that is less than μ =
5, Distribution B will overlap the left tail of Distribution A.
8 It is really not possible to determine this value with great accuracy by interpolating the entries in Table A2.
9 If in this example the table for the normal distribution is used to estimate the power of the single-sample t test,
it can be determined that the proportion of cases which falls above a z value of 1.51 is .0655. Although the
value .0655 is close to .085, it slightly underestimates the power of the test.
10 a) In the third edition of this book the notation δ (which is the lower case Greek letter delta) was employed to
represent the noncentrality parameter; b) In Test 16 (the chi-square test for r × c tables) the alternative
symbol ϕ is employed to represent the lower case Greek letter phi, which is used in the latter chapter to
represent a nonparametric measure of association; c) Throughout this book a number of different theoretical
probability distributions (e.g., the t distribution, the chi-square (χ²) distribution, and the F distribution) are
employed for conducting inferential statistical tests, as well as in computing additional information such as the
power of a test and confidence intervals for a specific parameter. The t distribution (for which Table A2
displays probability values and corresponding t values) employed in this chapter and throughout this book is
sometimes referred to as the central t distribution. The latter distribution is based on the assumption that a
null hypothesis is correct. There is, however, a noncentral t distribution which is not based on the latter
assumption. Among others, Kline (2004, p. 35) and Smithson (2003) note that the noncentral t distribution is
actually a family of distributions (of which the central t distribution represents a special case), and that the
shape of a specific noncentral t distribution will be a function of the value of an additional parameter called
the noncentrality parameter. The latter parameter (which in the case of the central t distribution equals
zero) essentially reflects the degree of deviation from the null hypothesis. Kline (2004, p. 35) notes that as the
value of the noncentrality parameter for the noncentral t distribution becomes increasingly positive, the shape
of the t distribution becomes increasingly nonsymmetrical and positively skewed. Although noncentral
distributions (such as the noncentral t distribution) can be very valuable for computing such things as power
and confidence intervals, they are difficult or impractical to use without the aid of a computer or specialized
tables (which are generally not available). However, with the introduction of high speed computers,
statisticians are increasingly taking advantage of the benefits noncentral distributions can provide in analyzing
data; d) The value of the noncentrality parameter can be computed directly through use of the following
equation: noncentrality parameter = (μ₁ − μ)/(σ/√n). Note that the equation expresses effect size in standard
deviation units of the sampling distribution; e) Further discussion of Cohen's d index (as well as the sample
analogue referred to as the g index) can be found in Endnote 15 of the t test for two independent samples and
in Section I of Test 43 on meta-analysis. The g index for the result of the single-sample t test can be computed
when, in Equation 2.5, X̄₁ is employed in lieu of μ₁ and ŝ is employed in lieu of σ.
11 van Belle (2002, pp. 31–33) notes that Equation 2.28 can be employed to estimate the sample size required in
order to conduct a single-sample t test if a nondirectional analysis is conducted with α = .05. The value of k in
the numerator of the latter equation will depend on the desired power of the test, and for the power values .50,
.80, .90, .95, and .975 the values to employ for k are, respectively, 4, 8, 11, 13, and 16. Equation 2.28 will be
demonstrated in reference to computing the necessary sample size for the power of the test to equal .80, if as
previously the researcher employs the null hypothesis H₀: μ = 5 versus the alternative hypothesis H₁: μ₁ = 6.
If the power of the test is to equal .80, the value k = 8 is employed in the numerator of Equation 2.28. Equation
2.5 is employed to compute the value d = .235 (which was computed earlier by dividing the one point
difference stipulated by the null versus alternative hypotheses by the value ŝ = 4.25, which represents the
best estimate of σ). When the appropriate values are substituted in Equation 2.28, the sample size estimate for
the power of the test to equal .80 is n = 144.86 which rounded off equals 145.
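Although Equation 2.28 is not reproduced in this endnote, the values cited (k = 8, d = .235, n = 144.86) are consistent with a rule of the form n = k/d². Under that assumption, the computation can be sketched as follows:

# Sketch assuming Equation 2.28 has the form n = k / d**2 (an assumption; it reproduces
# the figures cited above: k = 8 for power .80, d = .235, n = 144.86, rounded to 145).
from math import ceil

k_for_power = {0.50: 4, 0.80: 8, 0.90: 11, 0.95: 13, 0.975: 16}

def required_n(power, d):
    return k_for_power[power] / d**2

n = required_n(0.80, 0.235)
print(n, ceil(n))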
12 a) A perusal of statistics texts published over the years will reveal that some texts (including the first two
editions of this book) would employ the following statement with respect to the 95% confidence interval for
Example 2.1: The physician can be 95% confident that the actual value for the average number of visits per
patient (i.e., the population mean) falls within the range of values 4.57 to 10.63. Technically, the latter
statement is incorrect. The subtle difference in language between the latter statement and the statement, the
physician can be 95% confident that the range of values 4.57 to 10.63 includes the actual value for the average
number of visits per patient can be understood if one considers the difference between how probabilities are
established within the framework of the classical hypothesis testing model versus the Bayesian hypothesis
testing model. The latter distinction is clarified in Section IX (the Addendum) of the binomial sign test for a
single sample under the discussion of Bayesian statistics. In that discussion it notes that in the classical
hypothesis testing model an unknown population parameter is viewed as having a fixed value, whereas in
Bayesian hypothesis testing an unknown population parameter is viewed as a random variable. In Bayesian
statistics an interval can be computed which is analogous to the confidence interval discussed in this chapter,
and in the case of Example 2.1 the Bayesian interval, which is sometimes referred to as a credible interval,
would allow a researcher to state there is a 95% probability the population parameter falls within that interval.
The difference in the language employed in defining a confidence interval versus a credible interval can be
explained on the basis of whether the unknown population parameter is conceptualized as having a fixed value
or being a random variable. Further discussion of the language employed in defining a confidence interval can
be found in Howell (2002, p. 208) and Kline (2004, pp. 29–30); b) If a researcher has employed a
directional alternative hypothesis, some statistics texts compute a one-tailed confidence interval. To
illustrate (assuming the two-tailed critical value t.05 = 2.26 is still employed in Equation 2.9), if H₁: μ > 5, the
notation 4.57 ≤ μ ≤ +∞ is commonly used to represent the one-tailed lower 97.5% confidence
interval, since the latter interval only stipulates the boundary demarcating the extreme 2.5% of values in the
lower/left tail of the distribution. If H₁: μ < 5, the notation −∞ ≤ μ ≤ 10.63 is commonly used to
represent the one-tailed upper 97.5% confidence interval, since the latter interval only stipulates the
boundary demarcating the extreme 2.5% of values in the upper/right tail of the distribution. The use of ±∞
in expressing the aforementioned confidence intervals reflects the fact that the tail in the distribution relevant to
the alternative hypothesis is viewed as unbounded.
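The two-tailed 95% interval referred to above can be reproduced numerically with the following Python sketch (a sketch only, using the Example 2.1 summary values of a mean of 7.6, ŝ = 4.25, and n = 10):

from math import sqrt
from scipy.stats import t

mean_x, s_hat, n = 7.6, 4.25, 10

se = s_hat / sqrt(n)                  # approximately 1.34
t_crit = t.ppf(0.975, df=n - 1)       # approximately 2.262 (tabled as 2.26)
print(mean_x - t_crit * se, mean_x + t_crit * se)   # approximately 4.56 to 10.64; the 4.57 to 10.63 above reflects rounding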
13 As noted in Section V, the only exception to this will be when the sample size is extremely large, in which case
the normal and t distributions are identical. Under such conditions the appropriate values for z and t employed
in the confidence interval equation will be identical.
14 The reader should take note of the fact that although most computed confidence intervals are symmetrical (as is
the case in this chapter, where the identical value is added to and subtracted from the sample mean), among
others, Kline (2004, p. 27) and Smithson (2003, pp. 5–9) note that a confidence interval is not always
symmetrical. Examples of non-symmetrical confidence intervals occur within the framework of employing
noncentral distributions, computer-intensive procedures such as the bootstrap (discussed in Section IX (the
Addendum) of the Mann-Whitney U test (Test 12)), and using Fisher's z transformation in computing a
confidence interval for a correlation coefficient (which is discussed in Section VI of the Pearson product-
moment correlation coefficient (Test 28)).
15 a) With respect to the fact that a larger proportion of cases falls between the mean of the normal distribution
and a given standard deviation score than the proportion of cases which falls between the mean and that same
standard deviation score in the t distribution, Howell (2002, p. 184) notes the following: The latter can be
explained by the fact that the sampling distribution of the estimated population variance (i.e., ŝ²) is
positively skewed (especially for small sample sizes). Because of the latter it is more likely that a value
computed for ŝ² will underestimate the true value of the population variance (i.e., σ²) than overestimate it.
The end result of the sampling distribution of ŝ² being positively skewed is that the t value computed for a
set of data will be larger than the z value which would be computed for the same set of data if, in fact, the value
of σ² was known and equal to the computed value of ŝ²; b) A t distribution is leptokurtic, and the degree
of leptokurtosis decreases as the sample size increases. Since when n = ∞ the t distribution is identical to the
normal distribution, in the latter instance the t distribution will be mesokurtic.
16 The six standard deviation range is the basis for the name of the well known business management strategy Six
Sigma.
17 Chou (1989, p. 542) notes that the larger the size of n for a sample/subgroup, the narrower will be the control
limits (i.e., the closer together will be the values computed for the UCL and LCL). Larger sample/subgroup
values (e.g., n = 10 or n = 20) are employed when one wishes to have a control chart that is highly sensitive to
small variations in production numbers.
18 Equation 2.14 is employed in lieu of Equation 2.2 since the actual value of the population standard deviation is
known. Thus σ (the population standard deviation) is employed in lieu of ŝ (the unbiased estimate of the
population standard deviation used in Equation 2.2). The notation σX̄ in Equation 2.14 represents the actual
value of the standard error of the population mean (whereas the notation sX̄ in Equation 2.2 represents the
best estimate of σX̄).
19 Chou (1989, p. 542) notes that when n ≥ 10, the standard deviation is preferable to use as opposed to the
range in a control chart.
20 The multiple columns in Table A28 are employed for different types of control charts. Only some of them will
be discussed in this chapter.
21 Most sources employ the notation σ in lieu of s. The author employs s, however, to represent the fact that it is
computed from sample data and the use of the notation σ is reserved for the actual population standard
deviation.
22 This list is based on Benneyan (1998, p. 70). Readers should be aware of the fact that criteria listed in different
sources may not be totally consistent with one another.
23 More detailed control charts are broken down into the following zones: Zone A contains the area between
horizontal lines that mark off points that are 2 and 3 standard deviations above the center line (i.e., between +2
sigma and +3 sigma). There is also a Zone A below the center line which includes the area between horizontal
lines that mark off points that are 2 and 3 standard deviations below the center line (i.e., between −2 sigma
and −3 sigma). Zone B contains the area between horizontal lines that mark off points that are 1 and 2
standard deviations above the center line (i.e., between +1 sigma and +2 sigma). There is also a Zone B below
the center line which includes the area between horizontal lines that mark off points that are 1 and 2 standard
deviations below the center line (i.e., between −1 sigma and −2 sigma). Zone C contains the area between
the center line and a horizontal line that marks off the point 1 standard deviation above the center line (i.e.,
between 0 sigma and +1 sigma). There is also a Zone C below the center line which contains the area between
the center line and a horizontal line that marks off the point 1 standard deviation below the center line (i.e.,
between 0 sigma and −1 sigma).
24 Although they will not be discussed it is also possible to construct control charts based on other types of
distributions such as the Poisson and geometric distributions. The Poisson distribution, which is sometimes
referred to as the distribution of rare events (and discussed in detail in the Addendum of the chapter on the
binomial sign test for a single sample) is most commonly employed in evaluating a distribution of random
events that have a low probability of occurrence.
25 Note that unlike the control chart depicted in Figure 2.12, the dots in the control chart depicted in Figure 2.13
have not been connected by lines. The choice of connecting the dots in a control chart is optional and both
formats are employed.
26 If at some time interval a trend occurs where data points consistently fall above the center line, this can be
indicative of a problem in the production process. Yet the latter can also be indicative of more careful
inspection of a product, and consequently rejection of more items.
27 The sampling distribution of the average number of defects per item (c̄) is best approximated by the Poisson
distribution, which is discussed in detail in the Addendum of the chapter on the binomial sign test for a
single sample. The Poisson distribution is most commonly employed in evaluating a distribution of random
events that have a low probability of occurrence, which is applicable to the number of defects per item. The
mean of a Poisson distribution is equal to its variance (i.e., μ = σ²), and thus the standard deviation is equal to
the square root of the mean (i.e., σ = √μ). Consequently the equation for the ±3 sigma control limits
becomes c̄ ± 3√c̄.
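The control limits just described can be computed directly, as in the brief Python sketch below (the average defect count is hypothetical):

from math import sqrt

c_bar = 4.0                           # hypothetical mean number of defects per item
sigma_c = sqrt(c_bar)                 # for a Poisson variable the variance equals the mean
ucl = c_bar + 3 * sigma_c             # upper control limit
lcl = max(c_bar - 3 * sigma_c, 0.0)   # lower control limit (a negative limit is set to zero)
print(ucl, lcl)                       # 10.0 and 0.0 for c_bar = 4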
1 Most sources note that violation of the normality assumption is much more serious for the single-sample chi-
square test for a population variance than it is for tests concerning the mean of a single sample (i.e., the
single-sample z test and the single-sample t test). Especially in the case of small sample sizes, violation of the
normality assumption can severely compromise the accuracy of the tabled values employed to evaluate the chi-
square statistic.
2 One can also state the null and alternative hypotheses in reference to the population standard deviation (which is
the square root of the population variance). Since in Example 3.1 σ = √σ² = √5 = 2.24, one can state the
null hypothesis and nondirectional and directional alternative hypotheses as follows: H₀: σ = 2.24; H₁: σ ≠
2.24; H₁: σ > 2.24; and H₁: σ < 2.24.
3 The use of the chi-square distribution in evaluating the variance is based on the fact that for any value of n, the
sampling distribution of ŝ² has a direct linear relationship to the chi-square distribution for df = n − 1. As is
the case for the chi-square distribution, the sampling distribution of ŝ² is positively skewed. Although the
average of the sampling distribution for ŝ² will equal σ², because of the positive skew of the distribution, a
value of ŝ² is more likely to underestimate rather than overestimate the value of σ². For further discussion
of this latter point the reader is referred to Howell (2002, pp. 184–186).
4 When the chi-square distribution is employed within the framework of the single-sample chi-square test for a
population variance, it is common practice to employ critical values derived from both tails of the
distribution. However, when the chi-square distribution is used with other statistical tests, as a general rule only
critical values in the upper/right tail of the distribution are employed. Examples of chi-square tests which focus
on the upper/right tail of the distribution are the chi-square goodness-of-fit test (Test 8) and the chi-square
test for r × c tables (Test 16).
5 In Section I of the single-sample t test it is noted that some sources argue when the sample size is very small
(generally less than 25) the latter test should be employed in evaluating a null hypothesis about a population
mean, even if one knows the value of σ.
6 Although the procedure described in this section for computing a confidence interval for a population variance is
the one that is most commonly described in statistics books, it does not result in the shortest possible
confidence interval which can be computed. Hogg and Tanis (1988) describe a method (based on Crisman
(1975)) requiring more advanced mathematical procedures that allows one to compute the shortest possible
confidence interval for a population variance. For large sample sizes the difference between the latter method
and the method described in this section will be trivial.
7 When data are available from multiple studies, a useful method for evaluating a confidence interval for a
population mean is described at the end of the discussion of confidence intervals in Section VI of the single-
sample t test. This latter method can also be employed in evaluating a confidence interval for a population
variance or standard deviation.
1 The reader should take note of the fact that the test for evaluating population skewness described in this chapter
is a large sample approximation. In point of fact, sources are not in agreement with respect to what equation
provides for the best test of the hypothesis of whether or not g₁ (which represents the value for skewness
printed out in most computer packages (e.g., SPSS)) and/or b₁ deviates significantly from 0. The general
format of alternative equations for evaluating skewness which are employed in other sources (e.g., statistical
software packages such as SPSS, SAS, S-Plus) involves the computation of a z value by dividing the skewness
coefficient by the estimated population standard error (Equation 4.9). That a different z value can result from
use of one or more of these alternative equations derives from the fact that sources are not in agreement with
respect to what statistic provides the best estimate of the population standard error (SE).
2 Analysis of the data with SPSS yields almost identical results. Specifically: a) For Distribution E, SPSS
computes the values skewness = g₁ = 0 and SE = .687. When the latter values are substituted in Equation 4.9,
the value z = 0/.687 = 0 is obtained; b) For Distribution F, SPSS computes the values skewness = g₁ = −1.02
and SE = .687. When the latter values are substituted in Equation 4.9, the value z = −1.02/.687 = −1.48 is
obtained; and c) For Distribution G, SPSS computes the values skewness = g₁ = 1.02 and SE = .687. When the
latter values are substituted in Equation 4.9, the value z = 1.02/.687 = 1.48 is obtained.
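The SPSS standard error cited above can be reproduced with the conventional formula for the standard error of g₁, sqrt(6n(n − 1)/((n − 2)(n + 1)(n + 3))); the Python sketch below assumes that formula and a sample size of n = 10, which yields the value .687 reported above for these distributions.

from math import sqrt

def se_skewness(n):
    # Standard error of the sample skewness statistic g1 (the formula reported by SPSS)
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

se = se_skewness(10)            # assumed n = 10, giving approximately .687
print(round(se, 3))
print(round(-1.02 / se, 2))     # z for Distribution F, approximately -1.48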
1 The reader should take note of the fact that the test for evaluating population kurtosis described in this chapter is
a large sample approximation. In point of fact, sources are not in agreement with respect to what equation
provides for the best test of the hypothesis of whether or not g₂ (which represents the value for kurtosis printed
out in most computer packages (e.g., SPSS)) and/or b₂ deviates significantly from the expected value for a
mesokurtic distribution. The general format of alternative equations for evaluating kurtosis which are
employed in other sources (e.g., statistical software packages such as SPSS, SAS, S-Plus) involves the
computation of a z value by dividing the kurtosis coefficient (represented by g₂) by the estimated population
standard error (Equation 5.10). That a different z value can result from use of one or more of these alternative
equations derives from the fact that sources are not in agreement with respect to what statistic provides the best
estimate of the population standard error (SE).
2 In the second and third editions of the book the absolute value of g₂ was employed in Equation 5.3 to compute
the value of H. However, further scrutiny of the derivation of the latter equation in D'Agostino et al.
(1990) and D'Agostino and Stephens (1986) indicates that the sign of g₂ must be taken into account. More
specifically, the latter sources indicate that the value of G computed with Equation 5.2 represents the variance
of b₂, and that H represents a standardized value of b₂, which can assume either a positive or negative value. If
the absolute value of g₂ is employed in Equation 5.3, the value of z computed with Equation 5.7 can only be
positive, which the above noted sources demonstrate will not always be the case.
3 Analysis of the data with SPSS yields the following results. Specifically: a) For Distribution H, SPSS computes
the values kurtosis = g₂ = 3.58 and SE = .992. When the latter values are substituted in Equation 5.10, the
value z = 3.58/.992 = 3.61 is obtained (which is substantially greater than the z value obtained with Equation
5.7). Since z = 3.61 is greater than the tabled critical two-tailed value z.01 = 2.58 and the tabled critical one-
tailed value z.01 = 2.33, the null hypothesis can be rejected if the nondirectional alternative hypothesis or the
directional alternative hypothesis stipulating leptokurtosis is employed – in other words, the researcher can
conclude there is a high likelihood the sample is derived from a distribution that is leptokurtic; b) For
Distribution I, SPSS computes the values kurtosis = g₂ = −.939 and SE = .992. When the latter values are
substituted in Equation 5.10, the value z = −.939/.992 = −.95 is obtained. Since the absolute value z = .95
is less than the tabled critical two-tailed value z.05 = 1.96 and the tabled critical one-tailed value z.05 = 1.65, the
null hypothesis cannot be rejected regardless of which alternative hypothesis is employed – in other words,
there is insufficient evidence to indicate that the sample is derived from a distribution that is not mesokurtic.
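As with the skewness endnote above, the SPSS standard error can be reproduced with the conventional formula for the standard error of g₂, 2·SE(g₁)·sqrt((n² − 1)/((n − 3)(n + 5))); the sketch below assumes that formula and a sample size of n = 20, which yields the value .992 cited above.

from math import sqrt

def se_kurtosis(n):
    # Standard error of the sample kurtosis statistic g2 (the formula reported by SPSS)
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return 2 * se_skew * sqrt((n**2 - 1) / ((n - 3) * (n + 5)))

se = se_kurtosis(20)          # assumed n = 20, giving approximately .992
print(round(se, 3))
print(round(3.58 / se, 2))    # z for Distribution H, approximately 3.61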
4 When Table A4 is employed to evaluate a chi-square value computed for the D'Agostino–Pearson test
of normality, the following protocol is employed. The tabled critical values for df = 2 are derived from the
right tail of the distribution. Thus, the tabled critical .05 chi-square value (to be designated χ².05) will be the
tabled chi-square value at the 95th percentile. In the same respect, the tabled critical .01 chi-square value (to be
designated χ².01) will be the tabled chi-square value at the 99th percentile. For further clarification of
interpretation of the critical values in Table A4, the reader should consult Section V of the single-sample chi-
square test for a population variance (Test 3).
1 Some sources note that one assumption of the Wilcoxon signed-ranks test is that the variable being measured is
based on a continuous distribution. In practice, however, this assumption is often not adhered to.
2 The binomial sign test for a single sample is employed with data that are in the form of a dichotomous variable
(i.e., a variable represented by two categories). Each subject's score is assigned to one of the following two
categories: Above the value of the hypothesized population median versus Below the value of the
hypothesized population median. The test allows a researcher to compute the probability of obtaining the
proportion of subjects in each of the two categories, as well as more extreme distributions with respect to the
two categories.
3 The Wilcoxon signed-ranks test can also be employed in place of the single-sample z test where the value of
σ is known, but when the normality assumption of the latter test is saliently violated.
4 It is just coincidental in this example that the absolute value of some of the difference scores corresponds to the
value of the rank assigned to that difference score.
5 The reader should take note of the fact that no critical values are recorded in Table A5 for very small sample
sizes. In the event a sample size is employed for which a critical value is not listed at a given level of
significance, the null hypothesis cannot be evaluated at that level of significance. This is the case since with
small sample sizes the distribution of ranks will not allow one to generate probabilities equal to or less than the
specified alpha value.
6 The term (Σt³ − Σt) in Equation 6.4 can also be written as Σᵢ₌₁ˢ(tᵢ³ − tᵢ). The latter notation indicates the
following: a) For each set of ties, the number of ties in the set is subtracted from the cube of the number of ties
in that set; and b) the sum of all the values computed in a) is obtained. Thus, in the example under discussion
(in which there are s = 3 sets of ties):
7 A correction for continuity can be used in conjunction with the tie correction by subtracting .5 from the absolute
value computed for the numerator of Equation 6.4. Use of the correction for continuity will reduce the tie-
corrected absolute value of z.
8 Sources are not in agreement with respect to the minimum sample size for which the latter equations should be
employed.
1 a) Marascuilo and McSweeney (1977) employ a modified protocol which can result in a larger absolute value for
M in Column F or M′ in Column G than the one obtained in Table 7.2. The latter protocol employs a
separate row in the table for each instance in which the same score occurs more than once in the sample data. If
the latter protocol were employed in Table 7.2, there would be two rows in the table for the score of 90 (which
is the only score that occurs more than once). The first 90 would be recorded in Column A in a row that has a
cumulative proportion in Column E equal to 14/30 = .4667. The second 90 would be recorded in the following
row in Column A with a cumulative proportion in Column E equal to 15/30 = .5000. In the case of Example
7.1, the outcome of the analysis would not be affected if the aforementioned protocol is employed. In some
instances, however, it can result in a different/larger M or M′ value. The protocol employed by Marascuilo
and McSweeney (1977) is employed by sources who argue that when there are ties present in the data (i.e., a
score occurs more than once), the protocol described in this chapter (which is used in most sources) results in
an overly conservative test (i.e., makes it more difficult to reject a false null hypothesis); b) It is not necessary
to compute the values in Column G if a discrete variable is being evaluated. Conover (1980, 1999) and Daniel
(1990) discuss the use of the Kolmogorov–Smirnov goodness-of-fit test for a single sample with discrete
data. Studies cited in the latter sources indicate that when the Kolmogorov–Smirnov test is employed with
discrete data, it yields an overly conservative result (i.e., the power of the test is reduced).
2 A general discussion of confidence intervals can be found in Section VI of the single sample t test (Test 2).
3 The gamma and exponential distributions are continuous probability distributions. The exponential
distribution is discussed in detail in Section IX (the Addendum) of the binomial sign test for a single
sample (Test 9).
4 Table A22 is only appropriate for assessing goodness-of-fit for a normal distribution. Lilliefors (1969, 1973) has
developed tables for other distributions (e.g., the exponential and gamma distributions).
1 Categories are mutually exclusive if assignment to one of the k categories precludes a subject/object from being
assigned to any one of the remaining (k − 1) categories.
2 The reason why the exact probabilities associated with the binomial and multinomial distributions are generally
not computed is because, except when the value of n is very small, an excessive amount of computation is
involved. The binomial distribution is discussed under the binomial sign test for a single sample, and the
multinomial distribution is discussed in Section IX (the Addendum) of the latter test.
3 Example 8.6 in Section VIII illustrates an example in which the expected frequencies are based on prior
empirical information.
4 It is possible for the value of the numerator of a probability ratio to be some value other than 1. For instance, if
one is evaluating the number of odd versus even numbers which appear on n rolls of a die, in each trial there
are k = 2 categories. Three face values (1, 3, 5) will result in an observation being categorized as an odd
number, and three face values (2, 4, 6) will result in an observation being categorized as an even number. Thus,
the probability associated with each of the two categories will be 3/6 = 1/2. It is also possible for each of the
categories to have different probabilities. Thus, if one is evaluating the relative occurrence of the face values 1
and 2 versus the face values 3, 4, 5, and 6, the probability associated with the former category will be 2/6 = 1/3
(since two outcomes fall within the category 1 and 2), while the probability associated with the latter will be
4/6 = 2/3 (since four outcomes fall within the category 3, 4, 5, and 6). Examples 8.6 and 8.7 in Section VIII
illustrate examples where the probabilities for two or more categories are not equal to one another.
5 When decimal values are involved, there may be a minimal difference between the sums of the expected and
observed frequencies due to rounding off error.
6 There are some instances when Equation 8.3 should be modified to compute the degrees of freedom for the chi-
square goodness-of-fit test. The modified degrees of freedom equation is discussed in Section VI, within the
framework of employing the chi-square goodness-of-fit test to assess goodness-of-fit for a normal
distribution.
7 Sometimes when one or more cells in a set of data have an expected frequency of less than five, by combining
cells (as is done in this analysis) a researcher can reconfigure the data so that the expected frequency of all the
resulting cells is greater than five. Although this is one way of dealing with the violation of the assumption
concerning the minimum acceptable value for an expected cell frequency, the null hypothesis evaluated with
the reconfigured data will not be identical to the null hypothesis stipulated in Section III.
8 In a one-dimensional chi-square table, subjects/objects are assigned to categories which reflect their status on a
single variable. In a two-dimensional table, two variables are involved in the categorization of subjects/objects.
As an example, if each of n subjects is assigned to a category based on one's gender and whether one is
married or single, a two-dimensional table can be constructed involving the following four cells: Male-
Married; Female-Married; Male-Not married; Female-Not married. Note that people assigned to a given
cell fall into one of the two categories on each of the two dimensions/variables (which are gender and marital
status). Analysis of two-dimensional tables is discussed under the chi-square test for r × c tables. In
Section VII of the latter test, tables with more than two dimensions (commonly referred to as
multidimensional contingency tables) are also discussed.
9 a) Daniel (1990) notes that Equation 8.6 will only yield reliable results when nπ₁ and n(1 − π₁) are both
greater than 5. It is assumed that the researcher estimates the value of π₁ prior to collecting the data. The
researcher bases the latter value either on probability theory or preexisting empirical information. Generally
speaking, if the value of n is large, the value of p₁ should provide a reasonable approximation of the value of
π₁ for calculating the values nπ₁ and n(1 − π₁); b) Although not commonly employed, the finite
population correction factor stipulated in Endnote 2 of the single-sample z test (Test 1) should be used in
computing the estimated standard error of the population proportion when the size of the sample is believed
to constitute more than 5% of the population. Under the latter circumstances, the standard error √((p₁p₂)/n)
is multiplied by √((N − n)/(N − 1)) (which represents the correction factor, where N represents the total
number of people that comprise the population). The finite population corrected equation will result in a
smaller value for the standard error of the population proportion employed in Equation 8.6.
10 a) Smithson (2003, p. 11) notes that in the case of computing a confidence interval for a proportion which is
not an extreme value, for a specific confidence interval level, the width of the confidence interval will reduce
by about one-half if the sample size is quadrupled; b) Confidence intervals for population proportions
(typically, the 95% confidence interval) are commonly employed in expressing the results of surveys and/or
political polls. As an example (employing the proportions p₁ = 29/120 = .242 and p₂ = 91/120 = .758 obtained
for the example under discussion), a pollster might state that in an exit poll for an election (in which we will
assume the votes of 120 people were obtained) the proportion (percentage) of votes received by Candidate A
was .242 (24.2%) versus .758 (75.8%) for Candidate B, with a margin of error of ± .101 (which would most
likely be stated as a margin of error of ± 10.1% or ± 10.1 percentage points). Since the latter margin of
error is based on the 99% confidence interval, the pollster might also state we can be 99% confident that for a
given candidate the sample proportion ± 10.1% contains the actual proportion of people in the population
who voted for that candidate. It should be noted that a margin of error of ± 10.1% would be considered too
large for a political poll, and because of the latter most polls employ a sample size substantially larger than n =
120. The use of a larger sample size in the example under discussion would result in a smaller value for the
standard error, and consequently reduce the width of the confidence interval; c) One might intuitively expect
that the larger a population, the larger the size of the sample required in a survey in order to achieve a specified
level of confidence and/or precision. However, the latter is not the case as long as a population is relatively
large. To be more specific, in a large population the proportion of the population sampled will not have an
impact on the precision of an estimate or the level of confidence. In point of fact, it is the size of the sample
(i.e., n) rather than the proportion of the population the sample represents (i.e., n/N) which determines the
precision of the estimate (i.e., the margin of error value) and degree of confidence. Equation 8.13 can be
employed to determine the sample size required in order to attain a certain level of confidence and/or precision
in a survey. (The latter equation (also discussed in Folz (1996, p. 50)) is derived from Equation 8.6, where the
width of a confidence interval (represented by w in Equation 8.13) is computed with the term
[√((p₁p₂)/n)](zα/2).)
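The ± .101 margin of error cited above can be verified with the following Python sketch, which applies the 99% two-tailed critical value of z to the estimated standard error of the proportion:

from math import sqrt
from scipy.stats import norm

p1, p2, n = 0.242, 0.758, 120

se = sqrt(p1 * p2 / n)          # estimated standard error of the proportion
z = norm.ppf(0.995)             # two-tailed .01 critical value, approximately 2.58
print(round(z * se, 3))         # approximately 0.101, i.e., a margin of error of about 10.1 points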
11 Glass and Hopkins (1996, pp. 325–327) recommend the use of a method proposed by Ghosh (1979) for
computing the lower and upper limits of a confidence interval – specifically the use of Equations 8.14 and
8.15. Levin (2005, personal communication) notes that unlike Equations 8.8 and 8.9, Equations 8.14 and 8.15
do not employ a correction for continuity. Because of the latter, any discrepancy between limits of a
confidence interval computed with Equations 8.14 and 8.15 versus Equations 8.8 and 8.9 will increase as the
sample size decreases. Note that when Equations 8.14 and 8.15 are employed below for the values p₁ = .242, p₂
= .758, and n = 120, they yield a result almost identical to that obtained with Equations 8.8 and 8.9.
12 Since, when k = 2, it is possible to state two directional alternative hypotheses, some sources refer to an
analysis o f such a nondirectional alternative hypothesis as a two-tailed test. Using the same logic, when k > 2
one can conceptualize a test o f a nondirectional alternative hypothesis as a multi-tailed test (since, when k > 2,
it is possible to have more than two directional alternative hypotheses). I t should be pointed out that since the
chi-square goodness-of-fit test only utilizes the right tail o f the distribution, it is questionable to use the terms
two-tailed or multi-tailed in reference to the analysis o f a nondirectional alternative hypothesis.
13 k! is referred to as k factorial. The notation indicates that the integer number preceding the ! is multiplied by all integer values below it. Thus, k! = (k)(k − 1) … (1). By definition 0! is set equal to 1. A method of computing an approximate value for n! was developed by James Stirling (1730). (The letter n is more commonly employed as the notation to represent the number for which a factorial value is computed, i.e., the notation n!.) Stirling's approximation (described in Feller (1968), Miller and Miller (1999), and Zar (1999)) is n! ≈ √(2πn)(n/e)ⁿ, which can also be written as n! ≈ √(2π)(n^(n+.5))(e^(−n)). As noted in Endnote 15 in the Introduction, the value e in the Stirling equation is the base of the natural system of logarithms; e, which equals 2.71828…, is an irrational number (i.e., a number that has a decimal notation which goes on forever without a repeating pattern of digits).
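As a quick illustration of how closely Stirling's approximation tracks the exact factorial, the following sketch (not from the book) compares the two for a few values of n:

    # Compare n! with Stirling's approximation sqrt(2*pi*n)*(n/e)**n
    import math

    def stirling(n):
        return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

    for n in (5, 10, 20):
        exact = math.factorial(n)
        approx = stirling(n)
        print(n, exact, round(approx, 1), round(approx / exact, 4))  # the ratio approaches 1 as n grows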
14 The subscript .72 in the notation χ².72 represents the .72 level of significance. The value .72 is based on the fact that the extreme 72% of the right tail of the chi-square distribution is employed in evaluating the directional alternative hypothesis. The value χ².72 falls at the 28th percentile of the distribution.
15 A general discussion of differences between the latter two tests can be found in Section VII of the Kolmogorov–Smirnov goodness-of-fit test for a single sample.
16 When categories are ordered, there is a direct (or inverse) relationship between the magnitude of the score of a subject on the variable being measured and the ordinal position of the category to which that score has been assigned. An example of ordered categories which can be employed with the chi-square goodness-of-fit test is the following set of four categories that can be used to indicate the magnitude of a person's IQ: Cell 1 (1st quartile); Cell 2 (2nd quartile); Cell 3 (3rd quartile); and Cell 4 (4th quartile). The aforementioned categories can be employed if one wants to determine whether or not, within a sample, an equal number of subjects is observed in each of the four quartiles. Note that in Examples 8.1 and 8.2, the fact that an observation is assigned to Cell 6 is not indicative of a higher level of performance than or superior quality to an observation assigned to Cell 1 (or vice versa). However, in the IQ example there is a direct relationship between the number used to identify each cell and the magnitude of the IQ scores of subjects who have been assigned to that cell.
17 Note that the sum of the proportions must equal 1.
18 Even though a nondirectional alternative hypothesis will be assumed, this example illustrates a case in which some researchers might view it as more prudent to employ a directional alternative hypothesis.
1 a) The binomial distribution is based on a process developed by the Swiss mathematician James Bernoulli (1654–1705). Each of the trials in an experiment involving a binomially distributed variable is often referred to as a Bernoulli trial. The conditions for Bernoulli trials are met when, in a set of repeated independent trials, on each trial there are only two possible outcomes, and the probability for each of the outcomes remains unchanged on every trial; b) The binomial model assumes sampling with replacement. To understand the latter term, imagine an urn which contains a large number of red balls and white balls. In each of n trials one ball is randomly selected from the urn. In the sampling with replacement model, after a ball is selected it is put back in the urn, thus insuring that the probability of drawing a red or white ball will remain the same on every trial. On the other hand, in the sampling without replacement model, the ball that is selected is not put back in the urn after each trial. Because of the latter, in the sampling without replacement model the probability of selecting a red ball versus a white ball will change from trial to trial, and on any trial the values of the probabilities will be a function of the number of balls of each color which remain in the urn. The binomial model assumes sampling with replacement, since on each trial the likelihood an observation will fall in Category 1 will always equal π₁, and the likelihood an observation will fall in Category 2 will always equal π₂. The classic situation for which the binomial model is employed is the process of flipping a fair coin. In the coin flipping situation, on each trial the likelihood of obtaining Heads is π₁ = .5, and the likelihood of obtaining Tails is π₂ = .5. The process of flipping a coin can be viewed within the framework of sampling with replacement, since it can be conceptualized as selecting from a large urn that is filled with the same number of Heads and Tails on every trial. In other words, it's as if after each trial the alternative which was selected on that trial is thrown back into the urn, so that the likelihood of obtaining Heads or Tails will remain unchanged from trial to trial. In Section IX (the Addendum) the hypergeometric distribution (another discrete probability distribution), which is based upon the sampling without replacement model, is described; c) The binomial distribution is actually a special case of the multinomial distribution. In the latter distribution, each of n independent observations can be classified in one of k mutually exclusive categories, where k can be any integer value equal to or greater than two. The multinomial distribution is described in detail in Section IX (the Addendum).
2 The reader should take note of the fact that many sources employ the notations p and q to represent the population proportions π₁ and π₂. Because of this, the equations for the mean and standard deviation of a binomially distributed variable can be written as follows: μ = np and σ = √(npq). The use of the symbols π₁ and π₂ in this chapter for the population proportions is predicated on the fact that throughout this book Greek letters are employed to represent population parameters.


3 Using the format employed for stating the null hypothesis and the nondirectional alternative hypothesis for the chi-square goodness-of-fit test, H₀ and H₁ can also be stated as follows for the binomial sign test for a single sample: H₀: oᵢ = εᵢ for both cells; H₁: oᵢ ≠ εᵢ for both cells. Thus, the null hypothesis states that in the underlying population the sample represents, for both cells/categories the observed frequency of a cell is equal to the expected frequency of the cell. The alternative hypothesis states that in the underlying population the sample represents, for both cells/categories the observed frequency of a cell is not equal to the expected frequency of the cell.
4 The question can also be stated as follows: If n = 10 and π₁ = π₂ = .5, what is the probability of two or less observations in one of the two categories? When π₁ = π₂ = .5, the probability of two or less observations in Category 2 will equal the probability of eight or more observations in Category 1 (or vice versa). When, however, π₁ ≠ π₂, the probability of two or less observations in Category 2 will not equal the probability of eight or more observations in Category 1.


5 The number of combinations of n things taken x at a time represents the number of different ways that n objects can be arranged (or selected) x at a time without regard to order. For instance, if one wants to determine the number of ways that 3 objects (which will be designated A, B, and C) can be arranged (or selected) 2 at a time without regard to order, the following 3 outcomes are possible: 1) An A and a B (which can result from either the sequence AB or BA); 2) An A and a C (which can result from either the sequence AC or CA); or 3) A B and a C (which can result from either the sequence BC or CB). Thus, there are 3 combinations of ABC taken 2 at a time. This is confirmed below through use of Equation 9.4.
6 The application of Equation 9.3 to every possible value of x (i.e., in the case of Examples 9.1 and 9.2, the integer values 0 through 10) will yield a probability for every value of x. The sum of these probabilities will always equal 1. The algebraic expression which summarizes the summation of the probability values for all possible values of x is referred to as the binomial expansion (summarized by Equation 9.5), which is equivalent to the general equation (π₁ + π₂)ⁿ (or (p + q)ⁿ when p and q are employed in lieu of π₁ and π₂). Thus: (π₁ + π₂)ⁿ = Σ(x = 0 to n) [n!/(x!(n − x)!)] π₁ˣ π₂ⁿ⁻ˣ = 1.
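The following sketch (not from the book) confirms numerically that the probabilities generated for every value of x sum to 1, and also verifies the combinations example in Endnote 5:

    # Binomial probabilities for x = 0 .. n sum to 1, as the binomial expansion states.
    from math import comb

    n, pi1 = 10, .5
    pi2 = 1 - pi1
    probs = [comb(n, x) * pi1 ** x * pi2 ** (n - x) for x in range(n + 1)]
    print(comb(3, 2))              # 3 combinations of A, B, C taken 2 at a time
    print(round(sum(probs), 10))   # 1.0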

7 If π₁ = .5 and one wants to determine the likelihood of x being equal to or less than a specific value, one can employ the cumulative probability listed for the value (n − x). Thus, if x = 2, the cumulative probability for x = 8 (which is .0547) is employed, since n − x = 10 − 2 = 8. The value .0547 indicates the likelihood of obtaining 2 or less observations in a cell. This procedure can only be used when π₁ = .5, since, when the latter is true, the binomial distribution is symmetrical.
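A short sketch (not from the book) of the symmetry property just described, for n = 10 and π₁ = .5:

    # When pi = .5 the binomial distribution is symmetrical, so P(X <= 2) = P(X >= 8).
    from math import comb

    def pmf(x, n, p):
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    n, p = 10, .5
    p_le_2 = sum(pmf(x, n, p) for x in range(0, 3))
    p_ge_8 = sum(pmf(x, n, p) for x in range(8, 11))
    print(round(p_le_2, 4), round(p_ge_8, 4))   # both equal .0547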


8 If in using Tables A6 and A7 the value of π₂ is employed to represent π in place of π₁, and the number of observations in Category 2 is employed to represent the value of x instead of the number of observations in Category 1, then the following are true: a) In Table A6 (if all values of π within the range from 0 to 1 are listed) the probability associated with the cell that is the intersection of the values π = π₂ and x (where x represents the number of observations in Category 2) will be equivalent to the probability associated with the cell that is the intersection of π = π₁ and x (where x represents the number of observations in Category 1); and b) In Table A7 (if all values of π within the range from 0 to 1 are listed) for π = π₂, the probability of obtaining x or fewer observations (where x represents the number of observations in Category 2) will be equivalent to, for π = π₁, the probability of obtaining x or more observations (where x represents the number of observations in Category 1). Thus, if π₁ = .7, π₂ = .3, and n = 10 and there are 9 observations in Category 1 and 1 observation in Category 2, the following are true: a) The probability in Table A6 for π = π₂ = .3 and x = 1 will be equivalent to the probability for π = π₁ = .7 and x = 9; and b) In Table A7 if π = π₂ = .3, the probability of obtaining 1 or fewer observations will be equivalent to the probability of obtaining 9 or more observations if π = π₁ = .7.

9 It will also answer at the same time whether p₂ = 2/10 = .2, the observed proportion of cases for Category 2, deviates significantly from π₂ = .5.
10 Since, like the normal distribution, the binomial distribution is a two-tailed distribution, the same basic protocol is employed in interpreting nondirectional (i.e., two-tailed) and directional (i.e., one-tailed) probabilities. Thus, in interpreting binomial probabilities one can conceptualize a distribution which is similar in shape to the normal distribution, and substitute the appropriate binomial probabilities in the distribution. (In the interest of accuracy, one should take note of the fact that the greater the discrepancy between the values of π₁ and π₂, the more dissimilar in shape the binomial distribution becomes from the normal distribution.)
11 In Example 9.4 many researchers might prefer to employ the directional alternative hypothesis H₁: π₁ < .5, since the senator will only change her vote if the observed proportion in the sample is less than .5. In the same respect, in Example 9.5 one might employ the directional alternative hypothesis H₁: π₁ > .5, since most people would only interpret above-chance performance as indicative of extrasensory perception.


12 Equation 9.7 can also be expressed in the form z = (X − μ)/σ. Note that the latter equation is identical to Equation I.38, the equation for computing a standard deviation score for a normally distributed variable. The difference between Equations 9.7 and I.38 is that Equation 9.7 computes a normal approximation for a binomially distributed variable, whereas Equation I.38 computes an exact value for a normally distributed variable.
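A minimal sketch of the normal approximation just described (not the book's code; it assumes Equation 9.7 has the standard form z = (X − nπ₁)/√(nπ₁π₂), which is consistent with z = (X − μ)/σ as noted above):

    # Normal approximation for a binomially distributed variable.
    import math

    def z_approx(X, n, pi1):
        mu = n * pi1                             # mean of the binomial variable
        sigma = math.sqrt(n * pi1 * (1 - pi1))   # its standard deviation
        return (X - mu) / sigma

    print(round(z_approx(8, 10, .5), 2))   # e.g., 8 observations in Category 1 out of n = 10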
13 The reader may be interested in knowing that in extrasensory perception (ESP) research, evidence of ESP is not necessarily limited to above-chance performance. A person who consistently scores significantly below chance or only does so under certain conditions (such as being tested by an extremely skeptical and/or hostile experimenter) may also be used to support the existence of ESP. Thus, in Example 9.6, the subject who obtains a score of 80 (which is significantly below the expected value μ = 100) represents someone whose poor performance (referred to as psi missing) might be used to suggest the presence of extrasensory processes.
14 When the binomial sign test for a single sample is employed to evaluate a hypothesis regarding a population median, it is categorized by some sources as a test of ordinal data (rather than as a test of categorical/nominal data), since when data are categorized with respect to the median it implies ordering of the data within two categories (i.e., above the median versus below the median).
15 a) In the discussion of the Wilcoxon signed-ranks test, it is noted that the latter test is not recommended if there is reason to believe the underlying population distribution is asymmetrical. Thus, if there is reason to believe that blood cholesterol levels are not distributed symmetrically in the population, the binomial sign test for a single sample would be recommended in lieu of the Wilcoxon signed-ranks test; b) Marascuilo and McSweeney (1977) note that the asymptotic relative efficiency (discussed in Section VII of the Wilcoxon signed-ranks test) of the binomial sign test for a single sample is generally lower than that of the Wilcoxon signed-ranks test. If the underlying population distribution is normal, the asymptotic relative efficiency of the binomial sign test is .637, in contrast to an asymptotic relative efficiency of .955 for the Wilcoxon signed-ranks test (with both asymptotic relative efficiencies being in reference to the single-sample t test). When the underlying population distribution is not normal, in most cases the asymptotic relative efficiency of the Wilcoxon signed-ranks test will be higher than the analogous value for the binomial sign test.
16 The reader should take note of the fact that the protocol for using Table A7 to interpret a π₂ value that is less than .5 in reference to the value π₂ = .1 is different from the one described in the last paragraph of Section IV. The reason for this is that in Example 9.9 we are interested in (for π₂ = .1) the probability that the number of observations in Category 2 (females) is equal to or greater than 3 (which equals the probability that the number of observations in Category 1 (males) is equal to or less than 7 for π₁ = .9). The protocol presented in the last paragraph of Section IV in reference to the value π₂ = .3 describes the use of Table A7 to determine the probability that the number of observations in Category 2 is equal to or less than 1 (which equals the probability that the number of observations in Category 1 is equal to or greater than 9 for π₁ = .7). Note that in Example 9.9 a more extreme score is defined as one which is larger than the lower of the two observed frequencies or smaller than the larger of the two observed frequencies. On the other hand, in the example in the last paragraph of Section IV a more extreme score is defined as one that is smaller than the lower of the two observed frequencies or larger than the higher of the two observed frequencies. The criterion for defining what constitutes an extreme score is directly related to the alternative hypothesis the researcher employs. If the alternative hypothesis is nondirectional, an extreme score can fall either above or below an observed frequency, whereas if a directional alternative hypothesis is employed, a more extreme score can only be in the direction indicated by the alternative hypothesis.
17 Endnote 15 in the Introduction states that the value e is the base of the natural system of logarithms; e, which equals 2.71828…, is an irrational number (i.e., a number that has a decimal notation that goes on forever without a repeating pattern of digits).
18 If instead of employing sick and guilty as the designated events, we specified healthy and innocent as the designated events, the values P(A₂) = .99 (for Example 9.29) and P(A₂) = .98 (for Example 9.30) represent the baserates of the events in question.


19 The proportion .0288 (i.e., the sum of the proportions in the first column) is commonly referred to as the selection ratio, since it represents the proportion of people in the population who are identified as positive. The term selection ratio is commonly employed in situations when individuals whose test result is positive are selected for something (e.g., a treatment, a job, etc.). The proportion .9712 (i.e., the sum of the proportions in the second column) is (1 − selection ratio), since it represents the proportion of people who are not selected.
20 Another type of test which is commonly employed in assessing honesty (typically, within the framework of screening people for employment) is the integrity test (also referred to as an honesty test). Like the polygraph, the latter type of test (which is a pencil and paper questionnaire) is subject to criticism for the large number of false positives it yields. The author has seen false positive rates for integrity tests that are close to .50 (i.e., 50%). It should be noted, however, that as a general rule the only consequence of being a false positive on an integrity test is that a person is not hired for a job. The seriousness of the latter error would be viewed as minimal when contrasted with a false positive on a polygraph, if a positive polygraph result is construed as evidence of one being guilty of a serious crime (although, in reality, the use of polygraph evidence in the courts is severely restricted).
21 Alcock (1981), Hansel (1989), and Hines (2002) are representative of sources which would argue that the baserate for ESP is zero. On the other hand, Broughton (1991) and Irwin (1999) are representative of sources which would argue for a baserate above zero.
22 In order to provide an even more dramatic example of the low baserate problem, I have taken a number of liberties in conceptualizing the analysis of Example 9.31. To begin with, a null and alternative hypothesis would not be stated for a Bayesian analysis, and, consequently, the terms Type I and Type II error rates are not employed within the framework of such an analysis. As will be noted later in this section, in a Bayesian analysis, based on preexisting information (i.e., prior probabilities), posterior probabilities are computed for two or more specific hypotheses, neither of which is designated as the null hypothesis. It should also be emphasized that within the framework of Example 9.31, I will assume that any subject who performs at a statistically significant level performs at exactly the .05 level. The reason for this is that if the probability associated with a subject's level of performance is less than .05, the posterior probabilities computed for that subject will not correspond to those computed for the example. Although my conceptualization of Example 9.31 is a bit unrealistic, it can nevertheless be employed to illustrate the pitfalls involved in the evaluation of a low baserate event.
23 The values employed for Example 9.32 are hypothetical probabilities which are not actually based on empirical
data in the current medical literature.
24 The reader should take note of the fact that, in actuality, there is no such disease as leprosodiasis, and the probability values employed in Example 9.33 are fictional.
25 A comprehensive discussion of the subject of controlling for order effects (i.e., controlling for the order of presentation of the experimental conditions) can be found in Section VII of the t test for two dependent samples (Test 17).
26 Other sources, such as Cowles (1989), describe alternative ways of conceptualizing probability.
27 It should be noted that the Bayesian hypothesis testing model was rejected by Ronald Fisher, Jerzy Neyman, and Egon Pearson, the three men upon whose ideas the classical hypothesis testing model is based.
28 An excellent discussion of odds can be found in Christensen (1990, 1997). Berry (1996, pp. 116–119) and Pitman (1993, pp. 6–9) provide good discussions on the use of odds in betting.
29 Since the mean, mode, and standard deviation of a beta distribution are a function of the parameters, the latter values can be computed with Equations 9.43, 9.44, and 9.45.
1 An alternate definition of randomness employed by some sources is that in a random series, each of k possible alternatives is equally likely to occur on any trial, and that the outcome on each trial is independent of the outcome on any other trial. The problem with the latter definition is that it cannot be applied to a series in which on each trial there are two or more alternatives which do not have an equal likelihood of occurring (the stipulation regarding independence does, however, also apply to a series involving alternatives that do not have an equal likelihood of occurring on each trial). In point of fact, it is possible to apply the concept of randomness to a series in which π₁ ≠ π₂. To illustrate the latter, consider the following example. Assume we have a series consisting of N trials involving a binomially distributed variable for which there are two possible outcomes, A and B. The theoretical probabilities in the underlying population for each of the outcomes are πA = .75 and πB = .25. If a series involving the two alternatives is in fact random, on each trial the respective likelihoods of alternative A versus alternative B occurring will not be πA = πB = .5, but instead will be πA = .75 and πB = .25. If such a series is random it is expected that alternative A will occur approximately 75% of the time and alternative B will occur approximately 25% of the time. However, it is important to note that one cannot conclude that the above series is random purely on the basis of the relative frequencies of the two alternatives. To illustrate this, consider the following series consisting of 28 trials, which is characterized by the presence of an invariant pattern: AAAB AAAB AAAB AAAB AAAB AAAB AAAB. If one is attempting to predict the outcome on the 29th trial, and if, in fact, the periodicity of the pattern that is depicted is invariant, the likelihood that alternative A will occur on the next trial is not .75, but is, in fact, 1. This is the case, since the occurrence of events in the series can be summarized by the simple algorithm that the series is comprised of 4-trial cycles, and within each cycle alternative A occurs on the first 3 trials and alternative B on the fourth trial. The point to be made here is that it is entirely possible to have a random series, even if each of the alternatives is not equally likely to occur on every trial. However, if the occurrence of the alternatives is consistent with their theoretical frequencies, the latter in and of itself does not insure that the series is random.
2 It should be pointed out that, in actuality, each of the three series depicted in Figure 10.1 has an equal likelihood of occurring. However, in most instances where a consistent pattern is present which persists over a large number of trials, such a pattern is more likely to be attributed to a nonrandom factor than it is to chance.
3 The computation of the values in Table A8 is based on the following logic. If a series consists of N trials and alternative 1 occurs n₁ times and alternative 2 occurs n₂ times, the number of possible combinations involving alternative 1 occurring n₁ times and alternative 2 occurring n₂ times will be C(N, n₁) = N!/(n₁!n₂!). Thus, if a coin is tossed N = 4 times, since C(4, 2) = 4!/(2!2!) = 6, there will be 6 possible ways of obtaining n₁ = 2 Heads and n₂ = 2 Tails. Specifically, the 6 ways of obtaining 2 Heads and 2 Tails are: HHTT, TTHH, THHT, HTTH, THTH, HTHT. Each of the 6 aforementioned sequences constitutes a series, and the likelihood of each of the series occurring is equal. The two series HHTT and TTHH are comprised of 2 runs, the two series THHT and HTTH are comprised of 3 runs, and the two series THTH and HTHT are comprised of 4 runs. Thus, the likelihood of observing 2 runs will equal 2/6 = .33, the likelihood of observing 3 runs will equal 2/6 = .33, and the likelihood of observing 4 runs will equal 2/6 = .33. The likelihood of observing 3 or more runs will equal .67, and the likelihood of observing 2 or more runs will equal 1. A thorough discussion of the derivation of the sampling distribution for the single-sample runs test, which is attributed in some sources to Wald and Wolfowitz (1940), can be found in Hogg and Tanis (1988).
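The logic above can be checked by brute force; the following sketch (not from the book) enumerates the six orderings of 2 Heads and 2 Tails and tallies the number of runs in each:

    # Enumerate all distinct orderings of HHTT and count the runs in each one.
    from itertools import permutations

    def runs(seq):
        return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

    series = set(permutations("HHTT"))       # the 6 distinct series
    tally = {}
    for s in series:
        tally[runs(s)] = tally.get(runs(s), 0) + 1
    print(len(series))                                                       # 6
    print({r: round(c / len(series), 2) for r, c in sorted(tally.items())})  # {2: .33, 3: .33, 4: .33}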
4 Some of the cells in Table A8 only list a lower limit. For the sample sizes in question, there is no maximum number of runs (upper limit) which will allow the null hypothesis to be rejected.
5 A general discussion of the correction for continuity can be found in Section VI of the Wilcoxon signed-ranks test (Test 6). The reader should take note of the fact that the correction for continuity described in this section is intended to provide a more conservative test of the null hypothesis (i.e., make it more difficult to reject). However, when the absolute value of the numerator of Equation 10.1 is equal to or very close to zero, the z value computed with Equation 10.2 will be further removed from zero than the z value computed with Equation 10.1. Since the continuity-corrected z value will be extremely close to zero, this result is of no practical consequence (i.e., the null hypothesis will still be retained). Zar (1999, p. 493), however, notes that in actuality the correction for continuity should not be applied if it increases rather than decreases the absolute value of the test statistic. This observation regarding the correction for continuity can be generalized to the continuity correction described in the book for other nonparametric tests.
6 The term Monte Carlo derives from the fact that Ulam had an uncle with a predilection for gambling who often
frequented the casinos at Monte Carlo.
7 The application of the single-sample runs test to a design involving two independent samples is described in Siegel (1956) under the Wald–Wolfowitz (1940) runs test.
8 A number is prime if it has no divisors except for itself and the value 1. In other words, if a prime number is divided by any number except itself or 1, it will yield a remainder. Examples of prime numbers are 3, 7, 11, 13, 17, etc.
9 The author is indebted to Ted Sheskin for providing some of the reference material employed in this section.
10 Although Equation 10.2 (the continuity-corrected equation for the single-sample runs test) yields a slightly
smaller absolute z value for this example, it leads to identical conclusions with respect to the null hypothesis.
11 Gruenberger and Jaffray (1965) note that an even more stringent variant of the poker test employs digits 2 through 6 as the second hand, digits 3 through 7 as the third hand, digits 4 through 8 as the fourth hand, and so on. The analysis is carried on until the end of the series (which will be the point at which a five-digit hand is no longer possible). The total of (n − 4) possible hands can be evaluated with the chi-square goodness-of-fit test. The use of the latter test, however, is problematical, since, as described above, the hands are not actually independent of one another (because they contain overlapping data), and consequently the assumption of independence for the chi-square goodness-of-fit test is violated.
12 a) Phillips et al. (1976) and Schmidt and Taylor (1970) describe the computation of the probabilities that are listed for the poker test for a five-digit hand; b) Although it is generally employed with groups of five digits, the poker test can be applied to groups which consist of more or less than five digits. The poker test probabilities (for k = 10 digits) for a four-digit hand (Schmidt and Taylor (1970)) and a three-digit hand (Banks and Carson (1984)), along with a sample hand, are as follows: Four-digit hand: All four digits different (1234; p = .504); One pair (1123; p = .432); Two pair (1122; p = .027); Three of a kind (1112; p = .036); Four of a kind (1111; p = .001). Three-digit hand: All three digits different (123; p = .72); One pair (112; p = .27); Three of a kind (111; p = .01).
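The listed probabilities can be verified by exhaustively classifying every possible hand, as in the following sketch (not from the book):

    # Classify every 3-digit and 4-digit hand of decimal digits by its pattern of repeats.
    from itertools import product
    from collections import Counter

    def pattern(hand):
        return tuple(sorted(Counter(hand).values(), reverse=True))  # e.g., (2, 1, 1) = one pair

    for size in (3, 4):
        tally = Counter(pattern(h) for h in product(range(10), repeat=size))
        total = 10 ** size
        print(size, {k: round(v / total, 3) for k, v in sorted(tally.items())})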
13 Larsen and Marx (1985, p. 246) also discuss the methodology for computing the expected number of digits required for a complete set. Stirzaker (1999, p. 268) notes that the equation Σ(r = 1 to k) k/(k − r + 1) is an alternative way of expressing the harmonic series k[(1/k) + 1/(k − 1) + 1/(k − 2) + … + 1/2 + 1].
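Evaluating that sum for k = 10 digits is straightforward; the sketch below (not from the book) computes the expected number of random digits required before every one of the 10 digits has appeared at least once:

    # Expected number of digits needed for a complete set of k = 10 distinct digits.
    k = 10
    expected = sum(k / (k - r + 1) for r in range(1, k + 1))
    print(round(expected, 2))   # about 29.29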
1. Alternative terms which are commonly used to describe the different samples employed in an experiment are groups, experimental conditions, and experimental treatments.
2. It should be noted that there is a design in which different subjects serve in each of the k experimental conditions that is categorized as a dependent samples design. In a dependent samples design each subject either serves in all of the k experimental conditions, or else is matched with a subject in each of the other (k − 1) experimental conditions. When subjects are matched with one another they are equated on one or more variables which are believed to be correlated with scores on the dependent variable. The concept of matching and a general discussion of the dependent samples design can be found under the t test for two dependent samples.
3. An alternative but equivalent way of writing the null hypothesis is H₀: μ₁ − μ₂ = 0. The analogous alternative but equivalent ways of writing the alternative hypotheses in the order they are presented are: H₁: μ₁ − μ₂ ≠ 0, H₁: μ₁ − μ₂ > 0, H₁: μ₁ − μ₂ < 0.
4. In order to be solvable, an equation for computing the t statistic requires that there is variability in the scores of at least one of the two groups. If all subjects in Group 1 have the same score and all subjects in Group 2 have the same score, the values computed for the estimated population variances will equal zero (i.e., ŝ₁² = ŝ₂² = 0). If the latter is true the denominator of any of the equations to be presented for computing the value of t will equal zero, thus rendering a solution impossible.
5. When n₁ = n₂, ŝ²pooled = (ŝ₁² + ŝ₂²)/2.
6. The actual value that is estimated by sX̄₁₋X̄₂ is σX̄₁₋X̄₂, which is the standard deviation of the sampling distribution of the difference scores for the two populations. The meaning of the standard error of the difference can be best understood by considering the following procedure for generating an empirical sampling distribution of difference scores: a) Obtain a random sample of n₁ scores from Population 1 and a random sample of n₂ scores from Population 2; b) Compute the mean of each sample; c) Obtain a difference score by subtracting the mean of Sample 2 from the mean of Sample 1, i.e., X̄₁ − X̄₂ = D; and d) Repeat steps a) through c) m times. At the conclusion of this procedure one will have obtained m difference scores. The standard error of the difference represents the standard deviation of the m difference scores, and can be computed by using Equation I.11/2.1. Thus: sX̄₁₋X̄₂ = √{[ΣD² − ((ΣD)²/m)]/[m − 1]}. The standard deviation that is computed with the aforementioned equation is an estimate of σX̄₁₋X̄₂.
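The procedure in steps a) through d) is easy to mimic by simulation; the sketch below (not from the book, and the population parameters are hypothetical) generates m difference scores and compares their standard deviation with the theoretical standard error:

    # Empirical sampling distribution of difference scores between two sample means.
    import math, random, statistics

    random.seed(1)
    n1 = n2 = 30
    m = 5000
    diffs = []
    for _ in range(m):
        x1 = [random.gauss(100, 15) for _ in range(n1)]   # sample from hypothetical Population 1
        x2 = [random.gauss(95, 15) for _ in range(n2)]    # sample from hypothetical Population 2
        diffs.append(statistics.mean(x1) - statistics.mean(x2))

    print(round(statistics.stdev(diffs), 3))               # empirical standard error of the difference
    print(round(math.sqrt(15**2 / n1 + 15**2 / n2), 3))    # theoretical value, about 3.873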
7. Equation 11.4 can also be written in the form df = (n₁ − 1) + (n₂ − 1), which reflects the number of degrees of freedom for each of the groups.


8. The absolute value of t is employed to represent t in the summary statement.
9. a) The Fmax test is one of a number of statistical procedures that are named after the English statistician Ronald Fisher. Among Fisher's contributions to the field of statistics was the development of a sampling distribution referred to as the F distribution (which bears the first letter of his surname). The values in the Fmax distribution are derived from the F distribution; b) Alternative tests (Levene's test for homogeneity of variance (Test 21g) and the Brown–Forsythe test for homogeneity of variance (Test 21h)) for evaluating homogeneity of variance (which like the Fmax test can be employed for contrasting k ≥ 2 experimental conditions), as well as a more in-depth discussion of the pros and cons of the available tests of homogeneity of variance, can be found in Section VI of the single-factor between-subjects analysis of variance. Additionally, Grissom and Kim (2005, pp. 12–21) provide a detailed discussion of the limitations of the more commonly employed methodologies for assessing homogeneity of variance.
10. A tabled F.975 value is the value below which 97.5% of the F distribution falls and above which 2.5% of the distribution falls. A tabled F.995 value is the value below which 99.5% of the F distribution falls and above which .5% of the distribution falls.
11. In Table A9 the value Fmax.01 = 23.2 is the result of rounding off F.995 = 23.15.

12. A tabled F.025 value is the value below which 2.5% of the F distribution falls and above which 97.5% of the distribution falls. A tabled F.005 value is the value below which .5% of the F distribution falls and above which 99.5% of the distribution falls.
13. a) Most sources only list values in the upper tail of the F distribution. The values F.05 = .157 and F.01 = .063 are obtained from Guenther (1965). It so happens that when dfnum = dfden, the value of F.05 can be obtained by dividing 1 by the value of F.95. Thus: 1/6.39 = .157. In the same respect, the value of F.01 can be obtained by dividing 1 by the value of F.99. Thus: 1/15.98 = .063; b) Endnote 52 of the single-factor between-subjects analysis of variance describes the procedure for computing a confidence interval for the ratio of two variances, which allows for an alternative way of conducting the F test for two population variances.
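The reciprocal relationship noted in a) is easy to confirm; the sketch below (not from the book) uses dfnum = dfden = 4, which is consistent with the tabled values 6.39 and 15.98 cited above:

    # Lower-tail F values as reciprocals of upper-tail values when dfnum = dfden.
    from scipy.stats import f

    df = 4
    f95 = f.ppf(.95, df, df)
    f99 = f.ppf(.99, df, df)
    print(round(f95, 2), round(1 / f95, 3))   # about 6.39 and .157
    print(round(f99, 2), round(1 / f99, 3))   # about 15.98 and .063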
14. When n₁ = n₂ = n and t₁ = t₂ = t, the t′ value computed with Equation 11.9 will equal the tabled critical t value for df = n − 1. When n₁ ≠ n₂, the computed value of t′ will fall in between the values of t₁ and t₂. It should be noted that the effect of violation of the homogeneity of variance assumption on the t test statistic decreases as the number of subjects employed in each of the samples increases. This can be demonstrated in relation to Equation 11.9, in that if there are a large number of subjects in each group the value which is employed for both t₁ and t₂ in Equation 11.9 is t.05 = 1.96. The latter tabled critical two-tailed .05 value, which is also the tabled critical two-tailed .05 value for the normal distribution, is the value that is computed for t′. Thus, in the case of large sample sizes the tabled critical value for df = n₁ + n₂ − 2 will be equivalent to the value computed for df₁ = n₁ − 1 and df₂ = n₂ − 1.
15. a) A d value can be interpreted as a z score. If the latter is the case then the value d = 1.24 indicates the obtained difference between the two groups is 1.24 standard deviation units. Using the normal distribution (which is assumed to be applicable for the analysis), a d value of 1.24 indicates that the average person in the drug group was better off at the end of the study than 89.25% of the people who were in the placebo group. The value 89.25% is the result of adding 39.25% (which is the percent of cases in the normal distribution that fall between the mean of the distribution and a z value of 1.24) to the 50% of the cases that fall below the mean of the normal distribution. The latter is illustrated in Figure 11.11. Table 11.3 can be employed in evaluating a d value obtained in any study comparing a treatment group versus a control group in which the treatment group was found to be superior to the control group, with an obtained effect size equal to the value of d noted in the first column. The value in the second column of the table represents the percentage of subjects in the control group whose performance is below the average score of a subject in the treatment group. The aforementioned way of interpreting a d value can be applied to interpreting an effect size value for a single study (such as Example 4.1) as well as a set of d values obtained for multiple studies within the context of a meta-analysis (which is discussed in detail in Test 43). In a meta-analysis the results of multiple studies evaluating the same hypothesis can be combined such that a summary d value for all of the studies can be computed. As an example of the latter, Smith, Glass, and Miller (1980) published a meta-analytic summary of 475 controlled studies of people who received psychotherapy when contrasted with control subjects who did not receive psychotherapy. A meta-analysis of the data found that psychotherapy had an average effect size of d = .85, which indicated that the average person receiving therapy was better off after therapy than 80.23% of the people who did not receive therapy.
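The percentages quoted above follow directly from the cumulative normal distribution, as the following sketch (not from the book) shows:

    # Percentage of the comparison group falling below the average treated subject.
    from scipy.stats import norm

    for d in (1.24, .85):
        print(d, round(100 * norm.cdf(d), 2))   # 89.25 and 80.23 percent, respectively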
16 A comprehensive discussion on the appropriate use of the harmonic mean versus the arithmetic mean as a measure of central tendency can be found in Chou (1989, pp. 110–113). In the discussion Chou (1989, p. 112) notes that the use of the harmonic mean is best suited for situations in which the scores in a distribution are expressed inversely to what is desired to be reflected by the average score computed for the distribution.
17 van Belle (2002, pp. 31–33) notes that Equation 11.27 can be employed to estimate the sample size required in order to conduct a t test for two independent samples, if a nondirectional analysis is conducted with α = .05. The value of k in the numerator of the latter equation will depend on the desired power of the test, and for the power values .50, .80, .90, .95, and .975 the values to employ for k are, respectively, 8, 16, 21, 26, and 31. Equation 11.27 will be demonstrated in reference to computing the necessary sample size for the power of the test to equal .80, if as previously the researcher wants to employ the latter power for evaluating the alternative hypothesis H₁: |μ₁ − μ₂| ≥ 5. If the power of the test is to equal .80, the value k = 16 is employed in the numerator of Equation 11.27. Equation 11.10 is employed to compute the value d = 1.24 (which was computed earlier by dividing the 5 point difference stipulated in the alternative hypothesis by the value ŝpooled = 4.02, which represents the best estimate of σ). When the appropriate values are substituted in Equation 11.27, the value nⱼ = 10.41 is computed (where nⱼ represents the number of subjects in the jth group). The latter result indicates that in order for the power of the test to equal .80, each group should have approximately 11 subjects.
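A minimal sketch of the van Belle rule of thumb described above (not the book's code; it assumes Equation 11.27 has the form n per group ≈ k/d², which reproduces the value 10.41 quoted for power = .80):

    # Approximate n per group for several power levels, using d = 1.24 from Equation 11.10.
    d = 1.24                     # the 5-point difference divided by the pooled estimate 4.02, rounded
    for power, k in [(.50, 8), (.80, 16), (.90, 21), (.95, 26), (.975, 31)]:
        print(power, round(k / d ** 2, 2))   # the .80 row gives about 10.41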
18 The treatment effect described in this section is not the same thing as Cohen's d index (the effect size computed with Equation 11.10). However, if a hypothesized effect size is present in a set of data, the computed value of d can be used as a measure of treatment effect (in point of fact, the sample analogue, the g index, is usually employed to represent effect size). In such an instance, the value of d (or the g index) will be positively correlated with the value of the treatment effect described in this section. Cohen (1988, pp. 24–27) describes how the d index can be converted into one of the correlational treatment effect measures which are discussed in this section. Endnote 20 discusses the relationship between the d index and the omega squared statistic presented in this section in more detail.
19 The reader familiar with the concept of correlation can think of a measure of treatment effect as a correlational measure which provides information analogous to that provided by the coefficient of determination (designated by the notation r²), which is the square of the Pearson product-moment correlation coefficient. The coefficient of determination (which is discussed in more detail in Section V of the Pearson product-moment correlation coefficient) measures the degree of variability on one variable which can be accounted for by variability on a second variable. This latter definition is consistent with the definition that is provided in this section for a treatment effect.
20 a) In actuality, Cohen (1977, 1988) employs the notation for eta squared (which is discussed briefly in the next paragraph and in greater detail in Section VI of the single-factor between-subjects analysis of variance) in reference to the aforementioned effect size values. Endnote 58 in the single-factor between-subjects analysis of variance clarifies Cohen's (1977, 1988) use of eta squared and omega squared to represent the same measure; b) Cohen (1977, 1988, pp. 23–27) states that the small, medium, and large effect size values of .0099, .0588, and .1379 are equivalent to the values .2, .5, and .8 for his d index (which was discussed previously in the section on statistical power). In point of fact, the values .2, .5, and .8 represent the minimum values for a small, medium, and large effect size for Cohen's d index. The conversion of an omega squared/eta squared value into the corresponding Cohen's d index value is described in Endnote 5 of Test 43 on meta-analysis; c) Grissom and Kim (2005, p. 5) note it is not uncommon for measures of effect size to overestimate the actual size of an effect in an underlying population. The latter is referred to as positive or upward bias. The latter authors provide information regarding recommended adjustments for some of the measures of effect size discussed in this book.
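The equivalence between the d values .2, .5, and .8 and the proportion-of-variance values .0099, .0588, and .1379 can be reproduced with the conversion η² = d²/(d² + 4), a commonly cited relationship that holds for two groups of equal size; it is offered here only as a sketch (the book's own conversion procedure is the one referenced in Endnote 5 of Test 43):

    # Convert Cohen's d into the corresponding proportion-of-variance measure.
    for d in (.2, .5, .8):
        eta_sq = d ** 2 / (d ** 2 + 4)   # assumes two equal-sized groups
        print(d, round(eta_sq, 4))       # .0099, .0588, .1379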
21 This result can also be written as: −.89 < (μ₁ − μ₂) < 10.89.
22 In instances where, in stating the null hypothesis, a researcher stipulates that the difference between the two
population means is some value other than zero, the numerator of Equation 11.18 is the same as the numerator
of Equation 11.5. The protocol for computing the value of the numerator is identical to that employed for
Equation 11.5.
23 The general issues discussed in this section are relevant to any case in which a parametric and nonparametric test can be employed to evaluate the same set of data.
24 Barnett and Lewis (1994) note that the presence of an outlier may not always be obvious as a result of visual inspection of data. Typically, the more complex the structure of data the more difficult it becomes to visually detect outliers. Multivariate analysis (which is described in the last section of the book) often involves data for which visual detection of outliers is difficult.
25 In this latter instance, if, as a result of the presence of one or more outliers, the difference between the group means is also inflated, the use of a more conservative test will, in part, compensate for the heterogeneity of variance. The impact of outliers on the t test for two independent samples is discussed by Zimmerman and Zumbo (1993). The latter authors note that the presence of outliers in a sample may decrease the power of the t test to such a degree that the Mann–Whitney U test (which is the rank-order nonparametric analog of the t test for two independent samples) will be a more powerful test for comparing two independent samples.
26 Barnett and Lewis (1994, p. 84) note that the use of the median absolute deviation as a measure of dispersion/variability can be traced back to the 19th century to the great German mathematician Johann Karl Friedrich Gauss. Barnett and Lewis (1994, p. 156) state that although the median absolute deviation is a less efficient measure of dispersion than the standard deviation, it is a more robust estimator (especially for nonnormally distributed data).
27 Samples in which data have been deleted or modified are sometimes referred to as censored samples (Barnett and Lewis (1994, p. 78)). The term censoring, however, is most commonly employed in reference to data that are not available for some subjects, since it is either not desirable or not possible to follow each subject until the conclusion of a study. This latter type of censored data is most commonly encountered in medical research when subjects no longer make themselves available for study, or a researcher is unable to locate subjects beyond a certain period of time. Good (1994, p. 117) notes that another example of censoring occurs when, within the framework of evaluating a variable, the measurement breaks down at some point on the measurement continuum (usually at an extreme point). Consequently, one must employ approximate scores instead of exact scores to represent the observations which cannot be measured with precision. Two obvious options that can be employed to negate the potential impact of censored data are: a) Use of the median in lieu of the mean as a measure of central tendency; and b) Employing an inferential statistical test which uses rank-orders instead of interval/ratio scores. A detailed discussion on the general subject of censored data can be found in Section IX (the Addendum) of the Mann–Whitney U test. Within the context of the latter discussion a number of methods for evaluating censored data are described, including the Kaplan–Meier estimate (Test 12d), Gehan's test for censored data (Test 12e), and the log-rank test (Test 12f). Censored data are also briefly discussed later in this section within the context of the discussion of clinical trials.
28 In this example, as well as other examples in this section, use of the Fmax test may not yield a significant result (i.e., it may not result in the conclusion that the population variances are heterogeneous). The intent of the examples, however, is only to illustrate the variance stabilizing properties of the transformation methods.
29 If the relationship 1 radian = 57.3 degrees is applied for a specific proportion, the number of degrees computed with the equation Y = arcsin√X will not correspond to the number of radians computed with the equation Y = 2arcsin√X. Nevertheless, if the transformed data derived from the two equations are evaluated with the same inferential statistical test, the same result is obtained. In point of fact, if the equation Y = arcsin√X is employed to derive the value of Y in radians, and the resulting value is multiplied by 57.3, it will yield the same number of degrees obtained when that equation is used to derive the value of Y in degrees. Since the multiplication of arcsin√X by 2 in the equation Y = 2arcsin√X does not alter the value of the ratio for the difference between means versus pooled variability (or other relevant parameters being estimated within the framework of a statistical test), it yields the same test statistic regardless of which equation is employed. The author is indebted to Jerrold Zar for clarifying the relationship between the arcsine equations.
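The invariance described above can be demonstrated directly; the sketch below (not from the book, with made-up proportions) shows that the degrees and radians versions of the transformation, which differ only by a constant multiplier, yield identical t statistics:

    # The two arcsine transformations give the same t statistic.
    import math
    from scipy.stats import ttest_ind

    group1 = [.10, .20, .25, .15]   # hypothetical proportions
    group2 = [.40, .55, .45, .60]

    def deg(p):
        return math.degrees(math.asin(math.sqrt(p)))   # Y = arcsin(sqrt(X)) in degrees

    def rad2(p):
        return 2 * math.asin(math.sqrt(p))             # Y = 2*arcsin(sqrt(X)) in radians

    t_deg = ttest_ind([deg(p) for p in group1], [deg(p) for p in group2])
    t_rad = ttest_ind([rad2(p) for p in group1], [rad2(p) for p in group2])
    print(round(t_deg.statistic, 4), round(t_rad.statistic, 4))   # identical values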
30 In the previous edition of this book an alternative test of equivalence developed by Tryon (2001) was described that yielded different results from those obtained with the Westlake–Schuirmann test. However, Tryon (personal communication) subsequently identified an error in his methodology and determined that, in fact, his procedure was equivalent to the Westlake–Schuirmann procedure.
31 Rodgers et al. (1993, pp. 554–555) note that within the framework of the classical hypothesis testing model, a conventional two-tailed analysis (i.e., where the null hypothesis states equivalence of conditions) can be conceptualized as involving two one-tailed tests. In such an analysis, the null hypothesis is rejected if either one (but not both) of the two one-tailed analyses is significant. Because in such an analysis a Type I error occurs if by chance either of the two one-tailed tests is significant, the alpha level for each of the one-tailed tests must be added to compute the overall likelihood of committing a Type I error. As an example, if a two-tailed/nondirectional analysis is conducted with alpha = .05, it is commensurate with conducting two one-tailed tests employing alpha = .025 for each test. On the other hand, in the case of a test of equivalence, both one-tailed analyses stipulated in the null hypothesis must be significant in order to reject the null hypothesis. In the case of a test of equivalence, the likelihood that the analysis evaluating the larger of the two absolute values will yield a significant result, given the fact that the analysis evaluating the smaller of the two absolute values did yield a significant result, is in fact equal to 1. Because of the latter, the overall probability of committing a Type I error in such an analysis is the prestipulated alpha level employed for the analysis of the smaller of the two absolute values multiplied by 1, which, in fact, is equal to that prestipulated alpha level.

32 a) Among others, Seaman and Serlin (1998) note that if an analysis evaluating the null hypothesis H₀: μ₁ = μ₂ is not significant, the latter result should perhaps motivate a researcher to go a step further and conduct a test of equivalence, since as noted earlier retention of the null hypothesis H₀: μ₁ = μ₂ is not commensurate with demonstrating equivalence; b) Theoretically one would expect that when a statistically significant difference exists between two treatments the confidence intervals computed for the means of each of the treatments will not overlap. If the intent of a researcher is to conduct a two-tailed analysis with alpha set at .05, the two-tailed .05 critical value would be employed in computing the 95% confidence interval for each of the means. Thus in the case of Example 11.4, in order to evaluate the null hypothesis H₀: μ₁ = μ₂ at the .05 level, in lieu of conducting a t test for two independent samples, a 95% confidence interval can be computed for each of the two groups by employing Equation 2.8/2.9. The computation of the confidence intervals is presented below. Note that the values sX̄₁ = √(1.3/5) = .51, sX̄₂ = √(.7/5) = .37, and t.05 = 2.776 (the tabled critical two-tailed .05 value for df = n − 1 = 5 − 1 = 4) are employed in computing the two 95% confidence intervals. Since the two confidence intervals overlap (i.e., the upper limit of the confidence interval of Group 2 (9.23) is larger than the lower limit of the confidence interval of Group 1 (8.18)), the researcher cannot conclude there is a significant difference between the two treatments. This result is consistent with the result obtained when the t test for two independent samples was conducted. Recollect that when the latter test was employed to evaluate Example 11.4, the nondirectional alternative hypothesis H₁: μ₁ ≠ μ₂ was not supported when alpha = .05 was employed.
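A sketch of the two confidence intervals referred to above (not the book's code; the group means of 9.6 and 8.2 are not restated in this endnote and are inferred here from the quoted limits, the standard errors, and the t value, so they should be checked against Example 11.4 itself):

    # 95% confidence interval for each group mean, using the values cited above.
    import math

    t_05 = 2.776                                        # two-tailed .05 critical t for df = 4
    groups = {"Group 1": (9.6, math.sqrt(1.3 / 5)),     # (assumed mean, standard error of about .51)
              "Group 2": (8.2, math.sqrt(0.7 / 5))}     # (assumed mean, standard error of about .37)
    for name, (mean, se) in groups.items():
        print(name, round(mean - t_05 * se, 2), round(mean + t_05 * se, 2))
    # Group 1: about 8.18 to 11.02; Group 2: about 7.16 to 9.24; the intervals overlap,
    # consistent with the limits 8.18 and 9.23 quoted above (allowing for rounding).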
33 Note that in a symmetrical analysis, the absolute value employed in Equation 11.22 for each of the two t tests conducted within the framework of the test of equivalence will always be the same. If, on the other hand, a nonsymmetrical analysis (which will not be demonstrated here) is conducted, the absolute value employed in each of the two t tests will not be the same, since by definition the two stipulated values differ from one another in absolute value.
34 Tryon (2001) notes that Dunnett and Gent (1977) employ the tabled critical two-tailed .05 value in computing a confidence interval within the framework of a test of equivalence. If the latter value t.05 = 2.306 (for df = 8) is employed for Example 11.6, the width of the confidence interval increases, thus resulting in a less conservative test. Specifically, the confidence interval becomes 1.4 ± (2.306)(.63) = 1.4 ± 1.45, which results in the confidence interval −.05 < (μ₁ − μ₂) < 2.85. Thus any absolute value designated for the equivalence criterion which is greater than 2.85 will yield a significant result for both of the two t tests that are employed in testing for equivalence. The test is less conservative since the researcher can specify a larger value for the criterion in order to establish equivalence.
35 a) Strictly speaking, since the values designated as the equivalence limits should lie outside the confidence interval, the absolute value employed in Equation 11.22 should be slightly larger than 2.57 (e.g., 2.5700001); b) Technically, if the statement is made that the mean of Group 1 (Brand name drug) is within 2.57 units of the mean of Group 2 (Generic drug), the alternative hypotheses would be written as follows: H₁: μ₁ − μ₂ ≥ −2.57 and H₁: μ₁ − μ₂ ≤ 2.57. Since the equals sign should not be included in the null hypotheses, the latter should be written as H₀: μ₁ − μ₂ < −2.57 or H₀: μ₁ − μ₂ > 2.57.
36 a) Cribbie et al. (2004) conducted a simulation study contrasting the ability of the t test for two independent samples when used within the framework of the classical hypothesis testing model (i.e., a conventional test of significance as employed with Example 11.4) versus the Westlake–Schuirmann test of equivalence of two independent treatments (employed to evaluate Example 11.6) in evaluating whether or not two treatments were equivalent to one another. In other words, when the t test for two independent samples was used within the framework of the classical hypothesis testing model, a nonsignificant result was interpreted as evidence of equivalence. Their analysis suggested that the Westlake–Schuirmann test of equivalence of two independent treatments is more effective than the conventional t test at detecting equivalence of population means when large sample sizes are employed in a study. However, the conventional t test was more effective in detecting equivalence of population means when small sample sizes were employed and/or when group variances were large; b) The reader should take note of the fact that in conducting a test of equivalence a researcher may elect to employ a smaller (or, for that matter, larger) alpha value than those used in the examples in this section.
37 If, in fact, there is no difference between the means of the two populations (i.e., μ₁ − μ₂ = 0), Equation 11.23 becomes Equation 11.28. Under the latter circumstances, in order for the power of the test to be .80 only n = 4 subjects are required per group.
1. a) The test to be described in this chapter is also referred to as the Wilcoxon rank-sum test and the Mann–Whitney–Wilcoxon test; b) Grissom and Kim (2005, p. 105) note that Kruskal (1957) provides an overview of the development of the ideas underlying the test prior to its formal introduction, and that alternative versions of the test are discussed in Bergmann, Ludbrook and Spooren (2000); c) The population median will be represented by θ, which is the lower case Greek letter theta.
2. The reader should take note of the following with respect to the table of critical values for the Mann–Whitney U distribution: a) No critical values are recorded in the Mann–Whitney table for very small sample sizes, since a level of significance of .05 or less cannot be achieved for sample sizes below a specific minimum value; b) The critical values published in Mann–Whitney tables by various sources may not be identical. Such differences are trivial (usually one unit), and are the result of rounding off protocol; and c) The distribution of the Mann–Whitney U statistic is two-tailed, and Table A11 only lists critical values from the lower tail of the distribution. The result of the Mann–Whitney U test can also be evaluated through use of critical values in the upper tail of the distribution. In such a case one designates the larger of the two values U₁ versus U₂ as the obtained U statistic. In the latter instance, the tabled critical value to employ in evaluating the latter U statistic can be computed from Table A11 by subtracting the critical value recorded in the relevant cell for the two sample sizes in the table from the product n₁n₂. Thus if U₁ = 21 is designated as the value of U, since n₁n₂ = 25, the tabled critical two-tailed .05 and .01 values are U.05 = 25 − 2 = 23 and U.01 = 25 − 0 = 25, and the tabled critical one-tailed .05 and .01 values are U.05 = 25 − 4 = 21 and U.01 = 25 − 1 = 24. In order to be significant, the obtained value of U (i.e., when the larger of the two values U₁ versus U₂ represents U) must be equal to or greater than the tabled critical value at the prespecified level of significance. Note that since the obtained value U = 21 is equal to the tabled critical one-tailed .05 value U.05 = 21 (but less than the other three above noted critical values), as was the case when U₂ = 4 was employed as the U statistic, the null hypothesis can only be rejected if the directional alternative hypothesis H₁: θ₁ < θ₂ is employed; d) The table for the alternative version of the Mann–Whitney U test (which was developed by Wilcoxon (1949)) contains critical values that are based on the sampling distribution of the sums of ranks, which differ from the tabled critical values contained in Table A11 (which represents the sampling distribution of U values).
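As an illustration of the arithmetic described in the preceding endnote, the short Python sketch below computes U₁ and U₂ from the rank sums of two small samples (using the standard identity U₁ = n₁n₂ + n₁(n₁ + 1)/2 − ΣR₁) and converts a lower-tail tabled critical value into its upper-tail equivalent via n₁n₂ − U. The scores and the tabled critical value used here are hypothetical placeholders, not values taken from Example 12.1.

    from scipy.stats import rankdata

    group1 = [9, 12, 14, 15, 18]          # hypothetical scores for Group 1
    group2 = [7, 10, 11, 16, 19]          # hypothetical scores for Group 2
    n1, n2 = len(group1), len(group2)

    ranks = rankdata(group1 + group2)             # ranks of the pooled data
    R1 = ranks[:n1].sum()                         # sum of ranks for Group 1
    U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1         # Mann-Whitney U for Group 1
    U2 = n1 * n2 - U1                             # the two U values always sum to n1*n2

    U_crit_lower = 2                              # hypothetical lower-tail tabled value
    U_crit_upper = n1 * n2 - U_crit_lower         # corresponding upper-tail critical value
    print(U1, U2, U_crit_upper)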
3. Although for Example 12.1 we can also say that since ΣR₁ < ΣR₂ the data are consistent with the directional alternative hypothesis H₁: θ₁ < θ₂, the latter will not necessarily always be the case when n₁ ≠ n₂. Since the relationship between the average of the ranks will always be applicable to both equal and unequal sample sizes, it will be employed in describing the hypothesized relationship between the ranks of the two groups.
4. Some sources employ an alternative normal approximation equation which yields the same result as Equation 12.4. The alternative equation is noted below.
5. A general discussion of the correction for continuity can be found in Section VI of the Wilcoxon signed-ranks test.
6. Some sources employ the term below for the denominator of Equation 12.6. It yields the identical result.
7. A correction for continuity can be used by subtracting .5 from the absolute value computed for the numerator of Equation 12.6. The continuity correction yields the absolute value z = 8/4.71 = 1.70 (which in actuality is a negative number since U = 4 is less than U_E = 8).
8. The rationale for discussing computer-intensive procedures in the Addendum of the Mann–Whitney U test is that the Mann–Whitney test (as well as many other rank-order procedures) can be conceptualized as an example of a randomization or permutation test, which is the first of the computer-based procedures to be described in the Addendum.
9. Another application of a computer-intensive procedure is Monte Carlo research, which is discussed in Section VII of the single-sample runs test.
10. The term bootstrap is derived from the saying that one lifts oneself up by one's bootstraps. Within the framework of the statistical procedure, bootstrapping indicates a single sample is used as a basis for generating multiple additional samples — in other words, one makes the most out of what little resources one has. Manly (1997) notes that the use of the term jackknife is based on the idea that a jackknife is a multipurpose tool which can be used for many tasks, in spite of the fact that for any single task it is seldom the best tool.
11. The reader should take note of the fact that although Example 12.3 involves two independent samples, by applying the basic methodology to be described in this section, randomization tests can be employed to evaluate virtually any type of experimental design.
12. a) Suppose in the above example we have an unequal number of subjects in each group. Specifically, let us assume two subjects are randomly assigned to Group 1 and four subjects to Group 2. The total number of possible arrangements will be the combination of six things taken two at a time, which is equivalent to the combination of six things taken four at a time. This results in 15 different arrangements, since 6!/(2!4!) = 15; b) To illustrate how large the total number of arrangements can become, suppose we have a total of 40 subjects and randomly assign 15 subjects to Group 1 and 25 subjects to Group 2. The total number of possible arrangements is the combination of 40 things taken 15 at a time, which is equivalent to the combination of 40 things taken 25 at a time. The total number of possible arrangements will be 40!/(15!25!) = 40,225,345,056. Obviously, without the aid of a computer it will be impossible to evaluate such a large number of arrangements; c) In actuality, Equations 1.48 and 9.4 are the equations for computing the number of combinations of n things taken x at a time. Although the test discussed in this section is referred to as a permutation test, it is actually based on the computation of combinations rather than permutations. As noted in the Introduction, in computing combinations one is not interested in the ordering of the elements within an arrangement, whereas in computing permutations one is interested in the ordering.
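The counts cited in the preceding endnote can be reproduced directly; the brief Python sketch below (an added illustration, not part of the original text) uses the standard library to evaluate the two combinations discussed above.

    import math

    # Number of distinct assignments of 6 subjects, 2 to Group 1 and 4 to Group 2
    print(math.comb(6, 2), math.comb(6, 4))        # both print 15

    # Number of distinct assignments of 40 subjects, 15 to Group 1 and 25 to Group 2
    print(math.comb(40, 15), math.comb(40, 25))    # both print 40225345056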
13. The reader should take note of the following: a) If the 20 corresponding arrangements for Group 2 are listed in Table 12.5, the same 20 arrangements which are listed for Group 1 will appear in the table, but in different rows. To illustrate, the first arrangement in Table 12.5 for Group 1 is comprised of the scores 7, 10, 11. The corresponding Group 2 arrangement is 15, 18, 21 (which are the three remaining scores in the sample). In the last row of Table 12.5 the scores 15, 18, 21 are listed. The corresponding arrangement for Group 2 will be the remaining scores, which are 7, 10, 11. If we continue this process for the remaining 18 Group 1 arrangements, the final distribution of arrangements for Group 2 will be comprised of the same 20 arrangements obtained for Group 1; b) When n₁ ≠ n₂, the distribution of the arrangements of the scores in the two groups will not be identical, since all the arrangements in Group 1 will always have n₁ scores and all the arrangements in Group 2 will always have n₂ scores, and n₁ ≠ n₂. Nevertheless, computation of the appropriate sampling distribution for the data only requires that a distribution be computed which is based on the arrangements for one of the two groups. Employing the distribution for the other group will yield the identical result for the analysis.
14. a) Although the result obtained with the Mann–Whitney U test is equivalent to the result which will be obtained with a randomization/permutation test conducted on the rank-orders, only the version of the test which was developed by Wilcoxon (1949) directly evaluates the permutations of the ranks. Marascuilo and McSweeney (1977, pp. 270–272) and Sprent (1998, pp. 85–86) note that the version of the test described by Mann and Whitney (1947) actually employs a statistical model which evaluates the number of inversions in the data. An inversion is defined as follows: Assume that we begin with the assumption that all the scores in one group (designated Group 1) are higher than all the scores in the other group (designated Group 2). If we compare all the scores in Group 1 with all the scores in Group 2, an inversion is any instance in which a score in Group 1 is not higher than a score in Group 2. It turns out an inversion-based model yields a result which is equivalent to that obtained when the permutations of the ranks are evaluated (as is done in Wilcoxon's (1949) version of the Mann–Whitney U test). Employing the data from Example 12.1, consider Table 12.21 (which is identical to Table 12.3) where the scores in Group 1 are arranged ordinally in the top row, and the scores in Group 2 are arranged ordinally in the left column.
15. Although in this example the identical probability is obtained for the highest sum of scores and highest sum of ranks, this will not always be the case.
16. Efron and Tibshirani (1993, p. 394) note that the bootstrap differs from more conventional simulation procedures (discussed briefly in Section VII of the single-sample runs test), in that in conventional simulation, data are generated through use of a theoretical model (such as sampling from a theoretical population such as a normal distribution for which the mean and standard deviation have been specified). In the bootstrap the simulation is data-based. Specifically, multiple samples are drawn from a sample of data which is derived in an experiment. One problem with the bootstrap is that since it involves drawing random subsamples from a set of data, two or more researchers conducting a bootstrap may not reach the same conclusions due to differences in the random subsamples generated.
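To make the data-based character of the bootstrap concrete, the short Python sketch below (an added illustration with arbitrary made-up scores, not data from an example in this book) resamples an observed sample with replacement and uses the resulting bootstrap distribution of the sample median to estimate a standard error.

    import random, statistics

    random.seed(1)                           # fixed seed so the run is reproducible
    sample = [12, 7, 15, 9, 21, 11, 18, 14]  # hypothetical observed scores
    n, B = len(sample), 2000                 # B bootstrap resamples, each of size n

    boot_medians = []
    for _ in range(B):
        resample = [random.choice(sample) for _ in range(n)]   # draw n scores with replacement
        boot_medians.append(statistics.median(resample))

    # The spread of the bootstrap distribution estimates the standard error of the sample median
    print(round(statistics.stdev(boot_medians), 2))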
17. a) Sprent (1998) notes that when the bootstrap is employed correctly in situations where the normality assumption is not violated, it generally yields conclusions which will be consistent with those derived from the use of conventional parametric and nonparametric tests, as well as permutation tests; b) Wilcox (2003) describes a procedure developed by Yuen (1974) which can be employed for comparing trimmed means.
18. The exponential distribution (which is discussed in Section IX (the Addendum) of the binomial sign test for a single sample) is a continuous probability distribution that is often useful in investigating reliability theory and stochastic processes. Manly (1997) recommends that prior to employing the bootstrap for inferential purposes, it is essential to evaluate its performance with small samples derived from various theoretical probability distributions (such as the exponential distribution). Monte Carlo studies (i.e., computer simulations involving the derivation of samples from theoretical distributions for which the values of the relevant parameters have been specified) can be employed to evaluate the reliability of the bootstrap.
19. a) Huber (1981) notes that a statistic is defined as robust if it is efficient — specifically, if the variance of the statistic is not dramatically increased in situations where the assumptions for the underlying population are violated. As an example, whereas the variance of the sample mean is sensitive to deviations from the normality assumption, the variance of the sample median is relatively unaffected by departures from normality.
20. a) In contrast to a survival function, Rosner (1995, p. 609) notes that a hazard function (which documents hazard rates) indicates the "instantaneous probability of having an event at time t given that one has survived up to time t" — i.e., a hazard function is a mathematical relationship describing changes in the risk that an event will occur over time. Selvin (1995, p. 437) states that a hazard function "measures the rate of change in a survival function at time t relative to the probability of surviving beyond time t." A large hazard indicates events occur at a fast rate, while a small hazard indicates a slower rate of occurrence. Wright (2000, p. 380) notes that unlike a survival function, a hazard function can increase, decrease, or fluctuate over time. As an example, the latter author notes the hazard function for human mortality is U-shaped, since the probability of a person dying is highest among newborns and the elderly and lower during the intervening years. On the other hand, the survival function describing human mortality ranges from 1 down to 0, with the probability of survival decreasing with the progression of time. Cox regression (Cox (1972) and Cox and Oakes (1984)) (which is described in many books on biostatistics) is a procedure commonly employed to evaluate hazard functions. Among others, Wright (2000, p. 398) notes that the latter procedure is commonly classified as a semiparametric (or partially parametric) procedure. Semiparametric procedures or models are characterized by having multiple components, some of which are parametric and others nonparametric, and consequently such procedures or models do not clearly fall within the realm of being parametric or nonparametric (Sprent and Smeeton (2000)). Norušis (2004) is an excellent reference on Cox regression, as well as on the use of SPSS for conducting a survival analysis.
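The relationship between the survival and hazard functions described in the preceding endnote can be illustrated numerically. The Python sketch below (an added illustration; the exponential survival curve and rate are chosen purely for convenience) computes a discrete approximation to the hazard as the conditional probability of an event in each small interval, given survival to the start of the interval; for the exponential case the hazard it recovers is approximately constant.

    import math

    lam, dt = 0.5, 0.01                        # assumed exponential rate and time step
    t = [i * dt for i in range(1, 501)]        # time grid
    S = [math.exp(-lam * x) for x in t]        # survival function S(t) = exp(-lam * t)

    # Discrete hazard: P(event in [t, t + dt) | survived to t) divided by dt
    hazard = [(S[i] - S[i + 1]) / (S[i] * dt) for i in range(len(S) - 1)]
    print(round(hazard[0], 3), round(hazard[-1], 3))   # both close to lam = 0.5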
21. As noted in the Introduction, it is also true that P(AB) = P(B)P(A/B).
22. a) Research by Keller-McNulty and Higgins (1987) (summarized in Higgins (2004, pp. 63–64)) indicates that although the default number of simulations implemented by StatXact (which is a statistical software package commonly utilized in conducting permutation tests) is 10,000, in actuality, in order to obtain a reliable result it is not necessary to conduct more than 1600 simulations; b) Most contemporary high speed computers could compute the complete sampling distribution (i.e., all 184,756 possible combinations) in a relatively short period of time.
23. If the value U₁ = 6.5 is substituted in Equation 12.4 (the equation for the normal approximation of the Mann–Whitney U statistic), it yields the absolute value z = .43. The continuity-corrected Equation 12.5 yields the absolute value z = .29. Neither of the latter values achieves statistical significance at the .05 level.
24. The log-rank test (also spelled in many sources as logrank test) can also be employed to contrast two survival functions in which no data are censored.
25. Clarification with respect to determining a one-tailed chi-square critical value can be found in Section V of the single-sample chi-square test for a population variance.
26. Although most sources do not employ a correction for continuity for the log-rank test, the latter can be applied to Equations 12.15 and 12.16, resulting in Equations 12.21 and 12.22. As noted below, use of the latter equations will result in a slightly lower absolute value for z and χ²_M.

27. It should be noted that Gehan's test for censored data cannot be employed to evaluate Example 12.10/Table 12.19, since the available information does not allow one to determine the number of patients who died versus had their scores censored during the four time periods (which one must know in order to conduct the latter test).
28. Equation 16.45, which employs a correction for continuity, will yield a result that is equivalent to that obtained with Equations 12.21 and 12.22.
1. Marascuilo and McSweeney (1977) employ a modified protocol which can result in a larger absolute value for M in Column E than the one obtained in Table 13.1. The latter protocol employs a separate row for the score of each subject when the same score occurs more than once within a group. If the latter protocol is employed in Table 13.1, the first two rows of the table will have the score of 0 in Column A for the two subjects in Group 1 who obtain that score. The first 0 will be in the first row, and have a cumulative proportion in Column B of 1/5 = .20. The second 0 will be in the second row, and have a cumulative proportion in Column B of 2/5 = .40. In the same respect the first of the two scores of 11 (obtained by two subjects in Group 2) will be in a separate row in Column C, and have a cumulative proportion in Column D of 4/5 = .80. The second score of 11 will be in the last row of the table, and have a cumulative proportion in Column D of 5/5 = 1. In the case of Example 13.1, the outcome of the analysis will not be affected if the aforementioned protocol is employed. In some instances, however, it can result in a larger M value. The protocol employed by Marascuilo and McSweeney (1977) is used by sources who argue that when there are ties present in the data (i.e., the same score occurs more than once within a group), the protocol described in this chapter (which is used in most sources) results in an overly conservative test (i.e., makes it more difficult to reject a false null hypothesis).
2. When the values of n₁ and n₂ are small, some of the .05 and .01 critical values listed in Table A23 are identical to one another.
3. The last row in Table A23 can also be employed to compute a critical M value for large sample sizes.
1. a) As is the case with the Mann–Whitney U test, if the reverse ranking protocol is employed, the values of U₁ and U₂ are reversed (i.e., U₁ becomes U₂ and U₂ becomes U₁). Since, by virtue of the latter, the value of U, which represents the test statistic, is the lower of the two values U₁ versus U₂, the value designated U will be the same U value obtained with the original ranking protocol; b) If, on the other hand, the reverse ranking protocol is employed and the values of U₁ and U₂ are not reversed, the researcher must designate the larger of the two values U₁ versus U₂ as the value of U. The protocol for interpreting the U value under the latter circumstances is described in Endnote 2 of the Mann–Whitney U test.

2. As is the case with the Mann–Whitney U test, in describing the Siegel–Tukey test for equal variability some sources do not compute a U value, but rather provide tables which are based on the smaller and/or larger of the two sums of ranks. The equation for the normal approximation (to be discussed in Section VI) in these sources is also based on the sums of the ranks.
3. As previously noted, we can instead add three points to each score in Group 1.
4. If one employs Equation 11.7, and thus uses Table A10, the same tabled critical values are listed for F.975 and F.995 for df_num = 6 − 1 = 5 and df_den = 6 − 1 = 5. Thus, F.975 = 7.15 and F.995 = 14.94. (The latter value is only rounded off to one decimal place in Table A9.) The use of Table A10 in evaluating homogeneity of variance is discussed in Section VI of the t test for two independent samples.
5. The Levene test for homogeneity of variance (Test 21g) and the Brown–Forsythe test for homogeneity of variance (Test 21h) (both of which are described in Section VI of the single-factor between-subjects analysis of variance) represent two parametric tests which provide for an even more powerful test of the alternative hypothesis specifying heterogeneity of variance than Hartley's F_max test for homogeneity of variance/F test for two population variances.


1 One could argue that the use of random subsamples for the Moses test for equal variability allows one to conceptualize the test within the framework of the general category of resampling procedures, which are discussed in Section IX (the Addendum) of the Mann–Whitney U test.
2 A typical table of random numbers is a computer-generated series of random digits which fall within the range 0 through 9. If there are six scores in a group, the sequential appearance of the digits 1 through 6 in the table can be used to form subsamples. For example, let us assume that the following string of digits appears in a random number table: 2352239455675900912937373949404. For Group 1, we will form three subsamples with two scores per subsample. Since the first digit which appears in the random number table is 2, the second score listed for the group will be the first score assigned to Subsample 1. Since 3 is the next digit, the third score listed becomes the second score in Subsample 1. Since 5 is the next digit, the fifth score listed becomes the first score in Subsample 2. We ignore the next four digits (2, 2, 3 and 9) since: a) We have already selected the second and third scores from the group; and b) The digit 9 indicates that we should select the ninth score. The latter score, however, does not exist, since there are only six scores in the group. Since the next digit is 4, the fourth score in the group becomes the second score in Subsample 2. By default, the two scores which remain in the group (the first and sixth scores) will comprise Subsample 3. Continuing with the string of random numbers, the procedure is then repeated for the n₂ scores in Group 2, forming three more subsamples with each subsample being comprised of two scores.
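The digit-by-digit selection rule just described translates directly into code. The Python sketch below is an added illustration: the six scores are arbitrary placeholders, while the digit string is the one quoted in the endnote above. It walks through the random digits and partitions a six-score group into three subsamples of two scores each.

    digits = "2352239455675900912937373949404"   # digit string quoted in the endnote above
    scores = [14, 9, 17, 6, 11, 8]               # hypothetical six scores for Group 1

    order = []                                   # positions (1-6) in the order they are selected
    for d in digits:
        pos = int(d)
        if 1 <= pos <= len(scores) and pos not in order:
            order.append(pos)
        if len(order) == len(scores) - 2:        # the last two positions are assigned by default
            break
    order += [p for p in range(1, len(scores) + 1) if p not in order]

    # Consecutive pairs of selected positions form Subsamples 1, 2, and 3
    subsamples = [[scores[order[i] - 1], scores[order[i + 1] - 1]] for i in range(0, 6, 2)]
    print(subsamples)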


3 The Levene test for homogeneity of variance (Test 21g) and the Brown–Forsythe test for homogeneity of variance (Test 21h) (both of which are described in Section VI of the single-factor between-subjects analysis of variance) represent two parametric tests that provide for an even more powerful test of the alternative hypothesis specifying heterogeneity of variance than Hartley's F_max test for homogeneity of variance/F test for two population variances.
4 Equation 12.6 (the tie-corrected Mann–Whitney normal approximation equation) can be employed if ties are present in the data (i.e., there are one or more identical values for squared difference score sums).
5 For purposes of illustration we will assume that the medians of the populations the two groups represent are equal (which is an assumption of the Siegel–Tukey test for equal variability). In actuality, the sample medians computed for Groups 1 and 2 are, respectively, 7 and 8.
6 In other words, when n₁ ≠ n₂, the smaller sample size is employed when using Table A9 in order to minimize the likelihood of committing a Type I error.
1 A general discussion of the chi-square distribution can be found in Sections I and V of the single-sample chi-
square test for a population variance.
2 The use of the chi-square approximation (which employs a continuous probability distribution to approximate a discrete probability distribution) is based on the fact that the computation of exact probabilities requires an excessive amount of calculations.
3 In the case of both the chi-square test of independence and the chi-square test of homogeneity, the same result will be obtained regardless of which of the variables is designated as the row variable versus the column variable.
4 It is just coincidental that the number of introverts equals the number of extroverts.
5 In the context of the discussion of the chi-square test of independence, the proportion of observations in Cell₁₁ refers to the number of observations in Cell₁₁ divided by the total number of observations in the 2 × 2 table. In the discussion of the hypothesis for the chi-square test for homogeneity, the proportion of observations in Cell₁₁ refers to the number of observations in Cell₁₁ divided by the total number of observations in Row 1 (i.e., the row in which Cell₁₁ appears).

6 Equation 16.2 is an extension of Equation 8.2 (which is employed to compute the value of chi-square for the chi-square goodness-of-fit test) to a two-dimensional table. In Equation 16.2, the use of the two summation expressions Σᵢ₌₁ʳ Σⱼ₌₁ᶜ indicates that the operations summarized in Table 16.4 are applied to all of the cells in the r × c table. In contrast, the single summation expression Σᵢ₌₁ᵏ in Equation 8.2 indicates that the operations summarized in Table 8.2 are applied to all k cells in a one-dimensional table.
7 The same chi-square value will be obtained if the row and column variables are reversed — i.e., the helping variable represents the row variable and the noise variable represents the column variable.
8 Correlational studies are discussed in the Introduction and in the chapter on the Pearson product-moment
correlation coefficient.
9 The value χ².98 = 5.43 is determined by interpolation. It can also be derived by squaring the tabled critical one-tailed .01 value z.01 = 2.33, since the square of the latter value is equivalent to the chi-square value at the 98th percentile. The use of z values in reference to a 2 × 2 contingency table is discussed later in this section under the z test for two independent proportions.
10 Within the framework of this discussion, the value χ².05 = 3.84 represents the tabled chi-square value at the 95th percentile (which demarcates the extreme 5% in the right tail of the chi-square distribution). Thus, using the format employed for the one-tailed .05 and .01 values, the notation identifying the two-tailed .05 value can also be written as χ².95 = 3.84. In the same respect, the value χ².01 = 6.63 represents the tabled chi-square value at the 99th percentile (which demarcates the extreme 1% in the right tail of the chi-square distribution). Thus, using the format employed for the one-tailed .05 and .01 values, the notation identifying the two-tailed .01 value can also be written as χ².99 = 6.63.
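The correspondence between z values and chi-square values with one degree of freedom noted in the two preceding endnotes is easy to verify numerically; the following Python sketch (an added illustration relying on scipy) reproduces the percentile values quoted above.

    from scipy.stats import chi2, norm

    print(round(chi2.ppf(0.95, df=1), 2))        # 3.84, the chi-square value at the 95th percentile
    print(round(chi2.ppf(0.99, df=1), 2))        # 6.63, the chi-square value at the 99th percentile

    # Squaring a one-tailed z critical value gives the corresponding chi-square percentile
    z_01 = norm.ppf(0.99)                        # one-tailed .01 value, approximately 2.33
    print(round(z_01 ** 2, 2), round(chi2.ppf(0.98, df=1), 2))
    # both print about 5.41 (the endnote's 5.43 reflects use of the rounded value z = 2.33)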
11 The null and alternative hypotheses presented for the Fisher exact test in this section are equivalent to the alternative form for stating the null and alternative hypotheses for the chi-square test of homogeneity presented in Section III (if the hypotheses in Section III are applied to a 2 × 2 contingency table).
12 Sourcebooks documenting statistical tables (e.g., Owen (1962) and Beyer (1968)), as well as many books that specialize in nonparametric statistics (e.g., Daniel (1990), Marascuilo and McSweeney (1977), and Siegel and Castellan (1988)) contain tables of the hypergeometric distribution which can be employed with 2 × 2 contingency tables. Such tables eliminate the requirement of employing Equations 16.7 or 16.8 to compute the value of Pr.

13 The value (1 − p), which is often represented by the notation q, can also be computed as follows: q = (1 − p) = (b + d)/(n₁ + n₂) = (b + d)/n. The value q is a pooled estimate of the proportion of observations in Column 2 in the underlying population.


14 Due to rounding off error there may be a minimal discrepancy between the square of a z value and the corresponding chi-square value.
15 The logic for employing Equation 16.11 in lieu of Equation 16.9 is the same as that discussed in reference to the t test for two independent samples, when in the case of the latter test the null hypothesis stipulates a value other than zero for the difference between the population means (and Equation 11.5 is employed to compute the test statistic in lieu of Equation 11.1, 11.2, or 11.3).
16 The denominator of Equation 16.11 is employed to compute s_(p₁ − p₂) instead of the denominator of Equation 16.9, since in computing a confidence interval it cannot be assumed that π₁ = π₂ (which is assumed in Equation 16.9, and serves as the basis for computing a pooled p value in the latter equation).
17 The median test for independent samples can also be employed within the framework of the model for the chi-square test of independence. To illustrate this, assume that Example 16.4 is modified so that the researcher randomly selects a sample of 200 subjects, and does not specify beforehand that the sample is comprised of 100 females and 100 males. If it just happens by chance that the sample is comprised of 100 females and 100 males, one can state that neither the sum of the rows nor the sum of the columns is predetermined by the researcher. As noted in Section I, when neither of the marginal sums is predetermined, the design conforms to the model for the chi-square test of independence.
18 The word column can be interchanged with the word row in the definition of a complex comparison.
19 Another consideration which should be mentioned with respect to conducting comparisons is that two or more comparisons for a set of data can be orthogonal (which means they are independent of one another), or comparisons can overlap with respect to the information they provide. As a general rule, when a limited number of comparisons is planned, it is most efficient to conduct orthogonal comparisons. The general subject of orthogonal comparisons is discussed in greater detail in Section VI of the single-factor between-subjects analysis of variance.
20 The null and alternative hypotheses stated below do not apply to the odds ratio.
21 a) In some sources (and in the previous edition of this book) the phi coefficient is represented by the symbol φ; b) Some sources note that the phi coefficient can only assume a range of values between 0 and +1. In these sources, the term |ad − bc| is employed in the numerator of Equation 16.20. By employing the absolute value of the term in the numerator of Equation 16.20, the value of phi will always be a positive number. Under the latter condition the following will be true: φ = √(χ²/n); c) The author is indebted to Sean Wallis for alerting him to the fact that the absolute value of phi will equal 1 if all of the observations in a 2 × 2 table fall in one of the diagonals of the table.
22 In the case of small sample sizes, the results of the Fisher exact test are employed as the criterion for determining whether or not the computed value of phi is significant.
23 Table 16.50 (extracted from Table 7.3.15 in Cohen (1988, p. 235)) provides representative power values associated with φ values of .10 (small effect size), .30 (medium effect size), and .50 (large effect size). The table assumes a 2 × 2 contingency table and is evaluated with α = .05. To illustrate the use of Table 16.50, if a researcher is interested in detecting a medium effect size (i.e., φ = .30) and employs a sample size of 25, the power associated with the analysis will be .32. Comprehensive tables containing analogous information for use with 2 × 2 tables as well as larger contingency tables (for which φ_C, discussed in the next section, is computed) that are applicable for α = .05 and .01 can be found in Cohen (1988, pp. 228–248).
24 The reason why the result of the chi-square test for r × c tables is not employed to assess the significance of Q is because Q is not a function of chi-square. It should be noted that since Q is a special case of Goodman and Kruskal's gamma, it can be argued that Equation 32.2 (the significance test for gamma) can be employed to assess whether or not Q is significant. However, Ott et al. (1992) state that a different procedure is employed for evaluating the significance of Q versus gamma. Equation 32.2 will not yield the same result as that obtained with Equation 16.24 when it is applied to a 2 × 2 table. If the gamma statistic is computed for Examples 16.1/16.2 it yields the absolute value γ = .56 (γ is the lower case Greek letter gamma), which is identical to the value of Q computed for the same set of data. (The absolute value is employed since the contingency table is not ordered, and thus, depending upon how the cells are arranged, a value of either +.56 or −.56 can be derived for gamma.) However, when Equation 32.2 is employed to assess the significance of γ = .56, it yields the absolute value z = 3.51, which although significant at both the .05 and .01 levels is lower than the absolute value z = 5.46 obtained with Equation 16.24.
25 A more detailed discussion of odds can be found in Section IX (the Addendum) of the binomial sign test for a single sample under the discussion of Bayesian hypothesis testing.
26 a) As noted earlier, when odds are employed what is often stated are the odds that an event will not occur. Using this definition, the odds that a person in the noise condition did not help the confederate (or that someone who washes her hands does not contract the disease) are 2.33:1 (since (70/100)/(30/100) = 2.33). The odds that a person in the no noise condition did not help the confederate (or that someone who does not wash her hands does not contract the disease) are .667:1 or 2:3 (since (40/100)/(60/100) = .667). These values yield the same odds ratio, since 2.33/.667 = 3.49; b) It is also the case that o = 1/o′ = 1/.29 = 3.45 and o′ = 1/o = 1/3.5 = .286 (the minimal differences are due to rounding off error); c) van Belle (2002, pp. 78–82) notes the following with respect to the odds ratio and the relative risk (both of which are discussed in Section VI of the chi-square test for r × c tables): 1) The value of the odds ratio is always further removed from the value 1 than the value of the relative risk; 2) When the baserate is .05 or less, the difference between the odds ratio and the relative risk will be minimal; 3) The relative risk can never be greater than 1/baserate; 4) When both the odds ratio and the relative risk are less than 1, the difference between the two values will be less pronounced; d) Endnotes 41 and 42 in the chapter on the Pearson product-moment correlation coefficient provide additional discussion on the relationship between the odds ratio and relative risk.
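The odds and odds ratios worked out in parts a) and b) of the preceding endnote can be checked with a few lines of Python (added here as an illustration; the proportions 30/100 and 60/100 are the ones implied by the figures quoted above).

    # Proportions helping (or contracting the disease) in the two conditions
    p_noise, p_no_noise = 30 / 100, 60 / 100

    odds_not_noise = (1 - p_noise) / p_noise              # 2.33: odds of NOT helping, noise condition
    odds_not_no_noise = (1 - p_no_noise) / p_no_noise     # 0.667: odds of NOT helping, no noise condition
    print(round(odds_not_noise / odds_not_no_noise, 2))   # odds ratio o = 3.5

    odds_help_noise = p_noise / (1 - p_noise)              # 0.429
    odds_help_no_noise = p_no_noise / (1 - p_no_noise)     # 1.5
    print(round(odds_help_noise / odds_help_no_noise, 2))  # o' = 0.29, the reciprocal of 3.5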
27 Equations 16.69 and 16.70 are alternate equations for computing the odds ratios o = 3.5 and o′ = .29.
28 Pagano and Gauvreau (1993) note that if the expected frequencies for any of the cells in the contingency table are less than 5, the equation below should be employed to compute the standard error.
29 If the continuity-corrected equation (Equation 16.10) is employed for the examples in this section which evaluate whether or not there is a difference between two independent proportions, a lower absolute z value will be computed for any analysis that is conducted. The lower absolute z value will not, however, affect the conclusions reached for any analysis. Later in this section Equation 16.14 is employed to compute a confidence interval for a difference between two proportions. If in lieu of Equation 16.14 the continuity-corrected equation (Equation 16.15) is employed, a slightly wider confidence interval is computed. (In the relevant example, .01 is subtracted from the lower limit of a confidence interval and .01 is added to the upper limit of a confidence interval.) The latter, however, does not affect the conclusions reached for the analysis.
30 a) If the correction for continuity is used to compute the second z value, z = −1.44 is obtained. The latter illustrates that if the correction for continuity is employed, the absolute value stipulated for δ must be greater than .154, which is the upper limit of the continuity-corrected confidence interval. Employing δ = .154, the continuity-corrected equation becomes z = [(.837 − .784) − .154 + .01]/.0552 = −1.65; b) Technically, if the statement is made that the difference between the proportion of successes between the two groups is within a proportion value of .144, the alternative hypotheses should be written as follows: H₁: δ ≥ −.144 and H₁: δ ≤ .144. Since the equals sign should not be included in the null hypotheses, the latter should be written as H₀: δ < −.144 or H₀: δ > .144.
0 0

31 Rosner (2000, p. 640) notes that if, in fact, there is no difference between the sample proportions (i.e., p₁ = p₂ = p and we let q = 1 − p) and k = 1, Equation 16.51 becomes Equation 16.71. If, for purposes of illustration, we assume that p₁ = p₂ = .837, under the latter circumstances in order for the power of the test to be .80 the number of subjects required per group is reduced to n = 214.


32 Yates' correction for continuity was not used to compute the values χ² = 11.58 and χ² = 2.09 for the two hospitals. If Yates' correction is used, the computed chi-square values will be a little lower.
33 Some sources (e.g., Christensen (1990)) employ the term factors (which is the term that is commonly employed within the framework of a factorial analysis of variance) to identify the different independent variables.
34 Zar (1999) notes that the degrees of freedom are the sum of the degrees of freedom for all of the interactions. Specifically, df = rcl − r − c − l + 2 = (r − 1)(c − 1)(l − 1) + (r − 1)(c − 1) + (r − 1)(l − 1) + (c − 1)(l − 1).
35 Zar (1999) notes that the degrees of freedom are the sum of the following: df = rcl − cl − r + 1 = (r − 1)(c − 1)(l − 1) + (r − 1)(c − 1) + (r − 1)(l − 1).
36 Based on Endnote 35, it logically follows that the degrees of freedom are the sum of the following: df = rcl − rl − c + 1 = (r − 1)(c − 1)(l − 1) + (c − 1)(r − 1) + (c − 1)(l − 1).
37 Based on Endnote 35, it logically follows that the degrees of freedom are the sum of the following: df = rcl − rc − l + 1 = (r − 1)(c − 1)(l − 1) + (r − 1)(l − 1) + (c − 1)(l − 1).
38 Zar (1999) discusses and cites references on mosaic displays for contingency tables, which represent an alternative to conventional graphs for visually summarizing the data in a contingency table.
39 a) Readers who are not familiar with factorial designs and the concepts of main effects and interactions are advised to read Sections I–V of the between-subjects factorial analysis of variance (Test 27) prior to reading the discussion of log-linear analysis; b) Tabachnick and Fidell (1996, 2001) refer to log-linear analysis as multiway frequency analysis.
40 The following SPSS command sequence was employed in this chapter in conducting a log-linear analysis: a) Click Analyze; b) Click Loglinear; c) Click Model Selection; d) Highlight the categorization variables (i.e., the row, column, and layer variables in Examples 16.9 and 16.10) one at a time, and move each one separately into the Factor(s) window. After moving a categorization variable into the Factor(s) window, click Define Range and enter 1 for Minimum and enter the number of categories for that variable for Maximum (which will be 2 for all three categorization variables employed in Examples 16.9 and 16.10), and then click Continue (After entering the categorization variables and defining the range for each one, take note of the fact that under Model Building the default setting is Use backward elimination, which was the option employed for Examples 16.9 and 16.10. Additionally, if you click on Model, under Specify Model, note that Saturated (which is the default setting) is checked, and then click Continue to return to the main window); e) Click Options and check off desired information (e.g., Parameter estimates, Association Table), and then click Continue; f) Click OK to obtain the output for the analysis.
41 a) Reynolds (1977b, p. 66) notes that Goodman (1970), who pioneered log-linear analysis, recommended .5 be added to each of the observed frequencies in order to avoid having to compute a logarithm for zero (which is undefined); b) As is the case for the chi-square test for r × c tables, one assumption underlying log-linear analysis is that the expected frequencies of all cells are greater than 1, and that no more than 20% of the cells have an expected frequency less than 5. Reynolds (1977b, p. 78) notes that, as a general rule, the sample size for a log-linear analysis will be adequate if the result of dividing the number of subjects in a contingency table by the total number of cells which comprise the table is greater than five; c) As an alternative to employing the backward hierarchical analysis described in this section, it is possible for the researcher to stipulate one or more specific models she wants to evaluate in relation to the saturated model; d) The total degrees of freedom for the saturated model under discussion where r = 2, c = 2, and l = 2 are df = 8. The latter value is computed as follows: df_Total = [(df = 1) + (df_r = r − 1 = 1) + (df_c = c − 1 = 1) + (df_l = l − 1 = 1) + (df_rc = (r − 1)(c − 1) = 1) + (df_rl = (r − 1)(l − 1) = 1) + (df_cl = (c − 1)(l − 1) = 1) + (df_rcl = (r − 1)(c − 1)(l − 1) = 1)] = 8 (Christensen (1990, p. 87)).


42 In the case of a three-way table, a model involving two-way interactions is said to be nested within the saturated model. In other words a less complex model will always be nested within a more complex model. Consequently, a model involving main effects will be nested within both the saturated model and a model involving two-way interactions.
43 Norušis (2004, p. 13) notes that the Pearson Chisq values and associated probabilities in Table 16.39 are based on evaluating the data with the chi-square test for r × c tables. For large sample sizes the latter test statistic is equivalent to the likelihood ratio chi-square statistic. The reason the analysis focuses on the likelihood ratio statistic is because it can be subdivided into components, each of which can be interpreted, and the sum of the components will add up to the total value of the statistic.
44 Unlike in most analyses where a researcher typically wishes to reject the null hypothesis, and is thus looking to obtain a p value equal to or less than .05, within the context of the method being described for interpreting the value of a likelihood ratio statistic, the researcher wants to retain the null hypothesis that an alternative model is not significantly inferior to the saturated model, and in order to do the latter a p value above .05 is required.
45 a) With respect to the tables labeled Tests that K-way and higher-order effects are zero and Tests that K-way effects are zero, Tabachnick and Fidell (1996, p. 280) suggest that in the case of a small sample size, in order to increase the power of an analysis, a researcher might employ a p value larger than .05 as a criterion for significance; b) The reader need not be concerned with the values in the columns labeled Iteration in the tables labeled Tests that K-way and higher-order effects are zero and Tests that K-way effects are zero. An iteration is a set of operations which is sequentially repeated until the best possible approximation is computed for a value in question. In the case of log-linear analysis multiple calculations are required in order to compute the best possible estimate for the expected cell frequencies. The values in the Iteration column just indicate the number of calculations required for each model.
46 a) Norušis (2004, p. 14) notes that the generating class of a model refers to the highest-order interaction involved in a model. Thus, if the generating class for a model is the second-order interactions r × l and c × l, the model will contain the latter interactions plus all of the lower-order "relatives" of those latter interactions (i.e., the main effects r, c, and l, which are components of the aforementioned second-order interactions); b) Norušis (2006, personal communication) notes that it is possible for the final model (displayed under the heading The final model has generating class at the bottom of page 2 of Table 16.39) to contain a higher-order interaction than what is indicated by the results for the tables labeled Tests that K-way and higher order effects are zero and Tests that K-way effects are zero. She notes that the tests in the latter tables "are testing sequential hypotheses about models with all interaction terms of a particular order absent or present. When you remove the restriction that all interaction terms of a particular order must be present or absent and require only that resulting models be hierarchical, the significance of individual effects may change." Thus, although it was not the case for the example under discussion, it is theoretically possible that the final model could have contained a three-way interaction; c) In point of fact, the magnitude of difference between the probabilities associated with the computed L.R. Chisq Change values for the ROWVAR*LAYERVAR and COLVAR*LAYERVAR interactions is trivial and could easily be attributable to sampling error. Consequently, before drawing any conclusions regarding the relative importance of the two interactions the researcher would be required to conduct further studies on other samples derived from the same population; d) The author is indebted to Marija Norušis for clarifying some of the information displayed by SPSS for log-linear analysis.
47 Within the context of discussing log-linear analysis, sources (e.g., Christensen (1990), Garson (2006), and Selvin (1995)) employ the following terminology with respect to identifying models in reference to a three-dimensional contingency table: a) A model in which all of the two-way interactions are significant is referred to as the homogeneous association model; b) A model in which two of the three possible two-way interactions are significant is referred to as a conditional independence model; c) A model in which one of the three possible two-way interactions is significant is referred to as a partial independence model or one-factor independence model; and d) A model in which all of the factors are unrelated to one another is referred to as a complete independence model or equiprobability model. Other sources, however, may employ different terminology in reference to one or more of the latter models (e.g., Howell (2002, p. 665)).
48 Since Equations 9.6 and 9.7 are equivalent, either one can be employed for the analysis. In addition, since our analysis involves a binary situation (i.e., the two response categories of being correct or incorrect), the data can also be evaluated with the chi-square goodness-of-fit test. The chi-square value obtained with the latter test will be equal to the square of the z value obtained with Equations 9.6 and 9.7.
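The equivalence noted in the preceding endnote is easy to demonstrate numerically. The Python sketch below is an added illustration with made-up counts; the hypothesized probability of 1/12 is borrowed from the sun-sign discussion in the following endnote. It computes the z statistic for a single proportion and the chi-square goodness-of-fit statistic for the same two-category data, and shows that the latter equals the square of the former.

    from scipy.stats import chisquare

    n, correct = 120, 16           # hypothetical data: 16 correct responses out of 120
    p0 = 1 / 12                    # hypothesized probability of a correct response

    # z test for a single proportion (no continuity correction)
    z = (correct / n - p0) / (p0 * (1 - p0) / n) ** 0.5

    # chi-square goodness-of-fit test on the observed and expected frequencies
    observed = [correct, n - correct]
    expected = [n * p0, n * (1 - p0)]
    chi2_stat, _ = chisquare(observed, f_exp=expected)

    print(round(z ** 2, 4), round(chi2_stat, 4))   # the two values are identical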
49 In the interest of precision, the number of days in each sun sign is not one-twelfth of the total number of days in a year (since 365 cannot be divided evenly by 12, it logically follows that the number of days in each sun sign will not be equal). In spite of the latter, for all practical purposes the value 1/12 can be accurately employed to represent the probability for each sun sign.
1. a) Alternative terms which are employed in describing a dependent samples design and the t test described in this chapter for such a design are repeated measures design/t test, within-subjects design/t test, paired-samples design/t test, treatment-by-subjects design/t test, correlated samples design/t test, matched-subjects design/t test, matched-pairs design/t test, crossover design, randomized-blocks design, and split-plot design. The use of the term blocks within the framework of a dependent samples design is discussed in Endnote 1 of the single-factor within-subjects analysis of variance; b) A dependent samples design (in which there are two or more experimental conditions) is considered a balanced design if there is no missing data — i.e., a score is obtained from each subject for each of the experimental conditions. In the event at least one subject is missing data for one or more of the experimental conditions a dependent samples design would be categorized as an unbalanced design.
2. As noted in the Introduction, a study has internal validity to the extent that observed differences between the experimental conditions on the dependent variable can be unambiguously attributed to a manipulated independent variable. Random assignment of subjects to the different experimental conditions is the most effective way to optimize the likelihood of achieving internal validity (by eliminating the possible influence of confounding/extraneous variables). In contrast to internal validity, external validity refers to the degree to which the results of an experiment can be generalized. The results of an experiment can only be generalized to a population of subjects, as well as environmental conditions, which are comparable to those that are employed in the experiment.
3. In actuality, when galvanic skin response (which is a measure of skin resistance) is measured, the higher a subject's GSR the less emotional the subject. In Example 17.1, it is assumed that the GSR scores have been transformed so that the higher a subject's GSR score, the greater the level of emotionality.
4. An alternative but equivalent way of writing the null hypothesis is H₀: μ₁ − μ₂ = 0. The analogous alternative but equivalent ways of writing the alternative hypotheses in the order they are presented are: H₁: μ₁ − μ₂ ≠ 0; H₁: μ₁ − μ₂ > 0; and H₁: μ₁ − μ₂ < 0.

5. Note that the basic structure of Equation 17.3 is the same as Equations and 2.1 (the equation for the estimated population standard deviation which is employed within the framework of the single-sample t test). In Equation 17.3 a standard deviation is computed for n D scores, whereas in Equations and 2.1 a standard deviation is computed for n X scores.
6. The actual value that is estimated by s_D̄ is σ_D̄, which is the standard deviation of the sampling distribution of mean difference scores for the two populations. The meaning of the standard error of the mean difference can be best understood by considering the following procedure for generating an empirical sampling distribution of difference scores: a) Obtain n difference scores for a random sample of n subjects; b) Compute the mean difference score (D̄) for the sample; and c) Repeat steps a) and b) m times. At the conclusion of this procedure one will have obtained m mean difference scores. The standard error of the mean difference represents the standard deviation of the m mean difference scores, and can be computed by substituting the term D̄ for D in Equation 17.3. Thus: s_D̄ = √{[ΣD̄² − ((ΣD̄)²/m)]/(m − 1)}. The standard deviation which is computed with Equation 17.4 is an estimate of σ_D̄.
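The resampling procedure outlined in steps a) through c) of the preceding endnote can be simulated directly. The Python sketch below is an added illustration; the population of difference scores is an arbitrary normal distribution, not data from Example 17.1. It draws m samples of n difference scores, computes D̄ for each, and reports the standard deviation of the m values of D̄, which approximates σ_D̄ = σ_D/√n.

    import random, statistics

    random.seed(3)
    n, m = 10, 5000                   # n difference scores per sample, m replications
    sigma_D = 2.0                     # assumed population SD of the difference scores

    mean_diffs = []
    for _ in range(m):
        D = [random.gauss(0, sigma_D) for _ in range(n)]   # step a): n difference scores
        mean_diffs.append(sum(D) / n)                      # step b): the mean difference D-bar

    # Step c): the SD of the m mean differences estimates sigma_D-bar = sigma_D / sqrt(n)
    print(round(statistics.stdev(mean_diffs), 3), round(sigma_D / n ** 0.5, 3))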
7. In order for Equation 17.1 to be soluble, there must be variability in the n difference scores. If each subject produces the same difference score, the value of s_D computed with Equation 17.3 will equal 0. As a result of the latter, Equation 17.4 will yield the value s_D̄ = 0. Since s_D̄ is the denominator of Equation 17.1, when the latter value equals zero the t test equation will be insoluble.
8. The same result for the t test will be obtained if in obtaining a difference score for each subject a subject's X₁ score is subtracted from his X₂ score (i.e., D = X₂ − X₁). Employing the latter protocol will only result in a change in the sign of the value computed for the t statistic. In the case of Example 17.1 the aforementioned protocol will yield the value t = −2.86. The obtained value of t is interpreted in the same manner as noted at the beginning of this section, except for the fact that in order for the directional alternative hypothesis H₁: μ₁ > μ₂ to be supported the sign of t must be negative, and for the directional alternative hypothesis H₁: μ₁ < μ₂ to be supported the sign of t must be positive.
9. The numerator of Equation 17.6 will always equal D̄ (i.e., the numerator of Equation 17.1). In the same respect the denominator of Equation 17.6 will always equal s_D̄ (the denominator of Equation 17.1). The denominator of Equation 17.6 can also be written as follows:
10. a) The reader should take note of the fact that if the correlation between the scores of the n subjects under the two experimental conditions is a low positive value, the t test for two dependent samples may actually provide a less powerful test of an alternative hypothesis than would be the case if the same set of data were evaluated with a t test for two independent samples. The reason for the latter is that the degrees of freedom employed for the t test for two independent samples are larger than the degrees of freedom employed for the t test for two dependent samples. Use of a larger degrees of freedom value allows a researcher to employ a smaller critical t value for the t test for two independent samples, which might offset any power advantage resulting from the lower error term in the denominator of the t test for two dependent samples; b) Note that in the case of Example 11.1 (which is employed to illustrate the t test for two independent samples), it is reasonable to assume that scores in the same row of Table 11.1 (which summarizes the data for the study) will not be correlated with one another (by virtue of the fact that two independent samples are employed in the study). When independent samples are employed, it is assumed that random factors determine the values of any pair of scores in the same row of a table summarizing the data, and consequently it is assumed that the correlation between pairs of scores in the same row will be equal to (or close to) 0.
11. a) Due to rounding off error, there may be a slight discrepancy between the value of t computed with Equations 17.1 and 17.6; b) If instead of (X̄₁ − X̄₂), (X̄₂ − X̄₁) is employed in the numerator of Equation 17.6, the obtained absolute value of t will be identical but the sign of t will be reversed. Thus, in the case of Example 17.1 it will yield the value t = −2.86.
12. A noncorrelational procedure which allows a researcher to evaluate whether or not a treatment effect is present in the above described example is Fisher's randomization procedure (Fisher (1935)), which is generally categorized as a permutation test. The randomization test for two independent samples (Test 12a), which is an example of a test that is based on Fisher's randomization procedure, is described in Section IX (the Addendum) of the Mann–Whitney U test (Test 12). Fisher's randomization procedure requires that all possible score configurations which can be obtained for the value of the computed sum of the difference scores be determined. Upon computing the latter information, one can determine the likelihood of obtaining a configuration of scores which is equal to or more extreme than the one obtained for a set of data.
13. Equation 17.1 can be modified as follows to be equivalent to Equation 17.8: t = [D̄ − (μ₁ − μ₂)]/s_D̄.
14. Although Equation 17.15 is intended for use prior to collecting the data, it should yield the same value for σ_D if the values computed for the sample data are substituted in it. Thus, if we employ the values σ = 2.60 (which is the average of the values s₁ = 2.83 and s₂ = 2.38) and ρ_X₁X₂ = .78 (which is the population correlation coefficient estimated by the value r_X₁X₂ = .78), and substitute them in Equation 17.15, the value σ_D = 1.72 is computed, which is quite close to the computed value s_D = 1.78. The slight discrepancy between the two values can be attributed to the fact that the estimated population standard deviations are not identical.
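A quick way to see where the two values quoted in the preceding endnote come from is to evaluate the variance identity for difference scores, σ_D² = σ₁² + σ₂² − 2ρσ₁σ₂, once with the two separate standard deviations and once with the single rounded average used above (the single-σ form reduces to σ_D = σ√(2(1 − ρ)), which is assumed here to be the form of Equation 17.15). The Python sketch below is an added illustration, not a reproduction of the book's equation.

    s1, s2, rho = 2.83, 2.38, 0.78

    # Exact identity using the two separate standard deviations
    sd_exact = (s1 ** 2 + s2 ** 2 - 2 * rho * s1 * s2) ** 0.5
    # Single-sigma form using the rounded average standard deviation quoted in the endnote
    sigma = 2.60
    sd_avg = sigma * (2 * (1 - rho)) ** 0.5

    print(round(sd_exact, 2), round(sd_avg, 2))   # prints 1.78 and 1.72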
15. In contrast to the t test for two dependent samples (which can only be employed with two dependent samples), the single-factor within-subjects analysis of variance can be used with a dependent samples design involving interval/ratio data in which there are k samples, where k ≥ 2.
16. Note that the basic structure of Equation 17.20 is the same as Equation 11.17 (which is employed for computing a confidence interval for the t test for two independent samples), except that the latter equation employs s_(X̄₁ − X̄₂) in place of s_D̄.
17. It was noted earlier that if all n subjects obtain the identical difference score, Equations 17.1/17.6 become unsolvable. In the case of Equation 17.21, for a given value of n, if all n subjects obtain the same difference score the same A value will always be computed, regardless of the magnitude of the identical difference score obtained by each of the n subjects. If the value of A computed under such conditions is substituted in the equation t = √[(n − 1)/(An − 1)] (which is algebraically derived from Equation 17.22), the latter equation becomes unsolvable (since the value (An − 1) will always equal zero). The conclusion that results from this observation is that Equation 17.21 is insensitive to the magnitude of the difference between experimental conditions when all subjects obtain the same difference score.
18. a) Equation 17.23 can also be written as follows:
19. van Belle (2002, pp. 61–63) notes that in order for matching to be of value, the correlation for the matched pairs on the dependent variable (i.e., r_X1X2) should be .5 or greater.
20. a) In Example 17.1, the order of presentation of the conditions is controlled by randomly distributing the sexually explicit and neutral words throughout the 16-word list presented to each subject; b) In some sources the term order effects is employed synonymously with the terms carryover effects or sequencing effects.
21. In the earlier editions of this book the one-group pretest-posttest design was referred to as a before-after design.
22. A doctor conducting such a study might justify the absence of a control group on ethical grounds, based on the belief that patients in such a group would be deprived of a potentially beneficial treatment.
23. In the previous edition of this book an alternative test of equivalence developed by Tryon (2001) was described
that yielded different results from those obtained with the Westlake-Schuirmann test. However, Tryon
(personal communication) subsequently identified an error in his methodology and determined that, in fact, his
procedure was equivalent to the Westlake-Schuirmann procedure.
1. Some sources note that one assumption of the Wilcoxon matched-pairs signed-ranks test is that the variable being measured is based on a continuous distribution. In practice, however, this assumption is often not adhered to.
2. When there are tied scores for either the lowest or highest difference scores, as a result of averaging the ordinal positions of the tied scores, the rank assigned to the lowest difference score will be some value greater than 1, and the rank assigned to the highest difference score will be some value less than n.
3. A more thorough discussion of Table A5 can be found in Section V of the Wilcoxon signed-ranks test.
4. The term (Σt³ − Σt) in Equation 18.4 can also be written as Σ(ti³ − ti), where the summation is over the i = 1 to s sets of ties. The latter notation indicates the following: a) For each set of ties, the number of ties in the set is subtracted from the cube of the number of ties in that set; and b) the sum of all the values computed in part a) is obtained. Thus, in the example under discussion (in which there are s = 3 sets of ties):
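The bookkeeping described in parts a) and b) can be summarized in a few lines; the tie-set sizes below are hypothetical and are used only to show the computation.

```python
# Computing the tie-correction term sum(t_i^3 - t_i) over the s sets of tied scores.
tie_set_sizes = [2, 2, 3]                        # t_i = number of tied scores in each set (hypothetical)
correction = sum(t**3 - t for t in tie_set_sizes)
print(correction)                                # (8 - 2) + (8 - 2) + (27 - 3) = 36
```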
5. A correction for continuity can be used in conjunction with the tie correction by subtracting .5 from the absolute value computed for the numerator of Equation 18.4. Use of the correction for continuity will reduce the tie-corrected absolute value of z.
6. Sources are not in agreement with respect to the minimum sample size for which the latter equations should be
employed.
7. The concept of power-efficiency is discussed in Section VII of the Wilcoxon signed-ranks test.
1 Some sources note that one assumption of the binomial sign test for two dependent samples is that the variable being measured is based on a continuous distribution. In practice, however, this assumption is often not adhered to.
2 Another way of stating the null hypothesis is that in the underlying population the sample represents, the proportion of subjects who obtain a positive signed difference is equal to the proportion of subjects who obtain a negative signed difference. The null and alternative hypotheses can also be stated with respect to the proportion of people in the population who obtain a higher score in Condition 2 than Condition 1, thus yielding a negative difference score. The notation π− represents the proportion of the population who yield a difference with a negative sign (referred to as a negative signed difference). Thus, H0: π− = .5 can be employed as the null hypothesis, and the following nondirectional and directional alternative hypotheses can be employed: H1: π− ≠ .5; H1: π− > .5; H1: π− < .5.
3 It is also the likelihood of obtaining 8 or 9 negative signed differences in a set of 9 signed differences.
4 Due to rounding off protocol, the value computed with Equation 19.1 will be either .0195 or .0196, depending upon whether one employs Table A6 or Table A7.
5 An equivalent way of determining whether or not the result is significant is by doubling the value of the cumulative probability obtained from Table A7. In order to reject the null hypothesis, the resulting value must not be greater than the value of α. Since 2 × .0195 = .039 is less than α = .05, we confirm that the nondirectional alternative hypothesis is supported when α = .05. Since .039 is greater than α = .01, it is not supported at the .01 level.
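The probability and the doubling protocol described in Endnotes 3-5 can be reproduced as follows; the values n = 9 and "8 or 9 negative signed differences" are taken from the surrounding endnotes, and a binomial routine is used in place of Tables A6/A7.

```python
# Reproducing the one-tailed cumulative probability and the two-tailed protocol.
from scipy.stats import binom

p_upper = 1 - binom.cdf(7, 9, 0.5)     # P(8 or 9 signed differences in 9) = .0195
p_two_tailed = 2 * p_upper             # doubled for the nondirectional test
print(round(p_upper, 4), round(p_two_tailed, 3))   # .0195 and .039
```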
6 Equations 9.6 and 9.8 are, respectively, alternate but equivalent forms of Equations 19.3 and 19.4. Note that in Equations 9.6 and 9.8, π1 and π2 are employed in place of π+ and π− to represent the two population proportions.
7 A full discussion of the protocol for determining one-tailed chi-square values can be found in Section VII of the chi-square goodness-of-fit test.
8 Alternative equations for computing a confidence interval (e.g., Equations 8.7, 8.8, 8.9, 8.10, 8.11 and Equations
8.14 and 8.15) described in the chapter on the chi-square goodness-of-fit test will yield approximately the
same limits for a confidence interval as those obtained with Equation 19.5.
1 The distinction between a true experiment and a natural experiment is discussed in more detail in the
Introduction.
2 a) The reader should take note of the following with respect to the null and alternative hypotheses stated in this section:
3 It can be demonstrated algebraically that Equation 20.1 is equivalent to Equation 8.2 (which is the equation for the chi-square goodness-of-fit test). Specifically, if Cells a and d are eliminated from the analysis, and the chi-square goodness-of-fit test is employed to evaluate the observations in Cells b and c, n = b + c. If the expected probability for each of the cells is .5, Equation 8.2 reduces to Equation 20.1. As will be noted in Section VI, a limitation of the McNemar test (which is apparent from inspection of Equation 20.1) is that it only employs the data for two of the four cells in the contingency table.
4 A general overview of the chi-square distribution and interpretation of the values listed in Table A4 can be
found in Sections I and V of the single-sample chi-square test for a population variance (Test 3).
5 The degrees of freedom are based on Equation 8.3, which is employed to compute the degrees of freedom for the chi-square goodness-of-fit test. In the case of the McNemar test, df = k − 1 = 2 − 1 = 1, since only the observations in Cells b and c (i.e., k = 2 cells) are evaluated.
6 A full discussion of the protocol for determining one-tailed chi-square values can be found in Section VII of the chi-square goodness-of-fit test.
7 A general discussion of the correction for continuity can be found in Section VI of the Wilcoxon signed-ranks test (Test 6). Fleiss (1981) notes that the correction for continuity for the McNemar test was recommended by Edwards (1948).
8 The numerator of Equation 20.4 is sometimes written as (b − c) ± 1. In using the latter format, 1 is added to the numerator if the term (b − c) results in a negative value, and 1 is subtracted from the numerator if the term (b − c) results in a positive value. Since we are only interested in the absolute value of z, it is simpler to employ the numerator in Equation 20.4, which results in the same absolute value that is obtained when the alternative form of the numerator is employed. If the alternative form of the numerator is employed for Examples 20.1/20.2, it yields the value z = −3.67.
9 Equations 20.18 and 20.19 are alternative equations which can be employed to compute the McNemar test statistic. Whereas Equation 20.18 does not employ a correction for continuity, Equation 20.19 does. As noted in Endnote 2, the values p1 versus p2 in the latter equations are computed as follows: p1 = (a + b)/n = (10 + 13)/100 = .23; p2 = (a + c)/n = (10 + 41)/100 = .51. Equations 20.18 and 20.19 are employed below to compute the values z = −3.81 and z = 3.67, which are identical to the values previously computed with Equations 20.2 and 20.4.
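The two z values cited in this endnote can be reproduced from the cell frequencies given here (a = 10, b = 13, c = 41, n = 100). The expressions below are the uncorrected and continuity-corrected forms of the McNemar z statistic as commonly written, assumed to correspond to Equations 20.2 and 20.4 rather than copied from them.

```python
# Reproducing the values discussed in Endnotes 8 and 9 from the cell frequencies.
import math

a, b, c, n = 10, 13, 41, 100
p1 = (a + b) / n                                    # .23
p2 = (a + c) / n                                    # .51

z = (b - c) / math.sqrt(b + c)                      # approximately -3.81
z_corrected = (abs(b - c) - 1) / math.sqrt(b + c)   # approximately 3.67
print(round(p1, 2), round(p2, 2), round(z, 2), round(z_corrected, 2))
```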
10 The values .0287 and .0571 respectively represent the proportion of the normal distribution which falls above the values z = 1.90 and z = 1.58.
11 In point of fact, it can also be viewed as identical to the analysis conducted with the binomial sign test for two dependent samples. In Section VII of the Cochran Q test, it is demonstrated that when the McNemar test (as well as the Cochran Q test when k = 2) and the binomial sign test for two dependent samples are employed to evaluate the same set of data, they yield equivalent results.
12 For a comprehensive discussion of the computation of binomial probabilities and the use of Table A7, the reader should review Section IV of the binomial sign test for a single sample.
13 a) The reader should take note of the fact that if the value √[(b + c)/n²] is employed to compute the standard error, it results in the slightly larger value √[(13 + 41)/(100)²] = .0735; b) There are a number of alternative but equivalent equations for computing the value of the standard error computed with Equation 20.9 (which is also employed in Selvin (1995, p. 265)). Equations 20.20 and 20.21 (Fleiss et al. (2003, p. 378)) and Equation 20.22 (Marascuilo and McSweeney (1977, pp. 170–171)) represent three alternative but equivalent equations for computing the standard error.
14 a) Fleiss et al. (2003, p. 376) note that the standard error of an odds ratio (s_om) can be computed with Equation 20.23 (which is analogous to the equation used to compute the standard error for an odds ratio employed in Equation 16.29). For the data in Table 20.4 the value s_om = 1 is computed.
15 The methodology to be described in this section for evaluating equivalence is essentially the same as the methodologies described by Lu and Bean (1995) and Morikawa and Yanagawa (1995), which are described in Tango (1998).
16 If the denominator of Equation 20.18 (i.e., the standard error which only takes into account the frequencies in Cells b and c) is employed as the denominator of Equation 20.15, the absolute value computed for z will be slightly larger than the value computed with Equation 20.15. In other words, use of the standard error which ignores the frequencies for Cells a and d will result in a slightly less conservative test of equivalence.
17 Marascuilo and McSweeney (1977) note that it is only possible to state the alternative hypothesis directionally when the number of degrees of freedom employed for the test is 1, which will always be the case for a 2 × 2 table.
1 The term single-factor refers to the fact that the design for which the analysis of variance is employed involves a single independent variable. Since factor and independent variable mean the same thing, multifactor designs (more commonly called factorial designs) which are evaluated with the analysis of variance involve more than one independent variable. Multifactor analysis of variance procedures are discussed in the chapter on the between-subjects factorial analysis of variance.
2 a) It should be noted that if an experiment is confounded, one cannot conclude that a significant portion of between-groups variability is attributed to the independent variable. This is the case, since if one or more confounding variables systematically vary with the levels of the independent variable, a significant difference can be due to a confounding variable rather than the independent variable; b) The model for the single-factor between-subjects analysis of variance can be summarized by Equation 21.86, which is a linear (or additive) function of the relevant parameters involved in the analysis of variance. The latter equation describes the elements which contribute to the score of any subject on the dependent variable.
3 The homogeneity of variance assumption is also discussed in Section VI of the t test for two independent samples in reference to a design involving two independent samples.
4 Although it is possible to conduct a directional analysis, such an analysis will not be described with respect to the analysis of variance. A discussion of a directional analysis when k = 2 can be found under the t test for two independent samples. In addition, a discussion of one-tailed F values can be found in Section VI of the latter test under the discussion of Hartley's F_max test for homogeneity of variance/F test for two population variances. A discussion of the evaluation of a directional alternative hypothesis when k ≥ 3 can be found in Section VII of the chi-square goodness-of-fit test (Test 8). Although the latter discussion is in reference to analysis of a k independent samples design involving categorical data, the general principles regarding the analysis of a directional alternative hypothesis when k ≥ 3 are applicable to the analysis of variance.
5 Some sources present an alternative method for computing SS_BG when the number of subjects in each group is not equal. Whereas Equation 21.3 weighs each group's contribution based on the number of subjects in the group, the alternative method (which is not generally recommended) weighs each group's contribution equally, irrespective of sample size. Keppel (1991), who describes the latter method, notes that as a general rule the value it computes for SS_BG is close to the value obtained with Equation 21.3, except when the sample sizes of the groups differ substantially from one another.
6 Since there are an equal number of subjects in each group, the equation for SS_BG can also be written as follows:
7 SS_WG can also be computed with the following equation:
8 When n1 = n2 = n3, MS_WG = (ŝ1² + ŝ2² + ŝ3²)/k. Thus, since ŝ1² = .7, ŝ2² = 2.3, and ŝ3² = 2.7, MS_WG = (.7 + 2.3 + 2.7)/3 = 1.9.
9 Equation 21.9 can be employed if there are an equal or unequal number of subjects in each group. The following equation can also be employed when the number of subjects in each group is equal or unequal: df_WG = (n1 − 1) + (n2 − 1) + ⋯ + (nk − 1). The equation df_WG = k(n − 1) = nk − k can be employed to compute df_WG, but only when the number of subjects in each group is equal.
10 When there are an equal number of subjects in each group, since N = nk, df_T = nk − 1.
11 There is a separate F distribution for each combination of df_num and df_den values. Figure 21.5 depicts the F distribution for three different sets of degrees of freedom values. Note that in each of the distributions, 5% of the distribution falls to the right of the tabled critical F.05 value. Most tables of the F distribution do not include tabled critical values for all possible values of df_num and df_den. The protocol which is generally employed for determining a critical F value for a df value that is not listed is to either employ interpolation or to employ the df value closest to the desired df value. Some sources qualify the latter by stating that in order to insure that the Type I error rate does not exceed the prespecified value of alpha, one should employ the df value which is closest to but not above the desired df value.
12 Although the discussion of comparison procedures in this section will be limited to the analysis of variance, the underlying general philosophy can be generalized to any inferential statistical test for which comparisons are conducted.
13 1) The terms family and set are employed synonymously throughout this discussion; 2) When all of the comparisons in a family/set of comparisons comprise the sum total of the comparisons conducted for an experiment, the error rate for the latter is commonly referred to as the experimentwise Type I error rate.
14 The accuracy of Equation 21.13 will be compromised if all of the comparisons are not independent of one another. Independent comparisons, which are commonly referred to as orthogonal comparisons, are discussed later in the section.
15 Equation 21.14 tends to overestimate the value of α_FW. The degree to which it overestimates α_FW increases as either the value of c or α_PC increases. For larger values of c and α_PC, Equation 21.14 is not very accurate. Howell (2002, p. 371) notes that the limits on α_FW are α_PC ≤ α_FW ≤ (c)(α_PC).
16 Equation 21.16 provides a computationally quick approximation of the value computed with Equation 21.15. The value computed with Equation 21.16 tends to underestimate the value of α_PC. The larger the value of α_FW or c, the greater the degree to which α_PC will be underestimated with the latter equation. When α_FW = .05, however, the two equations yield values which are almost identical.
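The behavior described in Endnotes 15 and 16 can be illustrated numerically. The formulas below are the standard Bonferroni and Šidák-type expressions assumed to correspond to Equations 21.13-21.16; the values of c and alpha are arbitrary.

```python
# Numerical comparison of the exact and approximate familywise/per-comparison
# relationships discussed in Endnotes 15 and 16.
alpha_pc, alpha_fw, c = .05, .05, 6

fw_exact = 1 - (1 - alpha_pc) ** c        # assumed form of Equation 21.13 (independent comparisons)
fw_approx = c * alpha_pc                  # assumed form of Equation 21.14 (overestimates alpha_FW)

pc_exact = 1 - (1 - alpha_fw) ** (1 / c)  # assumed form of Equation 21.15
pc_approx = alpha_fw / c                  # assumed form of Equation 21.16 (underestimates alpha_PC)

print(round(fw_exact, 4), fw_approx)               # .2649 versus .30
print(round(pc_exact, 5), round(pc_approx, 5))     # .00851 versus .00833
```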


17 This example is based on a discussion of this issue in Howell (2002) and Maxwell and Delaney (1990, 2000).
18 a) Kline (2004, p. 14) defines a contrast as "a directional effect that corresponds to a particular facet of the omnibus effect." Anderson (2001, p. 559) notes that a contrast can be defined as some trend (see the discussion of trend analysis later in this section for further clarification of the latter term) which is hypothesized with respect to the means of the treatment conditions; b) The term single degree of freedom comparison reflects the fact that k = 2 means are contrasted with one another. Although one or both of the k = 2 means may be a composite mean which is based on the combined scores of two or more groups, any composite mean is expressed as a single mean value. The latter is reflected in the fact that there will always be one equals sign (=) in the null hypothesis for a single degree of freedom comparison.
19 Although the examples illustrating comparisons will assume a nondirectional alternative hypothesis, the alternative hypothesis can also be stated directionally. When the alternative hypothesis is stated directionally, the tabled critical one-tailed F value must be employed. Specifically, when using the F distribution in evaluating a directional alternative hypothesis, when α_PC = .05, the tabled F.90 value is employed for the one-tailed F.05 value instead of the tabled F.95 value (which as noted earlier is employed for the two-tailed/nondirectional F.05 value). When α_PC = .01, the tabled F.98 value is employed for the one-tailed F.01 value instead of the tabled F.99 value (which is employed for the two-tailed/nondirectional F.01 value).
20 If the coefficients of Groups 1 and 2 are reversed (i.e., c1 = −1 and c2 = +1), the value of Σ(cj)(X̄j) will equal −2.6. The fact that the sign of the latter value is negative will not affect the test statistic for the comparison. This is the case, since, in computing the F value for the comparison, the value Σ(cj)(X̄j) is squared and, consequently, it becomes irrelevant whether Σ(cj)(X̄j) is a positive or negative number.
21 When the sample sizes of all k groups are not equal, the value of the harmonic mean (which is discussed in the Introduction and in Section VI of the t test for two independent samples) is employed to represent n in Equation 21.17. However, when the harmonic mean is employed, if there are large discrepancies between the sizes of the samples, the accuracy of the analysis may be compromised.
22 MS_WG is employed as the estimate of error variability for the comparison, since, if the homogeneity of variance assumption is not violated, the pooled within-groups variability employed in computing the omnibus F value will provide the most accurate estimate of error (i.e., within-groups) variability.
23 As is the case with simple comparisons, the alternative hypothesis for a complex planned comparison can also
be evaluated nondirectionally.
24 A reciprocal of a number is the value 1 divided by that number.
25 When there are an equal number of subjects in each group it is possible, though rarely done, to assign different coefficients to two or more groups on the same side of the equals sign. In such an instance the composite mean reflects an unequal weighting of the groups. On the other hand, when there are an unequal number of subjects in any of the groups on the same side of the equals sign, any groups which do not have the same sample size will be assigned a different coefficient. The coefficient a group is assigned will reflect the proportion it contributes to the total number of subjects on that side of the equals sign. Thus, if Groups 1 and 2 are compared with Group 3, and there are 4 subjects in Group 1 and 6 subjects in Group 2, there are a total of 10 subjects involved on that side of the equals sign. The absolute value of the coefficient assigned to Group 1 will be 4/10 = 2/5, whereas the absolute value assigned to Group 2 will be 6/10 = 3/5.
26 When any of the coefficients are fractions, in order to simplify calculations some sources prefer to convert the coefficients into integers. In order to do this, each coefficient must be multiplied by a least common denominator. A least common denominator is the smallest number (excluding 1) which is divisible by all of the denominators of the coefficients. With respect to the complex comparison under discussion, the least common denominator is 2, since 2 is the smallest number that can be divided by 1 and 2 (which are the denominators of the coefficients 1/1 = 1 and 1/2). If all of the coefficients are multiplied by 2, the coefficients are converted into the following values: c1 = −1, c2 = −1, c3 = +2. If the latter coefficients are employed in the calculations which follow, they will produce the same end result as that obtained through use of the coefficients employed in Table 21.4. It should be noted, however, that if the converted coefficients are employed, the value Σ(cj)(X̄j) = −1.7 in Table 21.4 will become twice the value that is presently listed (i.e., it becomes −3.4). As a result of this, Σ(cj)(X̄j) will no longer represent the difference between the two sets of means contrasted in the null hypothesis. Instead, it will be a multiple of that value, specifically, the multiple of the value by which the coefficients are multiplied.
27 A special case of orthogonal contrasts/comparisons can be summarized through use of the term trend analysis, which is discussed at the end of this section.
28 a) If na ≠ nb, √[(MS_WG/na) + (MS_WG/nb)] is employed as the denominator of Equation 21.22 and √{[(Σca²)(MS_WG)/na] + [(Σcb²)(MS_WG)/nb]} is employed as the denominator of Equation 21.23; b) If the value √[(ŝa²/na) + (ŝb²/nb)] is employed as the denominator of Equations 21.22/21.23, or if the degrees of freedom for Equations 21.22/21.23 are computed with Equation 11.4 (df = na + nb − 2), a different result will be obtained since: 1) Unless the variance for all of the groups is equal, it is unlikely that the computed t value will be identical to the one computed with Equations 21.22/21.23; and 2) If df = na + nb − 2 is used, the tabled critical t value employed will be larger than the tabled critical value employed for Equations 21.22/21.23. This is the case, since the df value associated with MS_WG is larger than the df value associated with df = na + nb − 2. The larger the value of df, the lower the corresponding tabled critical t value. Consequently, Fisher's LSD method will provide an even more powerful test of an alternative hypothesis than the conventional t test.
29 Equations 21.24 and 21.25 are, respectively, derived from Equations 21.22 and 21.23. When two groups are compared with one another, the data may be evaluated with either the t distribution or the F distribution. The relationship between the two distributions is F = t². In view of this, the term √F(1, df_WG) in Equations 21.24 and 21.25 can also be written as t_df_WG. In other words, Equation 21.24 can be written as CD_LSD = t_df_WG √[(2MS_WG)/n] (and, in the case of Equation 21.25, the same equation is employed, except for the fact that, inside the radical, (Σcj²) is employed in place of 2). Thus, in the computations to follow, if a nondirectional alternative hypothesis is employed with α = .05, one can employ either F.05 = 4.75 (for df_num = 1, df_den = 12) or t.05 = 2.18 (for df = 12), since (t.05 = 2.18)² = (F.05 = 4.75).
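The numerical relationship cited above can be confirmed with any statistical library; the sketch below uses scipy in place of the tabled values.

```python
# Verifying that the squared two-tailed t critical value equals the F critical
# value for df_num = 1, df_den = 12 at alpha = .05.
from scipy.stats import t, f

t_crit = t.ppf(1 - .05 / 2, 12)      # approximately 2.18
f_crit = f.ppf(1 - .05, 1, 12)       # approximately 4.75
print(round(t_crit, 2), round(f_crit, 2), round(t_crit**2, 2))
```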

30 As is noted in reference to linear contrasts, the df = 1 value for the numerator of the F ratio for a single degree of freedom comparison is based on the fact that the comparison involves k = 2 groups with df = k − 1. In the same respect, in the case of complex comparisons there are two sets of means, and thus df = k_comp − 1.
31 As indicated in Endnote 29, if a t value is substituted in Equation 21.24 it will yield the same result. Thus, if t.05 = 2.18 is employed in the equation CD_LSD = t_df_WG √[(2MS_WG)/n], CD_LSD = 2.18√[(2)(1.9)/5] = 1.90.


32 The absolute value of t is employed, since a nondirectional alternative hypothesis is evaluated.
33 When n1 ≠ n2, √[(MS_WG/na) + (MS_WG/nb)] is employed in Equation 21.26 in place of √[(2MS_WG)/n].
34 Since, in a two-tailed analysis, we are interested in a proportion which corresponds to the most extreme .0167 cases in the distribution, one-half of the cases (i.e., .00835) falls in each tail of the distribution.
35 The only exception to this will be when one comparison is being made (i.e., c = 1), in which case both methods yield the identical CD value.
36 Some sources employ the abbreviation WSD for wholly significant difference instead of HSD.
37 The value of α_FW for Tukey's HSD test is compared with the value of α_FW for other comparison procedures within the framework of the discussion of the Scheffé test later in this section.
38 When the tabled critical q value for k = 2 treatments is employed, the Studentized range statistic will produce equivalent results to those obtained with multiple t tests/Fisher's LSD test. This will be demonstrated later in reference to the Newman–Keuls test, which is another comparison procedure that employs the Studentized range statistic.
39 As is the case with a t value, the sign of q will only be relevant if a directional alternative hypothesis is evaluated. When a directional alternative hypothesis is evaluated, in order to reject the null hypothesis, the sign of the computed q value must be in the predicted direction, and the absolute value of q must be equal to or greater than the tabled critical q value at the prespecified level of significance.
40 Although Equations 21.30 and 21.31 can be employed for both simple and complex comparisons, sources generally agree that Tukey's HSD test should only be employed for simple comparisons. Its use with only simple comparisons is based on the fact that in the case of complex comparisons it provides an even less powerful test of an alternative hypothesis than does the Scheffé test (which is an extremely conservative procedure) discussed later in this section.
41 a) When na ≠ nb and/or the homogeneity of variance assumption is violated, some sources recommend using the following modified form of Equation 21.31 (referred to as the Tukey–Kramer procedure) for computing CD_HSD.
42 The only comparison for which the minimum required difference computed for the Newman–Keuls test will equal CD_HSD will be the comparison contrasting the smallest and largest means in the set of k means. One exception to this is a case in which two or more treatments have the identical mean value, and that value is either the lowest or highest of the treatment means. Although the author has not seen such a configuration discussed in the literature, one can argue that in such an instance the number of steps between the lowest and highest mean should be some value less than k. One can conceptualize all cases of a tie as constituting one step, or perhaps, for those means which are tied, one might employ the average of the steps that would be involved if all those means are counted as separate steps. The larger the step value that is employed, the more conservative the test.
43 If Equation 21.30 is employed within the framework of the Newman–Keuls test, it uses the same values which are employed for Tukey's HSD test, and thus yields the identical value q = (9.2 − 6.6)/√(1.9/5) = 4.22.
44 One exception to this involving the Bonferroni–Dunn test will be discussed later in this section.
45 Recollect that the Bonferroni–Dunn test assumed that a total of 6 comparisons are conducted (3 simple comparisons and 3 complex comparisons). Thus, as noted earlier, since the number of comparisons exceeds [k(k − 1)]/2 = [3(3 − 1)]/2 = 3, the Scheffé test will provide a more powerful test of an alternative hypothesis than the Bonferroni–Dunn test.
46 The author is indebted to Scott Maxwell for his input on the content of the discussion to follow.
47 Dunnett (1964) developed a modified test procedure (described in Winer et al. (1991)) to be employed in the event there is a lack of homogeneity of variance between the variance of the control group and the variance of the experimental groups with which it is contrasted.
48 The philosophy to be presented here can be generalized to any inferential statistical analysis.
49 Although the difference between α_FW = .05 and α_FW = .10 may seem trivial, it is a fact of scientific life that a result which is declared statistically significant is more likely to be submitted and/or accepted for publication than a nonsignificant result.
50 The reader may find it useful to review the discussion of effect size in Section VI of the single-sample t test and the t test for two independent samples. Effect size indices are also discussed in the chapter on meta-analysis (Test 43).
51 The reader may want to review the discussion of confidence intervals in Section VI of both the single-sample t test and the t test for two independent samples.
52 a) The F test for two population variances (discussed in conjunction with the F_max test as Test 11a in Section VI of the t test for two independent samples) can only be employed to evaluate the homogeneity of variance hypothesis when k = 2; b) Smithson (2003, pp. 25–27) provides Equation 21.87 for computing a confidence interval for the ratio of two variances. He notes that the latter equation allows for computation of a one-tailed confidence interval which will have an upper limit of ∞. The lower limit of the latter confidence interval will be the result obtained with Equation 21.87. To illustrate, to compute a one-tailed 95% confidence interval, the lower limit will be the result obtained when the ratio of the two variances is divided by the appropriate tabled critical F.05 value for (nL − 1), (nS − 1) degrees of freedom in Table A10 (specifically, the tabled critical value for (nL − 1), (nS − 1) degrees of freedom in Table A10 for F.95). An upper limit for a one-tailed 95% confidence interval with a lower limit of zero could also be computed if one had access to tabled critical values for the lower tail of the F distribution. The latter .05 value would then be divided into the ratio of the two variances in order to obtain the upper limit of the confidence interval. (The reader should consult Endnote 13 of the t test for two independent samples for clarification of the latter.)
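The logic of the one-tailed interval described in this endnote can be sketched as follows; the variances and sample sizes are hypothetical, the critical value is obtained from scipy rather than Table A10, and the code is not a reproduction of Equation 21.87 itself.

```python
# Minimal sketch: one-tailed 95% confidence interval for a ratio of two variances,
# with an upper limit of infinity. All data values are hypothetical.
from scipy.stats import f

s2_L, n_L = 12.0, 10      # larger variance and its sample size (hypothetical)
s2_S, n_S = 4.0, 10       # smaller variance and its sample size (hypothetical)

ratio = s2_L / s2_S
f_crit = f.ppf(.95, n_L - 1, n_S - 1)    # upper-tail .05 critical value

lower_limit = ratio / f_crit             # one-tailed 95% interval: (lower_limit, infinity)
print(round(ratio, 2), round(f_crit, 2), round(lower_limit, 2))
```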
53 a) See Endnote 10 of the single-sample t test for further clarification of the noncentrality parameter; b) In Endnote 21 of the chi-square test for r × c tables, it was noted that ϕ is an alternative form of the lowercase Greek letter phi. This alternative form of phi is employed here (as well as in other chapters) to represent the noncentrality parameter for the F distribution. The notation ϕ should not be confused with the notation φ, which is employed for the phi coefficient in various places in the book.
54 If a power analysis is conducted after the analysis of variance, the value computed for MS_WG can be employed to represent σ²_WG if the researcher has reason to believe it is a reliable estimate of error/within-groups variability in the underlying population. If, prior to conducting a study, reliable data are available from other studies, the MS_WG values derived in the latter studies can be used as a basis for estimating σ²_WG.
55 When there is no curve which corresponds exactly to df_WG, one can either interpolate or employ the df_WG value closest to it.
56 Tiku (1967) has derived more detailed power tables that include the alpha values .025 and .005.
57 a) When k = 2, Equations 21.41/21.42 and Equation 11.15 will yield the same ω̂² value; b) Grissom and Kim (2005, pp. 122–123) note the following with respect to ω̂²: 1) The equation for computing ω̂² assumes equal sample sizes and homogeneity of variance; 2) A statistically significant F value indicates that the value of ω̂² is significantly different from zero; 3) When a value computed for a treatment effect is less than zero, it should be reported for purposes of computing a confidence interval (for which Grissom and Kim (2005, p. 122) provide a reference) and for use in meta-analysis (which is discussed in Test 43); 4) Equation 21.88 can be employed for computing an effect size (through use of ω̂²) for a simple comparison.
58 The following should be noted with respect to the eta squared statistic: a) Earlier in this section it was noted that Cohen (1977; 1988, pp. 284–287) employs the values .0099, .0588, and .1379 as the lower limits for defining a small versus medium versus large effect size for the omega squared statistic. In actuality, Cohen (1977; 1988, pp. 284–287) employs the notation for eta squared (i.e., η²) in reference to the aforementioned effect size values. However, the definition Cohen (1977; 1988, p. 281) provides for eta squared is identical to Equation 21.40 (which is the equation for omega squared). For the latter reason, various sources (e.g., Keppel (1991) and Kirk (1995)) employ the values .0099, .0588, and .1379 in reference to the omega squared statistic; b) Equation 21.44 (the equation for Adjusted η̂²) is essentially equivalent to Equation 21.40 (the definitional equation for the population parameter omega squared); c) When k = 2, the eta squared statistic is equivalent to r²_pb, the squared point-biserial correlation coefficient (Test 28i). Under the discussion of r_pb the equivalency of η̂² and r²_pb is demonstrated; d) Grissom and Kim (2005, p. 121) note that when the independent variable is quantitative, η represents the correlation between the independent and dependent variables, but unlike a Pearson product-moment correlation between the two variables, η reflects a curvilinear as well as a linear relationship; e) Some sources employ the notation R² for the eta squared statistic. The statistic represented by R² or η̂² is commonly referred to as the correlation ratio, which is the squared multiple correlation coefficient which is computed when multiple regression (Test 33) is employed to predict subjects' scores on the dependent variable based on group membership. (Grissom and Kim (2005, p. 121) note that originally eta was used to represent the correlation ratio, but the latter term has since been employed to refer to η̂².) The use of R² in this context reflects the fact that the analysis of variance can be conceptualized within the framework of a multiple regression model. The value of R² = η̂² (which as noted can be computed with Equation 21.43) can also be computed with Equation 21.89 (which is similar to but not identical to Equation 21.42 employed to compute ω̂²).
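For readers who wish to see the two statistics side by side, the sketch below uses the conventional computational forms η̂² = SS_BG/SS_T and ω̂² = [SS_BG − (k − 1)MS_WG]/(SS_T + MS_WG), which are assumed to correspond to the equations cited in Endnotes 57 and 58; the sums of squares are hypothetical.

```python
# Conventional computational forms for eta squared and omega squared.
SS_BG, SS_WG, k, n = 40.0, 60.0, 3, 10        # hypothetical ANOVA summary values
SS_T = SS_BG + SS_WG
df_BG, df_WG = k - 1, k * (n - 1)
MS_WG = SS_WG / df_WG

eta_sq = SS_BG / SS_T
omega_sq = (SS_BG - df_BG * MS_WG) / (SS_T + MS_WG)
print(round(eta_sq, 3), round(omega_sq, 3))   # eta squared exceeds omega squared
```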
59a Table 21.24 (extracted from Table 8.3.13 in Cohen (1988, pp. 313–314)) provides representative power values associated with f values of .10 (small effect size), .25 (medium effect size), and .40 (large effect size). The table assumes k = 3 groups with α = .05. To illustrate the use of Table 21.24, if a researcher is interested in detecting a medium effect size (i.e., f = .25) and employs 10 subjects per group, the power associated with the analysis will only be .20. Comprehensive tables containing analogous information for use with the single-factor between-subjects analysis of variance (for α = .10, .05, and .01 and for various values of k) can be found in Cohen (1988, pp. 289–354).
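The tabled power value cited above can be approximated directly from the noncentral F distribution using the standard relationship λ = f²N; the sketch below assumes this parameterization and uses f = .25, k = 3, and n = 10 from the endnote.

```python
# Approximating the tabled power value for f = .25, k = 3, n = 10 per group,
# alpha = .05, from the noncentral F distribution (lambda = f^2 * N assumed).
from scipy.stats import f, ncf

f_effect, k, n, alpha = .25, 3, 10, .05
N = k * n
df_num, df_den = k - 1, N - k
ncp = (f_effect ** 2) * N                      # noncentrality parameter

f_crit = f.ppf(1 - alpha, df_num, df_den)
power = 1 - ncf.cdf(f_crit, df_num, df_den, ncp)
print(round(power, 2))                         # close to the tabled value of .20
```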
60 Some sources employ F(1, df_WG) in Equation 21.48 instead of t_df_WG. Since t = √F, the two values produce equivalent results. The tabled critical two-tailed t value at a prespecified level of significance will be equivalent to the square root of the tabled critical F value at the same level of significance for df_num = 1 and df_den = df_WG.
61 In using Equation 2.7, ŝ1/√n1 is equivalent to s_X̄1.


62 In the interest of accuracy, Keppel and Zedeck (1989, p. 98) note that although when the null hypothesis is true, the median of the sampling distribution for the value of F equals 1, the mean of the sampling distribution of F is slightly above one. Specifically, the expected value of F = df_WG/(df_WG − 2). It should also be noted that although it rarely occurs, it is possible for MS_WG > MS_BG. In such a case, the value of F will be less than 1 and, obviously, if F < 1, the result cannot be significant.


63 In employing double (or even more than two) summation signs such as Σ_{j=1}^{k} Σ_{i=1}^{n}, the mathematical operations specified are carried out beginning with the summation sign that is farthest to the right and continued sequentially with those operations specified by the summation signs to the left. To illustrate, if k = 3 and n = 5, the notation Σ_{j=1}^{k} Σ_{i=1}^{n} X_ij indicates that the sum of the n scores in Group 1 is computed, after which the sum of the n scores in Group 2 is computed, after which the sum of the n scores in Group 3 is computed. The leftmost summation sign indicates that the final result will be the sum of the three sums which have been computed for each of the k = 3 groups.
64 For each of the N = 15 subjects in Table 21.14, the following is true with respect to the contribution of a subject's score to the total variability in the data:
65 In evaluating a directional alternative hypothesis, when k = 2 the tabled F.90 and F.98 values (for the appropriate degrees of freedom) are respectively employed as the one-tailed .05 and .01 values. Since the values for F.90 and F.98 are not listed in Table A10, the values F.90 = 3.46 and F.98 = 8.41 can be obtained by squaring the tabled critical one-tailed values t.05 = 1.86 and t.01 = 2.90, or by employing more extensive tables of the F distribution available in other sources, or through interpolation.


66 In the case of a factorial design it is also possible to have a mixed-effects model. In a mixed-effects model it is assumed that at least one of the independent variables is based on a fixed-effects model and that at least one is based on a random-effects model.
67 Stevens (2002, p. 346) notes that when a researcher elects to employ more than one covariate, Huitema (1980, p. 161) recommends limiting the number of covariates such that the ratio [Number of covariates + (Number of groups − 1)]/Total sample size is less than .10.
68 The equations below are alternative equations which can be employed to compute the values SS_T(adj) and SS_WG(adj). The slight discrepancy in values is the result of rounding off error. The computation of the values r_T and r_WG is described at a later point in the discussion of the analysis of covariance.
69 a) Through use of Equation 28.1, the sample correlation between the covariate and the dependent variable is computed to be r_T = .347. In the computations below, N is employed to represent the total number of subjects (in lieu of n, which is employed in Equation 28.1). The values ΣX_T² = 3664 and ΣY_T² = 856 are the sums of the ΣX_j² and ΣY_j² scores for the k = 3 groups.
70 The equation noted below is an alternative way of computing r_WG through use of the elements employed in Equation 28.1. The relevant values within each group are pooled in the numerator and denominator of the equation.
71 If the homogeneity of regression assumption for the analysis of covariance is violated, the value computed for b_WG will result in biased adjusted mean values (i.e., the adjusted mean values for the groups will not be accurate estimates of their true values in the underlying populations). Evaluation of the homogeneity of regression assumption is discussed later in this section.
72 Equation 21.82 can also be written as follows:
73 An alternative way to evaluate the homogeneity of regression assumption is presented by Keppel (1991, pp. 317–320), who notes that the adjusted within-groups sum of squares (SS_WG(adj)) can be broken down into the following two components: a) The between-groups regression sum of squares (SS_bgreg), which is a source of variability that represents the degree to which the group regression coefficients deviate from the average regression coefficient for all of the data; and b) The within-groups regression sum of squares (SS_wgreg), which is a source of variability that represents the degree to which the scores of individual subjects deviate from the regression line of the group of which a subject is a member. Since SS_WG(adj) is the sum of SS_bgreg and SS_wgreg, the value of SS_bgreg can be expressed as follows: SS_bgreg = SS_WG(adj) − SS_wgreg.
ad wgreg

74 In addition to the assumptions already noted for the analysis of covariance, it also has the usual assumptions for the analysis of variance (i.e., normality of the underlying population distributions and homogeneity of variance).
1 Although it is possible to conduct a directional analysis, such an analysis will not be described with respect to the Kruskal–Wallis one-way analysis of variance by ranks. A discussion of a directional analysis when k = 2 can be found under the Mann–Whitney U test. A discussion of the evaluation of a directional alternative hypothesis when k ≥ 3 can be found in Section VII of the chi-square goodness-of-fit test (Test 8). Although the latter discussion is in reference to analysis of a k independent samples design involving categorical data, the general principles regarding analysis of a directional alternative hypothesis when k ≥ 3 are applicable to the Kruskal–Wallis one-way analysis of variance by ranks.
2 As noted in Section IV, the chi-square distribution provides an approximation of the Kruskal–Wallis test statistic. Although the chi-square distribution provides an excellent approximation of the Kruskal–Wallis sampling distribution, some sources recommend the use of exact probabilities for small sample sizes. Exact tables of the Kruskal–Wallis distribution are discussed in Section VII.
3 In the discussion of comparisons under the single-factor between-subjects analysis of variance, it is noted that a simple (also known as a pairwise) comparison is a comparison between any two groups in a set of k groups.
4 Note that in Equation 22.5, as the value of N increases the value computed for CD_KW will also increase because of the greater number (range of values) of rank-orderings required for the data.
5 The rationale for the use of the proportions .0167 and .0083 in determining the appropriate value for z_adj is as follows. In the case of a one-tailed/directional analysis, the relevant probability/proportion employed is based on only one of the two tails of the normal distribution. Consequently, the proportion of the normal curve which is used to determine the value of z_adj will be a proportion that is equal to the value of α_PC in the appropriate tail of the distribution (which is designated in the alternative hypothesis). The value z = 2.13 is employed, since the proportion of cases that falls above z = 2.13 in the right tail of the distribution is .0167, and the proportion of cases that falls below z = −2.13 in the left tail of the distribution is .0167 (i.e., the value z = 2.13 has an entry in Column 3 of Table A1 which is closest to .0167). In the case of a two-tailed/nondirectional analysis, the relevant probability/proportion employed is based on both tails of the distribution. Consequently, the proportion of the normal curve which is used to determine the value of z_adj will be a proportion that is equal to the value of α_PC/2 in each tail of the distribution. The proportion α_PC/2 = .0167/2 = .0083 is employed for a two-tailed/nondirectional analysis, since one-half of the proportion which comprises α_PC = .0167 comes from the left tail of the distribution and the other half from the right tail. Consequently, the value z = 2.39 is employed, since the proportion of cases that falls above z = 2.39 in the right tail of the distribution is .0083, and the proportion of cases that falls below z = −2.39 in the left tail of the distribution is .0083 (i.e., the value z = 2.39 has an entry in Column 3 of Table A1 which is closest to .0083).
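The two z values discussed in this endnote can be checked with the inverse of the standard normal distribution in place of Table A1:

```python
# Obtaining z_adj for the one-tailed and two-tailed per-comparison proportions.
from scipy.stats import norm

z_one_tailed = norm.ppf(1 - .0167)    # about 2.128; the closest entry in Table A1 is 2.13
z_two_tailed = norm.ppf(1 - .0083)    # about 2.395; the endnote's tabled value is 2.39
print(round(z_one_tailed, 3), round(z_two_tailed, 3))
```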
6 It should be noted that when a directional alternative hypothesis is employed, the sign of the difference between the two mean ranks is relevant in that it must be consistent with the prediction stated in the directional alternative hypothesis. When a nondirectional alternative hypothesis is employed, the direction of the difference between two mean ranks is irrelevant.
7 a) Many researchers would probably be willing to tolerate a somewhat higher familywise Type I error rate than .05. In such a case the difference |R̄1 − R̄2| = 6.5 will be significant, since the value of z_adj employed in Equation 22.5 will be less than z = 2.39, thus resulting in a lower value for CD_KW; b) When there are a large number of ties in the data, a modified version of Equation 22.5 is recommended by some sources (e.g., Daniel (1990)), which reduces the value of CD_KW by a minimal amount. Marascuilo and McSweeney (1977, p. 318) recommend that when ties are present in the data, the tie correction factor C = .964 computed with Equation 22.3 be multiplied by the term in the radical of Equation 22.5. When the latter is done with the data for Example 22.1 (as noted below), the value CD_KW = 6.64 is obtained. As is the case with Equation 22.5, only the Group 1 versus Group 3 pairwise difference is significant.
8 In Equation 22.5 the value z.05 = 1.96 is employed for z_adj, and the latter value is multiplied by 2.83, which is the value computed for the term in the radical of the equation for Example 22.1.
9 The slight discrepancy is due to rounding off error, since the actual absolute value of z computed with Equation
12.4 is 1.7756.
10 In accordance with one of the assumptions noted in Section I for the Kruskal–Wallis one-way analysis of variance by ranks, in both Examples 22.2 and 22.3 it is assumed that Dr. Radical implicitly or explicitly evaluates the N students on a continuous interval/ratio scale prior to converting the data into a rank-order format.
11 As noted, one of the assumptions stipulated for both the Kruskal–Wallis one-way analysis of variance by ranks and the Jonckheere–Terpstra test for ordered alternatives is that the dependent variable is a continuous random variable, yet, in practice, this assumption (which is also common to a number of other nonparametric tests employed with ordinal data) is often not adhered to. In the case of Example 22.1, the dependent variable (number of nonsense syllables recalled) would be conceptualized by most people as a discrete rather than a continuous variable. On the basis of the latter, one could argue that the accuracy of both tests described in this chapter may be compromised if they are employed to evaluate the data for Example 22.1. On the other hand, if in the description of Example 22.1 it was stated that the experimenter had the option of awarding a subject partial credit with respect to whether or not he or she was successful in recalling a nonsense syllable, the dependent variable could then be conceptualized as continuous rather than discrete.
12 The U_ab values are commonly referred to as U counts, since the methodology employed in the counting procedure described for the Jonckheere–Terpstra test for ordered alternatives is the same as the counting procedure described for the Mann–Whitney U test in Endnote 14 of the latter test.
13 It should be noted that the critical values listed in Table A24 are based on probabilities which have been computed to five decimal places, and thus a given tabled critical J value will be associated with a probability which, when computed to five decimal places, is equal to or less than the designated alpha value. There may be a one-unit discrepancy between the critical values listed in Table A24 and those found in other sources when the values listed in such sources are based on rounding off probabilities to fewer than five decimal places. More detailed tables for the Jonckheere–Terpstra test statistic can be found in Daniel (1990) and Hollander and Wolfe (1999).
14 Jonckheere (1954, p. 142) notes that a correction for continuity can be employed for Equation 22.7 by subtracting the value 1 from the absolute difference obtained for the numerator of the equation. Sources that describe the Jonckheere–Terpstra test for ordered alternatives, however, do not recommend using the correction for continuity.
15 It should be noted that it is possible to obtain a negative z value for Equation 22.7 if the data are consistent with the analogous alternative hypothesis in the other tail of the sampling distribution. Put simply, if the results are the exact opposite of what is predicted, the value of z will be negative. Specifically, if the data were consistent with the alternative hypothesis H1: θ1 ≤ θ2 ≤ θ3, the value computed for J would be less than the expected value of J computed in the numerator of Equation 22.7 (which in our example was J_E = 37.5). When the latter is true, the value computed for z will be negative. In instances where a researcher is employing a two-tailed alternative hypothesis, the absolute value of z is employed, since the sign of z has no bearing on whether or not a result is significant.
16 When k = 2, the result obtained for the Jonckheere–Terpstra test for ordered alternatives will also be equivalent to the one-tailed result obtained for the Kruskal–Wallis one-way analysis of variance by ranks, since when k = 2 the latter test is equivalent to the Mann–Whitney U test.
1 Conover (1980, 1999) and Marascuilo and McSweeney (1977) note that normal-scores tests have power equal to or greater than their parametric analogs. The latter sources state that the asymptotic relative efficiency (which is discussed in Section VII of the Wilcoxon signed-ranks test (Test 6)) of a normal-scores test is equal to 1 when the underlying population distribution(s) are normal, and often greater than 1 when the underlying population distribution(s) are something other than normal. What the latter translates into is that for a given level of power, a normal-scores test will require an equal number of or even fewer subjects than the analogous parametric test in evaluating an alternative hypothesis.
2 Although it is possible to conduct a directional analysis when k ≥ 3, such an analysis will not be described with respect to the van der Waerden normal-scores test for k independent samples. A discussion of a directional analysis when k = 2 can be found in Section VIII, where the van der Waerden test is employed to evaluate the data for Examples 11.1/12.1 (which are employed to illustrate the t test for two independent samples and the Mann–Whitney U test). A discussion of the evaluation of a directional alternative hypothesis when k ≥ 3 can be found in Section VII of the chi-square goodness-of-fit test (Test 8). Although the latter discussion is in reference to analysis of a k independent samples design involving categorical data, the general principles regarding analysis of a directional alternative hypothesis when k ≥ 3 are applicable to the van der Waerden normal-scores test for k independent samples.
3 The proportion of cases in the normal distribution that falls below the mean is .5000. The value .0948 in Column 2 represents the proportion of cases which falls between the mean and the value z = .24. Thus, .5000 + .0948 = .5948 represents the proportion of cases that falls below the value z = .24.
4 a) Conover (1980, 1999) notes that if there are no ties, the mean of the N z_ij scores (i.e., the mean of all N normal-scores) will equal zero, and will be extremely close to zero when there are ties. Thus, if the mean equals zero, the equation ŝ² = Σ(X − X̄)²/(n − 1) (which is Equation 1.8, the definitional equation for computing the unbiased estimate of a population variance) reduces to ŝ² = ΣX²/(n − 1). If z is employed in place of X and N in place of n (since in Equation 23.1 the variance of the N z scores is computed), we obtain Equation 23.1, ŝ² = Σz_ij²/(N − 1); b) The author is indebted to Joe Abramson for correcting an error in the computation of the variance in the 4th edition of this book.
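A minimal sketch of the normal-scores computation and of Equation 23.1 as reconstructed above is given below. It assumes the standard van der Waerden scores z = Φ⁻¹[R/(N + 1)], where R is a score's rank, which is taken to match the chapter's procedure; the data are hypothetical.

```python
# Converting hypothetical scores to van der Waerden normal scores and computing
# their mean and the variance of Equation 23.1.
from scipy.stats import norm, rankdata

scores = [11, 2, 7, 9, 5, 14, 8, 3, 6, 10]     # hypothetical interval/ratio data
N = len(scores)

ranks = rankdata(scores)                       # midranks would be assigned to ties
z = norm.ppf(ranks / (N + 1))                  # van der Waerden normal scores

mean_z = sum(z) / N                            # zero when there are no ties
var_z = sum(v * v for v in z) / (N - 1)        # Equation 23.1 (mean assumed to be zero)
print(round(mean_z, 10), round(var_z, 3))
```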
5 When two or more groups do not have the same sample size, the value of n_j for a given group is used to represent the group sample size in any of the equations which require the group sample size.
6 In the discussion of comparisons under the single-factor between-subjects analysis of variance, it is noted that a simple (also known as a pairwise) comparison is a comparison between any two groups in a set of k groups.
7 The rationale for the use of the proportions .0167 and .0083 is explained more thoroughly in Endnote 5 of the Kruskal–Wallis one-way analysis of variance. In the case of the latter test, since the normal distribution is employed in the comparison procedure, the explanation of the proportions is in reference to a standard normal deviate (i.e., a z value). The same rationale applies when the t distribution is employed, with the only difference being that for a corresponding probability level, the t values which are used are different from the z values employed for the Kruskal–Wallis comparison procedure.
8 It should be noted that when a directional alternative hypothesis is employed, the sign of the difference between the two mean normal-scores is relevant in that it must be consistent with the prediction stated in the directional alternative hypothesis. When a nondirectional alternative hypothesis is employed, the direction of the difference between two mean normal-scores is irrelevant.
9 Equation 23.4 is analogous to Equation 22.8, which Conover (1980, 1999) employs in conducting a comparison for the Kruskal–Wallis one-way analysis of variance. If the element [(N − 1 − T̂²_vdw)/(N − k)] is omitted from the radical in Equation 23.4, it becomes Equation 23.5, which is analogous to Equation 22.5. Equation 23.5 will yield a larger CD_vdw value than Equation 23.4. It is demonstrated below that when Equation 23.5 is employed with the data for Example 23.1, the value CD_vdw = 1.504 is obtained. If the latter CD_vdw value is employed, none of the pairwise comparisons is significant, since no difference is equal to or greater than 1.504. As noted in the discussion of comparisons in the Kruskal–Wallis one-way analysis of variance, an equation in the form of Equation 23.5 conducts a less powerful/more conservative comparison.
10 The tabled critical one-tailed .05 and .01 values are, respectively, the tabled chi-square values at the 90th and 98th percentiles/quantiles of the chi-square distribution. For clarification regarding the latter values, the reader should review the material on the evaluation of a directional hypothesis involving the chi-square distribution in Section VII of the chi-square goodness-of-fit test, and the discussion of Table A4 in Section IV of the single-sample chi-square test for a population variance (Test 3). Since the chi-square value at the 98th percentile is not in Table A4, the value χ².01 = 5.10 is an approximation of the latter value.
1. A within-subjects/repeated-measures design in which each subject serves under each of the k levels of the independent variable is often described as a special case of a randomized-blocks design. The term randomized-blocks design is commonly employed to describe a dependent samples design involving matched subjects. As an example, assume that 10 sets of identical triplets are employed in a study to determine the efficacy of two drugs when compared with a placebo. Within each set of triplets one of the members is randomly assigned to each of the three experimental conditions. Such a design is described in various sources as a matched-subjects/samples design, a dependent samples design, a correlated-subjects design, or a randomized-blocks design. Within the usage of the term randomized-blocks design, each set of triplets constitutes a block, and consequently, 10 blocks are employed in the study with three subjects in each block. Further discussion of the randomized-blocks design can be found in Section VII of the between-subjects factorial analysis of variance.
6. In Section VII it is noted that the sum of between-conditions variability and residual variability represents what is referred to as within-subjects variability. The sum of squares of within-subjects variability (SS_WS) is the sum of between-conditions variability and residual variability – i.e., SS_WS = SS_BC + SS_res.
7. Since there is an equal number of scores in each condition, the equation for SS_BC can also be written as follows:
8. The equation for SS_BS can also be written as follows:
9. In the interest of accuracy, as is the case with the single-factor between-subjects analysis of variance, a significant omnibus F value indicates that there is at least one significant difference among all possible comparisons that can be conducted. Thus, it is theoretically possible that none of the simple/pairwise comparisons is significant, and that the significant difference (or differences) involves one or more complex comparisons.
10. As noted in Section VI of the single-factor between-subjects analysis of variance, in some instances the CD_B/D value associated with the Bonferroni–Dunn test will be larger than the CD_S value associated with the Scheffé test. However, when there are c = 3 comparisons, CD_S will be greater than CD_B/D.

11. One can, of course, conduct a replication study and base the estimate of MS_res on the value of MS_res obtained for the comparison in the latter study. In point of fact, one or more replication studies can serve as a basis for obtaining the best possible estimate of error variability to employ for any comparison conducted following an analysis of variance.
12. If the means of each of the conditions for which a composite mean is computed are weighted equally, an even simpler method for computing the composite score of a subject is to add the subject's scores and divide the sum by the number of conditions which are involved. Thus, the composite score of Subject 1 can be obtained by adding 9 and 7 and dividing by 2. The averaging procedure will only work if all of the means are weighted equally. The protocol described in Section VI must be employed in instances where a comparison involves unequal weighting of means.
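As a small illustration of the equal-weighting case only (plain Python, using the two scores mentioned above), the composite is simply the mean of the subject's scores; unequal weights would require the protocol referenced in Section VI.

```python
# Minimal sketch: a composite score as an equally weighted average of a
# subject's scores across the conditions involved in the comparison.
scores_subject_1 = [9, 7]                      # Subject 1's scores in the two conditions

composite = sum(scores_subject_1) / len(scores_subject_1)
print(composite)                               # 8.0, i.e., (9 + 7)/2
```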
13. The same result is obtained if (for the three difference scores) the score in the first condition noted is subtracted from the score in the second condition noted (i.e., Condition 2 − Condition 1; Condition 3 − Condition 1; Condition 3 − Condition 2).
14. If the variance of Condition 2 is employed to represent the lowest variance, r_X2X3 also equals .85.
16. Inspection of the df_res = 10 curve reveals that for df_res = 10, a value of approximately φ = 3.1 or greater will be associated with a power of 1.


17. A number of different alternative equations have been proposed for computing standard omega squared. Although a slightly different equation was employed in the first edition of this book, it yields approximately the same result that is obtained with Equation 24.25. Grissom and Kim (2005, p. 135) note that Equation 24.39 can be employed to compute standard omega squared. The latter equation is equivalent to Equation 24.25.
18. The eta squared statistic (η̂²) (also represented by R², and commonly referred to as the correlation ratio) computed for the single-factor between-subjects analysis of variance can also be computed for the single-factor within-subjects analysis of variance. The partial version of the latter statistic can be computed with Equation 24.40. Employing the latter equation, the value η̂² = .89 is computed below. Keppel and Wickens (2004, p. 362) note that the value computed with Equation 24.40 tends to overestimate the degree of relationship between the independent and dependent variables in the underlying population.
19. In using Equation 2.8, ŝ1/√n is equivalent to s_X̄1.
20. The author is indebted to Joe Abramson for alerting him to an error in the printout of the computational values for Equation 24.30 in the 4th edition of this book.
21. In employing double (or even more than two) summation signs such as Σ_{j=1}^{k} Σ_{i=1}^{n}, the mathematical operations specified are carried out beginning with the summation sign that is farthest to the right and continued sequentially with those operations specified by summation signs to the left. Specifically, if k = 3 and n = 6, the notation Σ_{j=1}^{k} Σ_{i=1}^{n} X_{ij} indicates that the sum of the n scores in Condition 1 is computed, after which the sum of the n scores in Condition 2 is computed, after which the sum of the n scores in Condition 3 is computed. The final result will be the sum of all the aforementioned values which have been computed. On the other hand, the notation Σ_{i=1}^{n} Σ_{j=1}^{k} X_{ij} indicates that the sum of the k = 3 scores of Subject 1 is computed, after which the sum of the k = 3 scores of Subject 2 is computed, and so on until the sum of the k = 3 scores of Subject 6 is computed. The final result will be the sum of all the aforementioned values which have been computed. In this example the final value computed for Σ_{j=1}^{k} Σ_{i=1}^{n} X_{ij} will be equal to the final value computed for Σ_{i=1}^{n} Σ_{j=1}^{k} X_{ij}. In obtaining the final value, however, the order in which the operations are conducted is reversed. Specifically, in computing Σ_{j=1}^{k} Σ_{i=1}^{n} X_{ij}, the sums of the k columns are computed and summed in order to arrive at the grand sum, while in computing Σ_{i=1}^{n} Σ_{j=1}^{k} X_{ij}, the sums of the n rows are computed and summed in order to arrive at the grand sum.
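The order-of-operations point in the preceding endnote is easy to verify numerically. The short sketch below (arbitrary data, not the handbook's) sums a 6 × 3 table of scores column-by-column and row-by-row and confirms that both routes give the same grand sum.

```python
# Illustrative sketch: with k = 3 conditions (columns) and n = 6 subjects (rows),
# summing the k column totals or the n row totals yields the same grand sum.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 10, size=(6, 3))     # rows = subjects (i), columns = conditions (j)

grand_sum_by_columns = sum(X[:, j].sum() for j in range(3))   # inner sum over i, outer over j
grand_sum_by_rows = sum(X[i, :].sum() for i in range(6))      # inner sum over j, outer over i

print(grand_sum_by_columns == grand_sum_by_rows)              # True
```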
22. For each of the N = 18 scores in Table 24.16, the following is true with respect to the contribution of any score to the total variability in the data.
23. As noted in Section VI under the discussion of computation of a confidence interval, MS_WC is equivalent to MS_WG (which is the analogous measure of variability for the single-factor between-subjects analysis of variance).
24. An issue discussed by Keppel (1991) which is relevant to the power of the single-factor within-subjects analysis of variance is that even though counterbalancing is an effective procedure for distributing practice effects evenly over the k experimental conditions in a within-subjects design, if practice effects are, in fact, present in the data, the value of MS_res will be inflated, and because of the latter the power of the single-factor within-subjects analysis of variance will be reduced. Keppel (1991) describes a methodology for computing an adjusted measure of MS_res, which is independent of practice effects, that allows for a more powerful test of an alternative hypothesis.
25. In evaluating a directional alternative hypothesis, when k = 2 the tabled F.90 and F.98 values (for the appropriate degrees of freedom) are respectively employed as the one-tailed .05 and .01 values. Since the values for F.90 and F.98 are not listed in Table A10, the values F.90 = 3.36 and F.98 = 7.95 can be obtained by squaring the tabled critical one-tailed values t.05 = 1.83 and t.01 = 2.82, or by employing more extensive tables of the F distribution available in other sources, or through interpolation.
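Because an F value with one degree of freedom in the numerator equals the square of the corresponding t value, the squaring shortcut described above can be checked numerically. The sketch below uses scipy; the residual degrees of freedom value of 9 is an assumed figure for illustration and is not necessarily the df of the example in the text.

```python
# Sketch of the identity F(1, df) = t(df)**2, so one-tailed t quantiles can be
# squared to obtain F.90 and F.98.  df_error = 9 is an assumption for illustration.
from scipy.stats import t, f

df_error = 9
t_05, t_01 = t.ppf(0.95, df_error), t.ppf(0.99, df_error)
print(round(t_05, 2), round(t_01, 2))        # approx. 1.83 and 2.82
print(round(t_05**2, 2), round(t_01**2, 2))  # approx. 3.36 and 7.96 (7.95 if 2.82 is squared)
print(round(f.ppf(0.90, 1, df_error), 2), round(f.ppf(0.98, 1, df_error), 2))  # same values
```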


26.Example 24.6 (as well as Example 24.7) can also be viewed as an example of what is commonly referred to as
a time-series design (although time-series designs typically involve more measurement periods than are
employed in the latter example). The latter design is essentially a one-group pretest-posttest design involving
one or more measurement periods prior to an experimental treatment, and one or more measurement periods
following the experimental treatment.
1. The reader should take note of the fact that when there are k = 2 dependent samples, the Wilcoxon matched-pairs signed-ranks test (which is also described in this book as a nonparametric test for evaluating ordinal data) will not yield a result equivalent to that obtained with the Friedman two-way analysis of variance by ranks. Since the Wilcoxon test (which rank-orders interval/ratio difference scores) employs more information than the Friedman test/binomial sign test, it provides a more powerful test of an alternative hypothesis than the latter tests.
2. A more detailed discussion of the guidelines noted below can be found in Sections I and VII of the t test for two dependent samples.
3. Although it is possible to conduct a directional analysis, such an analysis will not be described with respect to the Friedman two-way analysis of variance by ranks. A discussion of a directional analysis when k = 2 can be found under the binomial sign test for two dependent samples. A discussion of the evaluation of a directional alternative hypothesis when k ≥ 3 can be found in Section VII of the chi-square goodness-of-fit test (Test 8). Although the latter discussion is in reference to analysis of a k independent samples design involving categorical data, the general principles regarding analysis of a directional alternative hypothesis when k ≥ 3 are applicable to the Friedman two-way analysis of variance by ranks.
4. Note that this ranking protocol differs from that employed for other rank-order procedures discussed in the book. In other rank-order tests, the rank assigned to each score is based on the rank-order of the score within the overall distribution of nk = N scores.
5. As noted in Section IV, the chi-square distribution provides an approximation of the Friedman test statistic. Although the chi-square distribution provides an excellent approximation of the Friedman sampling distribution, some sources recommend the use of exact probabilities for small sample sizes. Exact tables of the Friedman distribution are discussed in Section VII.
6. In the discussion of comparisons in reference to the analysis of variance, it is noted that a simple (also known as a pairwise) comparison is a comparison between any two groups/conditions in a set of k groups/conditions.
7. Equation 25.8 is an alternative form of the comparison equation, which identifies the minimum required difference between the means of the ranks of any two conditions in order for them to differ from one another at the prespecified level of significance. If the CD_F value computed with Equation 25.5 is divided by n, it yields the value CD_F(R̄_a − R̄_b) computed with Equation 25.8.
8. The method for deriving the value of z_adj for the Friedman two-way analysis of variance by ranks is based on the same logic that is employed in Equation 22.5 (which is used for conducting comparisons for the Kruskal–Wallis one-way analysis of variance by ranks). A rationale for the use of the proportions .0167 and .0083 in determining the appropriate value for z_adj in Example 25.1 can be found in Endnote 5 of the Kruskal–Wallis one-way analysis of variance by ranks.
9. It should be noted that when a directional alternative hypothesis is employed, the sign of the difference between the two sums of ranks must be consistent with the prediction stated in the directional alternative hypothesis. When a nondirectional alternative hypothesis is employed, the direction of the difference between two sums of ranks is irrelevant.
10. Unfortunately, sources are not in agreement with respect to which equation is most appropriate to employ for a comparison. Among the alternative procedures recommended are the following: a) Hollander and Wolfe (1999, p. 296) and Zar (1999, p. 267) recommend Equation 25.9 (which employs the Studentized range statistic discussed in Section VI of the single-factor between-subjects analysis of variance) for unplanned comparisons with large samples. (Zar (1999, p. 267) states that Equation 25.5 should only be employed if a number of groups are compared one at a time with a control group – i.e., that Equation 25.5 is employed within the framework of the Dunnett methodology described in Section VI of the single-factor between-subjects analysis of variance.) The q value in Equation 25.9 is obtained from Table A13 (Table of the Studentized Range Statistic) in the Appendix (where k represents the number of experimental conditions and the value ∞ is employed for df_error). Use of Equation 25.9 with Example 25.1 (employing the q.05 value (for 3, ∞)) yields the value CD_F = 8.11, which indicates that in order to declare a difference between two conditions significant (with the familywise error rate adjusted to .05) the difference between the sums of ranks of any two conditions must be equal to or greater than 8.11 (which, as is the case when Equation 25.5 is employed, is only true for |ΣR1 − ΣR3| ≥ 8.11).

11. a) In the case of both the Wilcoxon matched-pairs signed-ranks test and the binomial sign test for two dependent samples, it is assumed that for each pairwise comparison a subject's score in the second condition which is listed for a comparison is subtracted from the subject's score in the first condition that is listed for the comparison. In the case of both tests, reversing the order of subtraction will yield the same result; b) Marascuilo and McSweeney (1977, p. 369) state that the use of the Wilcoxon matched-pairs signed-ranks test is acceptable for comparisons as long as it is restricted to planned pairwise comparisons.
12. The value n = 6 is employed for the Condition 1 versus Condition 2 and Condition 1 versus Condition 3 comparisons, since no subject has the same score in both experimental conditions. On the other hand, the value n = 5 is employed in the Condition 2 versus Condition 3 comparison, since Subject 6 has the same score in Conditions 2 and 3. The use of n = 5 is predicated on the fact that in conducting the Wilcoxon matched-pairs signed-ranks test, subjects who have a difference score of zero are not included in the computation of the test statistic.
13. In Equation 25.5 the value z.05 = 1.96 is employed for z_adj, and the latter value is multiplied by 3.46, which is the value computed for the term in the radical of the equation for Example 25.1.
14. Iman and Davenport (1980) argued that Equation 25.1 allows for too conservative a test of the alternative hypothesis (i.e., inflates the likelihood of committing a Type II error), and that Equation 25.11, which employs the F distribution, allows for a more powerful test of the Friedman two-way analysis of variance by ranks test statistic. The F value obtained with Equation 25.11 is evaluated with Table A10 (Table of the F Distribution) in the Appendix. The degrees of freedom employed for the analysis are df_num = k − 1 and df_den = (k − 1)(n − 1). In the case of Example 25.1, df_num = 3 − 1 = 2 and df_den = (3 − 1)(6 − 1) = 10, and the tabled F.95 and F.99 values for df_num = 2 and df_den = 10 are F.95 = 4.10 and F.99 = 7.56. In order to reject the null hypothesis, the obtained F value must be equal to or greater than the tabled critical value at the prespecified level of significance.
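Equation 25.11 itself is not reproduced in this endnote, but the Iman–Davenport statistic is commonly written as F = (n − 1)χ²_r / [n(k − 1) − χ²_r]; the sketch below implements that form on the assumption that it matches Equation 25.11, and uses a placeholder χ²_r value rather than the value computed for Example 25.1.

```python
# Hedged sketch of the Iman-Davenport adjustment: F = (n - 1)*chi_r_sq / (n*(k - 1) - chi_r_sq),
# evaluated against the F distribution with df_num = k - 1 and df_den = (k - 1)(n - 1).
from scipy.stats import f

def iman_davenport_F(chi_r_sq, n, k):
    return (n - 1) * chi_r_sq / (n * (k - 1) - chi_r_sq)

n, k = 6, 3
chi_r_sq = 7.0                                   # placeholder Friedman statistic, not Example 25.1's value
df_num, df_den = k - 1, (k - 1) * (n - 1)
print(round(iman_davenport_F(chi_r_sq, n, k), 2), df_num, df_den)
print(round(f.ppf(0.95, df_num, df_den), 2))     # approx. 4.10, the tabled F.95 cited above
print(round(f.ppf(0.99, df_num, df_den), 2))     # approx. 7.56, the tabled F.99 cited above
```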
15. It is also the case that the exact binomial probability for the binomial sign test for two dependent samples will correspond to the exact probability for the Friedman test statistic.
16. If Subject 2 is included in the analysis, Equation 25.1 yields the value χ²_r = 4.9, which is also significant at the .05 level.
17. In Section I it is noted that in employing the Friedman test it is assumed that the variable which is ranked is a continuous random variable. Thus, it would be assumed that the racing form of a horse was at some point either explicitly or implicitly expressed as a continuous interval/ratio variable.
18. It should be noted that it is possible to obtain a negative z value for Equation 25.7 if the data are consistent with the analogous hypothesis in the other tail of the sampling distribution. Put simply, if the results are the exact opposite of what is predicted, the value of z will be negative. Specifically, if the data were consistent with the alternative hypothesis H1: θ1 ≥ θ2 ≥ θ3, the value computed for L would be less than the expected value of L_E computed in the numerator of Equation 25.7 (which in our example was L_E = 72). When the latter is true, the value computed for z will be negative. In instances where a researcher is employing a two-tailed alternative hypothesis, the absolute value of z is employed, since the sign of z has no bearing on whether or not a result is significant. In the case of a two-tailed alternative hypothesis, tabled critical two-tailed values are employed.
1. A more detailed discussion of the guidelines noted below can be found in Sections I and VII of the t test for two dependent samples.
2. Although it is possible to conduct a directional analysis, such an analysis will not be described with respect to the Cochran Q test. A discussion of a directional analysis when k = 2 can be found under the McNemar test. A discussion of the evaluation of a directional alternative hypothesis when k ≥ 3 can be found in Section VII of the chi-square goodness-of-fit test (Test 8). Although the latter discussion is in reference to analysis of a k independent samples design involving categorical data, the general principles regarding analysis of a directional alternative hypothesis when k ≥ 3 are applicable to the Cochran Q test.
3. The use of Equation 26.1 to compute the Cochran Q test statistic assumes that the columns in the summary table (i.e., Table 26.1) are employed to represent the k levels of the independent variable, and that the rows are employed to represent the n subjects/matched sets of subjects. If the columns and rows are reversed (i.e., the columns are employed to represent the subjects/matched sets of subjects, and the rows the levels of the independent variable), Equation 26.1 cannot be employed to compute the value of Q.
4. The same Q value is obtained if the frequencies of No responses (0) are employed in computing the summary values used in Equation 26.1 instead of the frequencies of Yes (1) responses. To illustrate this, the data for Example 26.1 are evaluated employing the frequencies of No (0) responses.
5. In the discussion of comparisons in reference to the analysis of variance, it is noted that a simple (also known as a pairwise) comparison is a comparison between any two groups/conditions in a set of k groups/conditions.
6. The method for deriving the value of z_adj for the Cochran Q test is based on the same logic that is employed in Equation 22.5 (which is used for conducting comparisons for the Kruskal–Wallis one-way analysis of variance by ranks (Test 22)). A rationale for the use of the proportions .0167 and .0083 in determining the appropriate value for z_adj in Example 26.1 can be found in Endnote 5 of the Kruskal–Wallis one-way analysis of variance by ranks.


7. It should be noted that when a directional alternative hypothesis is employed, the sign of the difference between the two proportions must be consistent with the prediction stated in the directional alternative hypothesis. When a nondirectional alternative hypothesis is employed, the direction of the difference between the two proportions is irrelevant.
8. In Equation 26.3 the value z.05 = 1.96 is employed for z_adj, and the latter value is multiplied by .204, which is the value computed for the term in the radical of the equation for Example 26.1.
10. In conducting the binomial sign test for two dependent samples, what is relevant is in which of the two conditions a subject has a higher score, which is commensurate with assigning a subject to one of two response categories. As is the case with the McNemar test and the Cochran Q test, the analysis for the binomial sign test for two dependent samples does not include subjects who obtain the same score in both conditions.
11. The value χ² = 5.44 is also obtained for Example 19.1 through use of Equation 8.2, which is the equation for the chi-square goodness-of-fit test. In the case of Example 19.1, the latter equation produces an equivalent result to that obtained with Equation 19.3 (the normal approximation). The result of the binomial analysis of Example 19.1 with the chi-square goodness-of-fit test is summarized in Table 19.2.
12. Within the framework of a time series design, one or more blocks can be included to serve as controls. Specifically, in Example 26.6 additional cities might have been selected in which the gun control law was always in effect (i.e., in effect during Time 2 as well as during Times 1 and 3). Differences on the dependent variable during Time 2 between the control cities and the cities in which the law was nullified between 1985 and 1989 could be contrasted to further evaluate the impact of the gun control law. Unfortunately, if the law in question is national, such control cities would not be available in the nation in which the study is conducted. The reader should note, however, that even if such control cities were available, the internal validity of such a study would still be subject to challenge, since it would still not ensure adequate control over all potential extraneous variables.
13. a) Cochran (1950) and Winer et al. (1991) note that if a single-factor within-subjects analysis of variance is employed to evaluate the data in a Cochran Q test summary table (e.g., Tables 26.1, 26.2, 26.3, 26.4, 26.5, 26.6, 26.7, 26.8, 26.9), it generally leads to similar conclusions as those reached when the data are evaluated with Equation 26.1. However, the question of whether it is appropriate to employ an analysis of variance to evaluate the categorical data in the Cochran Q test summary table is an issue on which researchers do not agree; b) It might be more worthwhile to conceptualize a study such as that represented by Example 26.6 within the framework of a mixed factorial design (which will always involve at least two independent variables). In a mixed factorial design involving two independent variables, one independent variable is a between-subjects variable (i.e., each subject/block is evaluated under only one level of that independent variable), while the other independent variable is a within-subjects variable (i.e., each subject/block is evaluated under all levels of that independent variable). In the latter type of analysis, the dependent variable is generally represented by interval/ratio level data. If a study such as Example 26.6 were conceptualized as a mixed factorial design, different types of cities might be employed to represent the between-subjects independent variable, and the three time periods would represent a within-subjects independent variable. If the study was conceptualized within the framework of a mixed factorial design, each block (i.e., row in Table 26.9) would be comprised of two or more cities. For example, each row might be comprised of two or more cities with populations that fell within a specified range. Thus, Row 1/Block 1 might be comprised of five cities with a population greater than five million; Row 2/Block 2 by five cities with a population between one and five million; Row 3/Block 3 by five cities with a population between 500,000 and one million, and so on. Such a design is typically evaluated with the factorial analysis of variance for a mixed design (Test 27i), which is discussed in Section IX (the Addendum) of the between-subjects factorial analysis of variance (Test 27); c) Further discussion of analysis of time series designs can be found in Section IX (the Addendum) of the Pearson product-moment correlation coefficient (Test 28).
1. a) A main effect refers to the effect of one independent variable on the dependent variable, while ignoring the effect any of the other independent variables may have on the dependent variable; b) The model for the between-subjects factorial analysis of variance can be summarized by Equation 27.104, which is a linear (or additive) function of the relevant parameters involved in the analysis of variance. The latter equation describes the elements which contribute to the score of any subject on the dependent variable.
2. Although it is possible to conduct a directional analysis, such an analysis will not be described with respect to a factorial analysis of variance. A discussion of a directional analysis when an independent variable is comprised of two levels can be found under the t test for two independent samples. In addition, a discussion of one-tailed F values can be found in Section VI of the latter test under the discussion of Hartley's F_max test for homogeneity of variance/F test for two population variances. A discussion of the evaluation of a directional alternative hypothesis when there are two or more groups can be found in Section VII of the chi-square goodness-of-fit test (Test 8). Although the latter discussion is in reference to analysis of a k independent samples design involving categorical data, the general principles regarding the analysis of a directional alternative hypothesis are applicable to the analysis of variance.
3. The notational system employed for the factorial analysis of variance procedures described in this chapter is based on Keppel (1991).
4. The value SS_WG = 12 can also be computed employing the following equation:
5. This averaging protocol only applies when there is an equal number of subjects in the groups represented in the specific row or column for which an average is computed.
6. a) If the factor represented on the abscissa is comprised of two levels (as is the case in Figure 27.1a), when no interaction is present the lines representing the different levels of the second factor will be parallel to one another by virtue of being equidistant from one another. When the abscissa factor is comprised of more than two levels, the lines can be equidistant but not parallel when no interaction is present; b) A distinction is often made between an ordinal versus a disordinal interaction. In an ordinal interaction the direction of the effect of one independent variable is consistent across the levels of a second independent variable – in other words, as is the case for the two lines in Figure 27.1b, the means for each of the subgroups comprising one line are consistently higher than/above the means for the subgroups comprising the other line. In a disordinal interaction the direction of the effect of one independent variable is reversed across the levels of another independent variable – in other words, as is the case with the lines for levels B1 and B2 in Figure 27.1a, the lines in the graph at some point cross one another.


7. As noted earlier, the fact that the lines are parallel to one another is not a requirement if no interaction is present when the abscissa factor is comprised of three or more levels.
8. If no interaction is present, such comparisons should yield results which are consistent with those obtained when the means of the levels of that factor are contrasted.
10. Many researchers would elect to employ a comparison procedure which is less conservative than the Scheffé test, and thus would not require as large a value as CD_S in order to reject the null hypothesis.
11. The number of pairwise comparisons is [k(k − 1)]/2 = [6(6 − 1)]/2 = 15, where k = pq = (2)(3) = 6 represents the number of groups.
12. If Tukey's HSD test is employed to contrast pairs or sets of marginal means for Factors A and B, the values q(A, df_WG) and q(B, df_WG) are, respectively, employed from Table A13. The sample sizes used in Equation 27.45 for Factors A and B are, respectively, nq and np.
13. When there are only two levels involved in analyzing the simple effects of a factor (as is the case with Factor A), the procedure to be described in this section will yield an F value for a simple effect which is equivalent to the F_comp value that can be computed by comparing the two groups employing the linear contrast procedure described earlier (i.e., the procedure for which Equation 27.40 is employed to compute SS_comp).
14. Equation 27.105 (which is equivalent to Equation 21.3 employed for computing the between-groups sum of squares for the single-factor between-subjects analysis of variance) is employed to compute the sum of squares for each of the simple effects of Factor A.
15. In the case of the simple effects of Factor A, the modified degrees of freedom value is df_WG = p(n − 1).
16. a) The fact that in the example under discussion the tabled critical values employed for evaluating F_max are extremely large is due to the small value of n. However, under the discussion of homogeneity of variance under the single-factor between-subjects analysis of variance, it is noted that Keppel (1991) suggests employing a more conservative test any time the value of F_max ≥ 3; b) As noted in Section VI of the single-factor between-subjects analysis of variance, alternative tests of homogeneity of variance, such as the Levene test for homogeneity of variance (Test 21g) and the Brown-Forsythe test of homogeneity of variance (Test 21h), provide for a more powerful test of the alternative hypothesis and are less sensitive to violations of the normality assumption underlying the between-subjects factorial analysis of variance.
17. The procedure described in this section assumes there is an equal number of subjects in each group. If the latter is true, it is also the case for Example 27.1 that X̄_G = (X̄_A1 + X̄_A2)/2 and X̄_A1 = (X̄_AB11 + X̄_AB12 + X̄_AB13)/3.

18. a) Different but equivalent forms of Equations 27.51, 27.52, and 27.53 were employed to compute standard omega squared in the first edition of this book; b) Equation 27.106 is a general equation which can also be employed to compute the values for standard omega squared. Note that in Equation 27.106 (as well as in Equations 27.107 and 27.108), the subscript effect refers to the main effect on Factor A, Factor B, or the interaction. Employing Equation 27.106, the values ω̂²_A = .50, ω̂²_B = .30, and ω̂²_AB = .08 obtained previously are computed below.
19. For a clarification of the use of multiple summation signs, the reader should review Endnote 63 under the
single-factor between-subjects analysis of variance and Endnote 19 under the single-factor within-subjects
analysis of variance.
20. The notation X_ijk is a simpler form of the notation X_i(ABjk), which is more consistent with the notational format used throughout the discussion of the between-subjects factorial analysis of variance.


21. The notation Σ_{k=1}^{q} Σ_{j=1}^{p} Σ_{i=1}^{n} X_{ijk} is an alternative way of writing ΣX_T. Σ_{k=1}^{q} Σ_{j=1}^{p} Σ_{i=1}^{n} X_{ijk} indicates that the scores of each of the n = n_ABjk subjects in each of the pq groups are summed.
22. Since the interaction sum of squares is comprised of whatever remains of between-groups variability after the contributions of the main effects for Factor A and Factor B have been removed, Equation 27.67 can be derived from the equation noted below, which subtracts Equations 27.65 and 27.66 from Equation 27.64.
23. The computation of the harmonic mean is described in the Introduction and in Section VI of the t test for two independent samples.
24. Note that in the example to illustrate the randomized-blocks design, the letter k is employed to represent the number of levels on Factor A and j the number of levels on Factor B, whereas in Example 27.1, the letter j is employed as a subscript with respect to the levels of Factor A and the letter k as a subscript with respect to the levels of Factor B.
25. Some sources note that the subjects employed in such an experiment (or for that matter any experiment involving independent samples) are nested within the level of the factor to which they are assigned, since each subject serves under only one level of that factor.
26. Sources commonly employ the term aliasing to represent confounding within the context of screening designs.
27. a) A fractional-factorial design should be balanced, which means there should be an equal number of subjects in each of the treatment combinations employed in the design. Thus, in a study involving 256 subjects there could be 256 of the 1024 possible treatment combinations with one subject per combination (although realistically a researcher would employ more than one subject per combination), or 128 treatment combinations with two subjects per combination, and so on; b) Although in an experiment involving a large number of factors it is possible to have subjects serve in multiple conditions (see the factorial analysis of variance for a mixed design and the within-subjects factorial analysis of variance in Section IX (the Addendum)), as the number of factors increases it becomes increasingly difficult and/or impractical to employ the same subjects in multiple conditions.
28. The mixed factorial design is often referred to as a split-plot design.
29. The computational procedure for the factorial analysis of variance for a mixed design assumes there is an equal number of subjects in each of the levels of the between-subjects factor. When the latter is not true, adjusted equations should be employed which can be found in books which describe the factorial analysis of variance for a mixed design in greater detail.
30. a) The term Latin square was first used by the Swiss mathematician Leonhard Euler (1707–1783); b) A Latin square design is categorized in some sources as a screening design (which is discussed in Section VII), since it does not allow interactions to be evaluated.
31. The letters employed in the English language alphabet are more formally referred to as Latin letters, since they are derived from the ancient language of Latin.
32. Many sources recommend that once a Latin square has been selected for use in a study, the researcher should do the following in the order noted: a) Randomly rearrange the ordinal position of the rows; b) Randomly rearrange the ordinal position of the columns; c) Randomly rearrange the notation for the experimental treatments.
33. Equation 27.100 expresses relative efficiency/power efficiency in a different format than it is expressed with Equation 6.7.
34. There are 12 possible presentation orders involving combinations of the two factors (p!q! = 3!2! = 12). The sequences for presentation of the levels of both factors are determined in the following manner: If A1 is followed by A2, presentation of the levels of Factor B can be in the six following sequences: 123, 132, 213, 231, 312, 321. If A2 is followed by A1, presentation of the levels of Factor B can be in the same six sequences noted previously. Thus, there are a total of 12 possible sequence combinations. Since there are only six subjects in Example 27.9, only six of the 12 possible sequence combinations can be employed.
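The p!q! count in the preceding endnote can be enumerated directly; the short sketch below (illustrative only) lists the 2! orders of Factor A crossed with the 3! orders of Factor B.

```python
# Enumeration sketch: with 2 orders of the levels of Factor A and 3! orders of
# the levels of Factor B, there are p!q! = 3!2! = 12 sequence combinations.
from itertools import permutations

factor_a_orders = list(permutations(["A1", "A2"]))         # 2! = 2
factor_b_orders = list(permutations(["B1", "B2", "B3"]))   # 3! = 6

combinations = [(a, b) for a in factor_a_orders for b in factor_b_orders]
print(len(combinations))                                   # 12
```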
35. If Factors A and B are both within-subjects factors and a significant effect is present for the main effects and the interaction, the within-subjects factorial analysis of variance would be the most likely of the three factorial analysis of variance procedures discussed to yield significant F ratios. The F_A, F_B, and F_AB values obtained in Examples 27.1 and 27.4 are significant at both the .05 and .01 levels when the data are respectively evaluated with a between-subjects factorial analysis of variance and a factorial analysis of variance for a mixed design. However, when Example 27.9 is evaluated with the within-subjects factorial analysis of variance, although F_B is significant at both the .05 and .01 levels, F_A and F_AB are only significant at the .05 level. This latter result can be attributed to the fact that the data set employed for the three examples is hypothetical, and is not based on the scores of actual subjects who were evaluated within the framework of a within-subjects factorial design. In point of fact, in the case of the within-subjects factorial analysis of variance, the lower value for df_den employed for a specific effect (in contrast to the values of df_den employed for the between-subjects factorial analysis of variance and the factorial analysis of variance for a mixed design) will be associated with a tabled critical F value that is larger than the values employed for the latter two tests. Thus, unless there is an actual correlation between subjects' scores under different conditions (which should be the case if a variable is measured within-subjects), the loss of degrees of freedom will nullify the increase in power associated with the within-subjects factorial analysis of variance (assuming the data are derived from the appropriate design). The superior power of the within-subjects factorial analysis of variance derives from the smaller MS error terms employed in evaluating the main effects and interaction.
1. It is also possible to designate the Y variable as the predictor variable and the X variable as the criterion variable. The use of the Y variable as the predictor variable is discussed in Section VI.
2. a) It should be noted that when the joint distribution of two variables is bivariate normal, only a linear relationship can exist between the variables. As a result of the latter, whenever the population correlation between two bivariate normally distributed variables equals zero, one can conclude that the variables are statistically independent of one another. Under such conditions the null hypothesis H0: ρ = 0, commonly evaluated for the Pearson product-moment correlation coefficient, is equivalent to the null hypothesis that the two variables are independent of one another. On the other hand, it is possible for each of two variables to be normally distributed, yet the joint distribution of the two variables not be bivariate normal. When the latter is true, it is possible to compute the value r = 0, and at the same time have two variables which are statistically dependent upon one another. Statistical dependence in such a case will be the result of the fact that the variables are curvilinearly related to one another; b) Further discussion of the bivariate normal distribution (which is a special case of a multivariate normal distribution) can be found in Section I of the chapter on multiple regression.
3. a) Howell (2002) and Grissom and Kim (2005, pp. 71–72) note that the value of r computed with Equation 28.1 is a biased estimate of the underlying population parameter ρ. More specifically, Grissom and Kim (2005, p. 70) note that r and r_pb (the point-biserial correlation), which are discussed in Section IX (the Addendum), are negatively biased – i.e., they tend to underestimate the underlying population correlation. The degree to which the computed value of r is biased is inversely related to the size of the sample employed in computing the correlation coefficient. For this reason, when one employs correlational data within the framework of research, it is always recommended that a reasonably large sample size be employed.
5. a) The value (1 − r²) is often referred to as the coefficient of nondetermination, since it represents the proportion of variance that the two variables do not hold in common with one another. Further discussion of the coefficient of determination can be found in Section I of Test 43 on meta-analysis; b) Ozer (1985) argues that under certain conditions it is more prudent to employ |r| as a measure of the proportion of variance on one variable which can be accounted for by variability on the other variable. Cohen (1988, p. 533) succinctly summarizes Ozer's (1985) point by noting that when there is reason to believe a causal relationship exists between X and Y, the value r² provides an appropriate estimate of the percentage/proportion of variance on Y attributable to X. However, if there is reason to believe that both X and Y are caused by a third variable, the absolute value of r is a more appropriate measure to employ to represent the proportion of shared variance between X and Y. In practice, however, most sources always employ the value of r² to represent the percentage/proportion of variance on one variable attributable to the other variable. Further critique of the use of r² as a basis for explaining variability can be found in Beatty (2002), Grissom and Kim (2005, pp. 91–95), and Hunter and Schmidt (2004, pp. 189–191).
6. As noted earlier, for illustrative purposes, the sample size employed for Example 28.1 is very small. Consequently, the values r and r² are, in all likelihood, not accurate estimates of the corresponding underlying population parameters ρ and ρ².
7. a) An equation that is based on the minimum squared distance of all the points from the line reflects the fact that if the distance of each data point from the line is measured, and the resulting value is squared, the sum of the squared values for the n data points is the lowest possible value that can be obtained for that set of data; b) The accuracy of the method of least squares can be compromised by the presence of outliers and violation of the assumption of homoscedasticity. Alternative methods of regression analysis that reduce the impact of such factors are referred to as robust regression analysis.
8. van Belle (2002, pp. 70–74) notes that the range of the values (as well as the spacing of values) employed on the predictor variable is a more critical factor in affecting the precision of a regression analysis than the value of n (i.e., the number of pairs of observations).
9. The values s_X and s_Y can also be computed with the equations noted below:
10. The reader may find it useful to review the discussion of confidence intervals in Section VI of the single-sample t test before reading this section.
11. The term SS_X in Equation 28.14 may also be written in the form SS_X = (n − 1)ŝ²_X, and the term SS_Y in Equation 28.15 may also be written in the form SS_Y = (n − 1)ŝ²_Y.


12. a) Zar (1999, p. 382) notes that the following equation can be employed to convert a z_r value into an r value: r = (e^{2z_r} − 1)/(e^{2z_r} + 1). Thus, if z_r = 1.886 then:
13. Equation 28.20 can also be written in the form z = (z_r − z_ρ0)√(n − 3). Thus, z = (1.886 − 1.099)√(5 − 3) = 1.11.
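Both computations in Endnotes 12 and 13 are easy to reproduce: the inverse Fisher transformation (e^{2z} − 1)/(e^{2z} + 1) is simply tanh(z), and the alternative form of Equation 28.20 multiplies the difference between the transformed values by √(n − 3). The sketch below uses the values quoted above (z_r = 1.886, a hypothesized transformed value of 1.099, and n = 5).

```python
# Sketch of the two identities above: back-transforming a Fisher z_r value,
# and the alternative form z = (z_r - z_rho0) * sqrt(n - 3).
import math

z_r = 1.886                              # Fisher transform of r = .955
print(round(math.tanh(z_r), 3))          # approx. 0.955, recovering r

z_rho0, n = 1.099, 5                     # 1.099 corresponds to a hypothesized rho of about .80
z = (z_r - z_rho0) * math.sqrt(n - 3)
print(round(z, 2))                       # approx. 1.11, as in Endnote 13
```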
14. The value n = 5 employed in Example 28.1 is not used, since the method to be described is recommended when n ≥ 25. For smaller sample sizes, tables in Cohen (1977, 1988) derived by David (1938) can be employed.
15. van Belle (2002, pp. 31–33; pp. 59–61) notes that Equation 28.82 (which is identical to Equation 2.15) can be employed to estimate the sample size required in order to detect a specific value for a population correlation (assuming a nondirectional analysis is conducted in evaluating the null hypothesis H0: ρ = 0, with α = .05). The value of k in the numerator of the latter equation will depend on the desired power of the test, and for the power values .50, .80, .90, .95, and .975 the values to employ for k are, respectively, 4, 8, 11, 13, and 16. The value d in Equation 28.82 represents the Fisher transformed value z_r computed with Equation 28.18. In the latter equation the value specified for the population correlation in the alternative hypothesis (which for purposes of demonstration will be H1: ρ = .40) is employed to represent the value of r.
16. Equation 28.24 can also be employed to evaluate the hypothesis of whether there is a significant difference between k = 2 independent correlations – i.e., the same hypothesis evaluated with Equation 28.22. When k = 2, the result obtained with Equation 28.24 will be equivalent to the result obtained with Equation 28.22. Specifically, the square of the z value obtained with Equation 28.22 will equal the value of χ² obtained with Equation 28.24. Thus, if the data employed in Equation 28.22 are employed in Equation 28.24, the obtained value of chi-square equals χ² = z² = (.878)² = .771.
17. a) When k = 2, Equation 28.25 is equivalent to Equation 28.23; b) If the sign of a correlation coefficient is negative, the absolute value of r is employed in obtaining the Fisher transformed z_r value from Table A17. A negative sign is then attached to the z_r value for its use in Equation 28.25; c) Among others, Hunter and Schmidt (1987; 2004, pp. 82–83) and Hunter, Schmidt, and Coggin (1996) contend that Equation 28.25 tends to produce an estimate of the population correlation which is positively/upwardly biased (i.e., it overestimates the population correlation). They note that although utilization of Fisher's z_r transformation compensates for a slight negative bias (i.e., underestimation of the population correlation) associated with a simple averaging of the correlations, it in fact introduces a small positive bias, which tends to increase as the degree of variation among the correlations employed increases. Hunter and Schmidt (2004, pp. 81–82) suggest the use of Equation 43.66, which is a weighted average, in place of Equation 28.25. (Equation 43.66 yields a weighted average correlation of .855 for the example under discussion: r̄ = [(5)(.955) + (5)(.765) + (5)(.845)]/[5 + 5 + 5] = .855.) Schulze (2004, p. 65), however, contends that Equation 43.66 is negatively biased; d) Schulze (2004, pp. 22–28; p. 193) provides further critique of Fisher's z_r transformation and describes alternative methods for estimating the distribution of a population correlation (most notably one proposed by Hotelling (1953, p. 224)).
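As a sketch only – assuming Equation 28.25 is the back-transformed mean of the Fisher-transformed correlations and that Equation 43.66 is the sample-size-weighted mean shown above – the two averaging strategies can be contrasted with a few lines of code, using the three correlations quoted in this endnote (each based on n = 5 pairs).

```python
# Contrast of two ways of averaging correlations (illustrative assumptions noted above).
import math

r_values = [.955, .765, .845]
n_values = [5, 5, 5]

# Average in Fisher z space, then back-transform (about .88 for these values).
z_mean = sum(math.atanh(r) for r in r_values) / len(r_values)
print(round(math.tanh(z_mean), 3))

# Sample-size-weighted mean of the raw correlations.
weighted = sum(n * r for n, r in zip(n_values, r_values)) / sum(n_values)
print(round(weighted, 3))                 # .855, the value quoted in the endnote
```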
18. If homogeneity of variance is assumed for the two samples, a pooled error variance can be computed as follows:
19. Marascuilo and Serlin (1988) describe how the procedure described in this section can be extended to the evaluation of a hypothesis contrasting three or more regression coefficients.
20. The equations for z_Xi and z_Yi are analogous to Equation I.38 in the Introduction (which employs the population parameters μ and σ in computing a z score).

21. The sum of products within this context is not the same as the sum of products that represents the numerator of Equation 28.1. The product of each subject's z_Xi and z_Yi score represents a cross-product for that subject's z scores, and the sum of the products of the n subjects' z scores is the sum of the cross-products of the z scores. Consequently, the correlation coefficient is sometimes defined as the mean of the cross-products of the z scores.
22. a) One way of avoiding the problem of dependent pairs is to form pairs in which no digit is used for more than one pair. In other words, the first two digits in the series represent the X and Y variables for the first pair, the third and fourth digits in the series represent the X and Y variables for the second pair, and so on. Although use of the latter methodology really does not conform to the definition of autocorrelation, if it is employed one can justify employing the critical values in Table A16; b) The use of Table A16 earlier in this section to evaluate the obtained correlation of r = .909 between the scores on the X and Y variables in Table 28.6 was justifiable. In the latter situation it was assumed that the X and Y variables represented two different variables. On the other hand, the data in Table 28.11 (although identical to that in Table 28.6) represent one set of scores in which each score is paired with the score which follows it.
23. A discussion of the derivation of Equation 28.60 can be found in Bennett and Franklin (1954).
24. The reader should take note of the fact that the data for Example 28.6 are fictitious, and, in reality, the result of the analysis in this section may not be consistent with actual studies that have been conducted which evaluate the relationship between intelligence and eye-hand coordination.
25. Although the phi coefficient is described in the book as a measure of association for the chi-square test for r × c tables (specifically, for 2 × 2 tables), it is also employed in psychological testing as a measure of association for 2 × 2 tables in order to evaluate the consistency of n subjects' responses to two questions. The latter type of analysis is essentially a dependent samples analysis for a 2 × 2 table, which, in fact, is the general model for which the McNemar test (Test 20) is employed.
26. Newbold and Bos (1990, p. 163) note, however, that some sources state that use of an α value between 1 and 2 can yield stable estimates of current level, even though employment of such values for α runs counter to the rationale underlying the use of exponential smoothing.
27. One alternative that is sometimes employed is to set the value forecast for time period 1 to the average of the actual values of the dependent variable in the first few (e.g., 4 or 5) time periods. Newbold and Bos (1990, pp. 194–196) discuss other alternative methods for selecting an initial forecast value.
1. It should be noted that although the scores of subjects in Example 29.1 are ratio data, in most instances when Spearman's rank-order correlation coefficient is employed it is more likely the original data for both variables are in a rank-order format. As is noted in Section I, conversion of ratio data to a rank-order format (which is done in Section IV with respect to Example 29.1) is most likely to occur when a researcher has reason to believe that one or more of the assumptions underlying the Pearson product-moment correlation coefficient are saliently violated. Example 29.2 in Section VI represents a study involving two variables that are originally in a rank-order format for which Spearman's rho is computed.
2. Some sources employ the following statements as the null hypothesis and the nondirectional alternative hypothesis for Spearman's rank-order correlation coefficient: Null hypothesis: H0: Variables X and Y are independent of one another; Nondirectional alternative hypothesis: H1: Variables X and Y are not independent of one another.
3. Daniel (1990) notes that the computed value of r_S is not an unbiased estimate of ρ_S.
4. The reader may find slight discrepancies in the critical values listed for Spearman's rho in the tables published in different books. The differences are due to the fact that separate tables derived by Olds (1938, 1949) and Zar (1972), which are not identical, are employed in different sources. Howell (2002) notes that the tabled critical values noted in various sources are approximations and not exact values. Ramsey (1989) and Franklin (1996) have derived critical values which they claim are more accurate than those listed in Table A18.
5. The minimum sample size for which Equation 29.3 is recommended varies depending upon which source one consults. Some sources recommend the use of Equation 29.3 for values as low as n = 25, whereas others state that n should equal at least 100.
6. The results obtained through use of Table A18, Equation 29.2, and Equation 29.3 will not always be in total agreement with one another. In instances where the different methods for evaluating significance do not agree, there will usually not be a major discrepancy among them. In the final analysis, the larger the sample size the more likely it is that the methods will be consistent with one another.
7. The following will always be true when Equation 28.1 is employed in computing Pearson r (and r_S) when the rank-orders are employed to represent the scores on the X and Y variables: ΣX = ΣY and ΣX² = ΣY² (however, the latter will only be true if there are no ties).
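A two-line check with arbitrary tie-free rankings (not the handbook's data) makes the identity in Endnote 7 concrete: when both variables consist of the ranks 1 through n, their sums and their sums of squares necessarily agree.

```python
# Quick check of the rank-sum identities: any two permutations of 1..n share
# the same sum and the same sum of squares.
x_ranks = [3, 1, 4, 5, 2]        # a permutation of 1..5
y_ranks = [2, 5, 1, 3, 4]        # another permutation of 1..5

print(sum(x_ranks) == sum(y_ranks))                                  # True
print(sum(r**2 for r in x_ranks) == sum(r**2 for r in y_ranks))      # True
```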
8. The relationship between Spearman's rank-order correlation coefficient and Kendall's coefficient of concordance is discussed in greater detail in Section VII of the latter test. In the latter discussion, it is noted that although, when there are two sets of ranks, the values computed for Spearman's rho and Kendall's coefficient of concordance will not be identical, one value can be converted into the other through use of Equation 31.7.
9. If the tie correction for the Friedman two-way analysis of variance by ranks is employed, the computed value of χ²_r will be slightly higher.
10. The tabled critical two-tailed .05 and .01 chi-square values represent the chi-square values at the 95th and 99th percentiles, and the tabled critical one-tailed .05 and .01 chi-square values represent the chi-square values at the 90th and 98th percentiles.
11. In the discussion of the Friedman two-way analysis of variance by ranks, it is assumed that a nondirectional analysis is always conducted for the latter test. A directional/one-tailed analysis is used here in order to employ probability values which are comparable to the one-tailed values employed in evaluating Spearman's rho. Within the Friedman test model, when k = 10, the usage of the term one-tailed analysis is really not meaningful. For a clarification of this issue (i.e., conducting a directional analysis when k > 3), the reader should read the discussion on the directionality of the chi-square goodness-of-fit test (Test 8) in Section VII of the latter test (which can be generalized to the Friedman test).
1. A discussion of monotonic relationships can be found in Section I of Spearman's rank-order correlation coefficient.
2. The exception to this is that when the computed value of τ̂ is either +1 or −1, the identical value will be computed for r_S.
3. The coefficient of determination is discussed in Section V of the Pearson product-moment correlation coefficient.
4. a) Some sources employ the following statements as the null hypothesis and the nondirectional alternative hypothesis for Kendall's tau: Null hypothesis: H0: Variables X and Y are independent of one another; Nondirectional alternative hypothesis: H1: Variables X and Y are not independent of one another.
5. If either of the two values τ̂ or S is known, Equation 30.2 can be employed to compute the other value. Some sources only list critical values for one of the two values τ̂ or S.
6. The following should be noted with respect to Equations 30.5 and 30.6: a) The denominator of Equation 30.5 is the standard deviation of the sampling distribution of the normal approximation of tau; and b) Based on a recommendation by Kendall (1970), Marascuilo and McSweeney (1977) (who employ Equation 30.6) describe the use of a correction for continuity for the normal approximation. In employing the correction for continuity with Equation 30.6, when S is a positive number, the value 1 is subtracted from S, and when S is a negative number, the value 1 is added to S. The correction for continuity (which is not employed by most sources) reduces the absolute value of z, thus resulting in a more conservative test. The rationale for employing a correction for continuity for a normal approximation of a sampling distribution is discussed in Section VI of the Wilcoxon signed-ranks test.
7. Howell (2002) notes that the value τ̂ = .60 indicates that if a pair of subjects is randomly selected, the likelihood the pair will be ranked in the same order is .60 higher than the likelihood they will be ranked in the reverse order.
8. The data for Examples 30.1 and 29.2 are identical, except for the fact that in the latter example there is a tie for the X score in the second ordinal position which involves the X scores in the eighth and ninth rows.
9. If Equation 30.1 is employed to compute the value of τ̂ for the data in Table 30.4, the value τ̂ = .578 is computed. Note that the tie correction yields a value for tau (τ̂_c = .598) which is slightly larger than the uncorrected value. As noted in the text, because of the presence of ties, n_C + n_D ≠ [n(n − 1)]/2.
1 Siegel and Castellan (1988) emphasize the fact that a correlation equal to or close to 1 does not in itself indicate that the rankings are correct. A high correlation only indicates that there is agreement among the m sets of ranks. It is entirely possible that there can be complete agreement among two or more sets of ranks, but that all of the rankings are, in fact, incorrect. In other words, the ranks may not reflect what is actually true with regard to the subjects/objects that are evaluated. Another way of stating the above is that although there may be interjudge reliability (i.e., agreement between the rankings of the judges), the latter does not necessarily mean that their rankings are valid (i.e., the concept of validity refers to whether the rankings are correct).
2 In point of fact, if the values of r_S and W are computed for m = 2 sets of ranks, when the computed values for r_S are, respectively, 1, −1, and 0, the computed values of W will be, respectively, 1, 0, and .5. The latter sets of values can be obtained through use of Equation 31.7, which is presented in Section VII.
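A small sketch of the latter relationship is provided below. It assumes Equation 31.7 has the commonly cited form r̄_S = (mW − 1)/(m − 1), where r̄_S is the average Spearman correlation among the m sets of ranks; solving for W and setting m = 2 reproduces the three pairs of values noted above.

```python
def w_from_average_rs(r_s_bar, m):
    # Assumed form of Equation 31.7, rearranged: W = [r_S_bar(m - 1) + 1] / m
    return (r_s_bar * (m - 1) + 1) / m

for r_s in (1, -1, 0):
    print(r_s, w_from_average_rs(r_s, m=2))   # 1 -> 1.0, -1 -> 0.0, 0 -> 0.5
```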
3 Some sources state that the alternative hypothesis is directional, since W can only be a positive value. Related to this is the fact that only the upper tail of the chi-square distribution (which is discussed in Section V) is employed in approximating the exact sampling distribution of W. In the final analysis, it becomes academic whether one elects to identify the alternative hypothesis as directional or nondirectional.
4 The tie-corrected version of Equation 31.3 is noted below:
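For reference, the tie-corrected form of W given in most sources (e.g., Siegel and Castellan (1988)), which is assumed here to correspond to the equation referred to above, is the following, where R_j is the sum of the m ranks assigned to the jth object and, within each of the m sets of ranks, T_i = Σ(t³ − t) is summed over every group of t tied ranks:

```latex
W = \frac{12\sum_{j=1}^{n} R_j^{2} \;-\; 3m^{2}n(n+1)^{2}}
         {m^{2}n\,(n^{2}-1) \;-\; m\sum_{i=1}^{m} T_i},
\qquad T_i = \sum\,(t^{3}-t)
```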
5 Note that for m = 3 and n = 3, no tabled critical values are listed in Table A20. This is the case, since critical values cannot be computed for values of m and n which fall below specific minimum values. If Equation 31.5 is employed to evaluate W = .111, it yields the following result: χ² = (3)(3 − 1)(.111) = .666. Since χ² = .666 is less than the tabled critical two-tailed value (for df = 2) χ²_.05 = 5.99, the obtained value W = .111 is not significant. In point of fact, even if the maximum possible value W = 1 is substituted in Equation 31.5, it yields the value χ² = 6, which is barely above χ²_.05 = 5.99. Since the chi-square distribution provides an approximation of the exact sampling distribution, in this instance it would appear that the tabled value χ²_.05 = 5.99 is a little too high and, in actuality, is associated with a Type I error rate which is slightly above .05.
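A brief sketch of the approximation discussed above follows. It assumes Equation 31.5 has the usual form χ² = m(n − 1)W, evaluated with df = n − 1; the function name is hypothetical.

```python
from scipy import stats

def chi_square_for_w(w, m, n, alpha=0.05):
    # Assumed form of Equation 31.5: chi-square = m(n - 1)W with df = n - 1.
    chi_sq = m * (n - 1) * w
    critical = stats.chi2.ppf(1 - alpha, df=n - 1)
    return chi_sq, critical

print(chi_square_for_w(0.111, m=3, n=3))   # approximately (0.666, 5.99)
print(chi_square_for_w(1.0, m=3, n=3))     # (6.0, 5.99) -- barely above the critical value
```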
6 The summary of the data for Example 29.2 in Table 29.6 provides the necessary values required to compute the value of W. The latter values are not computed in Table 29.5, which (employing a different format) also summarizes the data for Example 29.2.
7 Although there is one set of ties in the data, the tie correction described in Section VI is not employed for Example 31.3.
8 The exact value χ²_r = 11.08 is computed if the value W = .9236 (which carries the computation of W to four decimal places) is employed in Equations 31.8/31.5.
1 The general model for an r × c contingency table (which is summarized in Table 16.1) is discussed in Section I of the chi-square test for r × c tables (Test 16).
2 Gamma can also be computed if the ordering is reversed, i.e., within both variables the first row/column represents the category with the highest magnitude, and the last row/column represents the category with the lowest magnitude.
3 Some sources employ the following statements as the null hypothesis and the nondirectional alternative hypothesis for Goodman and Kruskal's gamma: Null hypothesis: H₀: Variables X and Y are independent of one another; Nondirectional alternative hypothesis: H₁: Variables X and Y are not independent of one another.
5 Sources which discuss the evaluation of the null hypothesis H₀: γ = 0 note that the normal approximation computed with Equations 32.2/32.3 tends to be overly conservative. Consequently, the likelihood of committing a Type I error (i.e., rejecting H₀ when it is true) is actually less than the value of alpha employed in the analysis.
6 It could be argued that it might be more appropriate to employ Somers' delta (which is briefly discussed in Section VII) rather than gamma as a measure of association for Example 32.3. The use of delta could be justified if, within the framework of a study, the number of years of therapy represents an independent variable and the amount of change represents the dependent variable. In point of fact, depending upon how one conceptualizes the relationship between the two variables, one could also argue for the use of delta as a measure of association for Example 32.1. In the final analysis, it will not always be clear whether it is more appropriate to employ gamma or delta as a measure of association for an ordered contingency table.
1 The use of the term continuous variable refers to a variable that can be represented on an interval/ratio scale of measurement.
2 Although the mathematical definition of an eigenvalue is complex, Field (2005, pp. 197–198) provides a good illustration of what the latter concept represents. In the latter discussion he notes that an eigenvalue is directly related to an eigenvector, which is a linear representation of some dimension in a multidimensional space (e.g., the height, length, or width of a three-dimensional space). An eigenvalue is simply a numerical expression of the dimension of an eigenvector. The nature of the distribution of the variances in a data matrix can be understood by examining the interrelationships between the eigenvalues computed for a matrix.
1 a) Some sources limit the use of the terms independent and dependent variable to the variables employed in a well controlled experimental study, in other words a true experiment (an experiment involving a manipulated independent variable), which is discussed in the Introduction. However, many sources, as well as computer software (e.g., SPSS), employ the latter terms in reference to correlational research; specifically, they use the terms independent variable and predictor variable interchangeably, as well as the terms dependent variable and criterion variable interchangeably; b) Jacob Cohen (1968) demonstrated that all univariate parametric methods (e.g., t tests, analysis of variance) are, in fact, special cases of multiple regression. Cohen and Cohen (1983), Licht (1995, p. 20), and Tatsuoka (1975) note that in developing the analysis of variance (e.g., the single-factor between-subjects analysis of variance (Test 21)) Ronald Fisher originally employed multiple regression. Ultimately, however, he used the more familiar computational procedures described for the analysis of variance rather than multiple regression, because of the complexity of the computations required for the latter type of analysis. The point to be made is that the analysis of variance can be conceptualized as a special case of multiple regression analysis. Among others, Field (2005, pp. 311–316) provides a good discussion of this subject. The relationship between multiple regression and the analysis of variance is also discussed in Section VII of the single-factor between-subjects analysis of variance and in Endnote 2 in the chapter on canonical correlation under the discussion of the general linear model, the latter model stating that parametric statistical procedures can be summarized in the form of one or more linear equations, and that in the final analysis a correlational format can be employed to summarize all such procedures (or, to put it another way, all parametric procedures can be conceptualized as special cases of regression analysis).
2 a) As noted in Section IX (the Addendum) of the Pearson product-moment correlation coefficient, a dummy variable is a dichotomous variable which employs two values (typically 0 and 1) to indicate membership in one of two categories. An excellent discussion on the use of dummy variables in multiple regression can be found in Field (2005, pp. 208–210). Endnote 4 in the chapter on logistic regression also discusses relevant information regarding the use of dummy variables; b) The reader should take note of the fact that a multiple regression analysis should employ the optimal predictor variables. The term specification error refers to omission of one or more relevant predictor variables.
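As an illustration of the dummy-coding idea described in point a), the short pandas sketch below converts a hypothetical three-category predictor into the k − 1 dummy columns that would enter a regression analysis; the variable name and data are invented for illustration only.

```python
import pandas as pd

group = pd.Series(["A", "B", "C", "A", "B"], name="group")   # hypothetical predictor

# k - 1 = 2 dummy columns are created; the dropped category ("A") serves as
# the reference (baseline) group, coded 0 on both dummy variables.
dummies = pd.get_dummies(group, prefix="group", drop_first=True)
print(dummies)
```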
3 The null and alternative hypotheses are sometimes expressed in an alternative format, specifically by employing the squared population multiple correlation coefficient (which as noted in Section IV/V represents the population coefficient of multiple determination) in lieu of the population multiple correlation coefficient. Thus: H₀: P² = 0 versus H₁: P² ≠ 0.
4 a) It is also the case that the greater the number of predictor variables in a set of data involving a fixed number of subjects, the larger the value of R²; b) Among others, Tabachnick and Fidell (1996, p. 12) note that overfitting may occur when there are too many variables relative to the size of the sample. Overfitting is when the information derived from a sample is tailored to the idiosyncrasies of the sample, yet will not provide a good fit for the target population to which the researcher wishes to generalize.
5This principle has obviously not been adhered to in the example under discussion in order to minimize
computations.
6 Tabachnick and Fidell (1996, pp. 164–165) provide equations developed by Browne (1975) and Cattin (1980) for small sample sizes that allow for an even more severe adjustment than that which results from using Equation 33.2. Field (2005, p. 172), employing Equation 33.36, suggests a more conservative adjusted R² value which can be employed to estimate how well the regression model derived from the data will predict the scores of subjects from an entirely different sample. The fact that the adjusted value R² = .5968 obtained below with Equation 33.36 is substantially lower than the values obtained with Equations 33.1 and 33.2 can be attributed to the small sample size employed in Example 33.1.
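For readers who wish to reproduce such an adjustment, a minimal sketch appears below. It assumes Equation 33.2 is the familiar "shrunken" adjustment 1 − (1 − R²)(n − 1)/(n − k − 1); the function name and the illustrative values are arbitrary.

```python
def adjusted_r_squared(r2, n, k):
    # Assumed form of Equation 33.2: shrinks R^2 for sample size n and k predictors.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Arbitrary illustrative values; the smaller n is relative to k, the larger the shrinkage.
print(round(adjusted_r_squared(0.90, n=10, k=3), 4))   # 0.85
```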
7 The reader should take note of the fact that the values obtained with the remaining equations to be presented in this section are based on computations which were done with the aid of a hand calculator. If the data for Example 33.1 are evaluated with statistical software, one may obtain a slight discrepancy between the values obtained with the equations described in this section and the output generated by a computer. Any differences will be minimal and of no practical consequence, and can be attributed to rounding off error.
8 Harlow (2005, p. 47) notes that the equation Y′ = X′ + e is a succinct way of summarizing what the multiple regression equation represents. The latter equation indicates that the value of Y′ is a function of a linear combination of the X variables (X′) with the addition of some prediction error (e), which represents variability that is unrelated to any of the predictor variables. Within the latter context, some sources may present Equation 33.4 in the form Y′ = a + b₁X₁ + b₂X₂ + … + b_kX_k + e.
9The equation noted below is equivalent to Equation 33.8.


10 a) It should be noted that the regression equation is usually reported in terms of unstandardized coefficients, since it allows direct prediction of a Y score from an X score. Use of standardized coefficients would require converting a standardized Y score into a raw score value; b) A standardized regression coefficient can easily be converted into an unstandardized coefficient by algebraically transposing the terms in Equations 33.12 and 33.13. In other words, since the generic form of Equations 33.12/33.13 is β_i = b_i(s_Xi/s_Y), then through algebraic transposition b_i = β_i(s_Y/s_Xi); c) Allen (1997, p. 46) and Marascuilo and Levin (1983, p. 91) note that if there is one predictor variable then the value of the standardized regression coefficient is equal to the value of r, since r = cov_XY/(s_X s_Y) = cov_zXzY/(s_zX s_zY) = cov_zXzY/[(1)(1)] = cov_zXzY, and that another way of computing/defining a standardized regression coefficient for the ith predictor variable is β_Xi = cov_zXizY/s_zXi = cov_zXizY/1 = cov_zXizY; d) Allen (1997, p. 50) notes that it can be mathematically demonstrated that in bivariate correlation the value of a standardized regression coefficient (which will be equivalent to the value of r) cannot exceed 1. In point of fact, in multiple regression, though in theory standardized regression coefficients can range from minus infinity to plus infinity, in practice they range from −1 to +1, and values outside that range indicate a major problem with the regression analysis in question (such as collinearity, nonnormality, etc.) (Field (2005, p. 611) and Harlow (2005, personal communication)).
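The conversion described in point b) is simple enough to express directly; the short sketch below implements both directions under the generic forms just quoted (function names are hypothetical).

```python
def standardized_from_unstandardized(b_i, s_xi, s_y):
    # beta_i = b_i * (s_Xi / s_Y)
    return b_i * (s_xi / s_y)

def unstandardized_from_standardized(beta_i, s_xi, s_y):
    # b_i = beta_i * (s_Y / s_Xi)
    return beta_i * (s_y / s_xi)
```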
12 Howell (2002, pp. 571–573) also notes some sources argue that an effective measure of importance with respect to the predictor variables, especially when the use of the term importance is directly related to predictive power, is the squared semipartial correlation (clarified later in this section) between the ith predictor variable and the criterion variable Y (with all of the other predictor variables partialed out), which can be computed with Equation 33.37 (where F_bi is the F value computed for the regression coefficient for the ith variable). An F value for a regression coefficient can be obtained by squaring the t_bi value computed with Equation 33.16 (or through use of an equation for computing F_bi in Neter et al. (1990, p. 283)).
13 a) Once again the reader should take note of the fact that since, in actuality, the value R_Y.X1X2 = .972 was determined not to be significant, it would not be considered appropriate to evaluate the regression coefficients with respect to significance; b) The following should be noted with respect to Equation 33.15: 1) When the sample size is small and/or the number of subjects is not substantially larger than the number of predictor variables, the "shrunken" estimate R² (computed with Equation 33.2) should be employed in Equation 33.15; 2) When there are more than two predictor variables, the multiple correlation coefficient for the k variables is employed in the numerator of the radical of Equation 33.15 in place of R²_Y.X1X2. The value r²_X1X2 in the denominator of the radical of Equation 33.15 is replaced by the squared multiple correlation coefficient of variable i with all of the remaining predictor variables. Thus, if there are three predictor variables and s_b1 is computed, the values employed in the numerator and denominator of the radical are, respectively, R²_Y.X1X2X3 and R²_X1.X2X3.
14 Howell (2002, p. 544) cites sources who argue that the t distribution does not provide a precise approximation of the underlying sampling distribution for the standard error of estimate of the coefficients. On the basis of this he states that caution should be employed in interpreting the results of the t test.
15 a) The same results are obtained if the analysis is done employing the standardized regression coefficients. This is demonstrated below employing the appropriate equations for the standardized coefficients. The minimal discrepancy between the values t_β1 and t_b1 is due to rounding off error.
16 Marascuilo and Levin (1983) and Marascuilo and Serlin (1988) recommend that in order to control the Type I error rate, a more conservative t value should be employed when the number of regression coefficients evaluated is greater than one. The latter sources describe the use of the Bonferroni–Dunn and Scheffé procedures (which are described in reference to multiple comparisons for analysis of variance procedures) in adjusting the t value.
17 a) Since the zero-order correlation between sugar and cavities is so high (i.e., r_YX1 = .955), there is, so to speak, no room left for salt to make a significant contribution; b) Licht (1995, p. 40) and Neter et al. (1990, p. 304) note that when multicollinearity is present it is possible for a set of predictor variables to be significantly correlated with the criterion variable, yet all of the individual tests on the regression coefficients yield nonsignificant results. The latter result is also possible (although not likely to occur in practice) when no multicollinearity is present.
18 The reader should take note of the fact that the significant result obtained earlier for the regression coefficient b₁ = .278 can be attributed to the fact that when sugar is employed as the only predictor variable (as is the case in Example 28.1), the correlation between sugar and the number of cavities is, in fact, statistically significant. Yet, paradoxically, when both sugar and salt are included in the analysis, the multiple correlation coefficient is not significant. The latter can be attributed to the fact that the degrees of freedom in the analysis of variance involving two predictor variables are df_num = 2 and df_den = 2, in contrast to the degrees of freedom for an analysis of variance with one predictor variable, where df_num = 1 and df_den = 3 (see Table 28.3). Thus, when df = 1, 3 (as in Example 28.1), F_.05 = 10.13, yet when df = 2, 2 (as in Example 33.1), F_.05 = 19.00. Consequently, it is easier to reject the null hypothesis that the correlation in the underlying population equals zero when only one predictor is employed.
19 Licht (1995, p. 37) notes that an alternative way of conceptualizing statistical control can be demonstrated through use of the residuals. Since a residual represents a part of a subject's score which cannot be predicted from the predictor variables, the information contained in a residual will be independent of the predictor variables employed in an analysis. To illustrate how residuals can be employed to demonstrate statistical control, let us assume the researcher conducts a multiple regression analysis which involves the three predictor variables of sugar (X₁), salt (X₂), and potassium (X₃) consumption along with the criterion variable of the number of cavities (Y). What the researcher does, however, is to initially conduct a separate analysis in which the sugar consumption scores of subjects are employed as a criterion variable and the salt and potassium consumption scores of subjects are employed as the predictor variables. The researcher then computes residuals for sugar consumption by obtaining the difference between the predicted sugar consumption for each subject and a subject's actual sugar consumption. The latter values (i.e., the residuals) would represent sugar consumption after controlling for the other two predictor variables of salt and potassium. If, in fact, the residual values obtained for sugar consumption are then correlated with the number of cavities, the resulting correlation should correspond to the semipartial correlation r_Y(X1.X2X3) (which when squared will reflect the proportion of shared variance between the number of cavities and the amount of sugar consumed after any linear association that X₁ (amount of sugar consumed) has with X₂ (amount of salt consumed) and X₃ (amount of potassium consumed) has been removed).
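The residual-based procedure just described can be sketched as follows (numpy only, with hypothetical arrays): the residuals from regressing X₁ on X₂ and X₃ are correlated with Y, which should reproduce the semipartial correlation r_Y(X1.X2X3).

```python
import numpy as np

def semipartial_via_residuals(y, x1, other_predictors):
    """Correlate y with the part of x1 that cannot be predicted from the
    other predictors -- the statistical-control procedure described above."""
    X = np.column_stack([np.ones(len(x1)), other_predictors])
    coef, *_ = np.linalg.lstsq(X, x1, rcond=None)   # regress x1 on x2, x3
    residuals = x1 - X @ coef                       # x1 with x2 and x3 partialed out
    return np.corrcoef(y, residuals)[0, 1]          # semipartial r_Y(X1.X2X3)
```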

20 The computed value r_sp can also be evaluated through use of the critical values in Table A16 (for df = n − v). In the latter table, for df = 2, the tabled critical two-tailed values are r_.05 = .950 and r_.01 = .990. Since both r_Y(X1.X2) = .82 and r_Y(X2.X1) = .18 are less than r_.05 = .950, the nondirectional alternative hypotheses H₁: ρ_Y(X1.X2) ≠ 0 and H₁: ρ_Y(X2.X1) ≠ 0 are not supported.
21 Equation 33.32 becomes identical to Equation 28.3 when n − v = n − 2.


22 a) The computed value r_p can be evaluated through use of the critical values in Table A16 (for df = n − v). In the latter table, for df = 2, the tabled critical two-tailed values are r_.05 = .950 and r_.01 = .990. Since r_YX1.X2 = .96 is greater than r_.05 = .950, the nondirectional alternative hypothesis H₁: ρ_YX1.X2 ≠ 0 is supported at the .05 level (but not at the .01 level). Since r_YX2.X1 = .60 is less than r_.05 = .950, the nondirectional alternative hypothesis H₁: ρ_YX2.X1 ≠ 0 is not supported; b) Howell (2002, p. 555) and Licht (1995, p. 43) note that for a given predictor variable, a partial correlation, a semipartial correlation, and a regression coefficient are different indices of the independent contribution that the predictor has with the criterion variable, and because they all essentially measure the same thing, the test of significance for a regression coefficient will yield a comparable result to that obtained for a test of significance on either the partial or semipartial correlation obtained for that predictor.
23 Note that when a simple bivariate/zero-order correlation is computed, n − v = n − 2, and thus Equation 33.34 becomes identical to Equation 28.3 (which is used to evaluate the significance of the zero-order correlation coefficient r_X1X2).
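A short sketch of such a test follows, assuming Equation 33.34 has the form t = r√(n − v)/√(1 − r²), evaluated with df = n − v (so that setting v = 2 reduces it to Equation 28.3); the values shown are arbitrary.

```python
import math

def t_for_correlation(r, n, v):
    # Assumed form of Equation 33.34; with v = 2 this is Equation 28.3.
    return r * math.sqrt(n - v) / math.sqrt(1 - r ** 2)

print(round(t_for_correlation(0.60, n=5, v=3), 3))   # arbitrary illustrative values
```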
24 The following SPSS command sequence was employed in this chapter in conducting a multiple regression analysis: a) Click Analyze; b) Click Regression; c) Click Linear; d) Highlight the predictor variables (i.e., agility, strength, and intelligence in Example 33.2) and move them to the Independent(s) window; e) Highlight the criterion variable (i.e., performance in Example 33.2) and move it to the Dependent window; f) Click Statistics, check off desired information (e.g., Confidence intervals, Descriptives, Part and partial correlations, Collinearity diagnostics, etc.), and then click Continue; g) Click Plots, highlight desired graphic displays, and then click Continue; h) In the Methods window select the type of multiple regression analysis to be conducted. The default option of Enter represents standard multiple regression analysis; i) Click OK to obtain the output for the analysis.
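For readers working outside SPSS, the sketch below runs a comparable standard (simultaneous-entry) multiple regression in Python with the statsmodels package; the data are randomly generated placeholders, so only the structure of the call, not the output, mirrors Example 33.2.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)                       # arbitrary illustrative data
df = pd.DataFrame(rng.normal(size=(20, 4)),
                  columns=["agility", "strength", "intelligence", "performance"])

X = sm.add_constant(df[["agility", "strength", "intelligence"]])   # predictors
model = sm.OLS(df["performance"], X).fit()           # "Enter": all predictors entered at once
print(model.summary())                               # coefficients, R-square, F and t tests
```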
25 Field (2005, pp. 180–181; pp. 202–206) notes that in SPSS a regression plot of *ZRESID (y-axis) versus *ZPRED (x-axis) can be informative in evaluating the assumptions of linearity, independence of residuals, and homoscedasticity underlying multiple regression. In point of fact, the same guidelines described in Section VII of the Pearson product-moment correlation coefficient (with reference to Figure 28.8) for evaluating the assumptions underlying simple linear regression can be applied in evaluating the latter plot. More specifically, if none of the aforementioned assumptions are violated the scatterplot should resemble Figure 28.8a, i.e., the points should be randomly distributed above and below the horizontal line corresponding to 0. Field (2005, p. 181) also notes that heteroscedasticity can be detected with a plot of *SRESID (y-axis) versus *ZPRED (x-axis). In addition, he discusses other plots available with SPSS that can be useful for evaluating the normality of residuals and detection of outliers (Field (2005, pp. 204–206)).
26 When the Studentized residual is computed as just described it is sometimes referred to as an external or externally Studentized residual. When, however, the denominator for computation of the standard deviation includes the omitted observation, the Studentized residual is referred to as an internal or internally Studentized residual.
27 Berenson and Levine (1996, p. 756) note that Hoaglin and Welsch (1978) determined that for simple linear regression (i.e., one predictor variable) the hat element (h) for the ith observation in a sample comprised of n scores can be computed with Equation 33.38. Computation of the hat elements for multiple regression is more involved and is generally implemented with the aid of computer software.
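A minimal sketch of the single-predictor computation is given below, assuming Equation 33.38 is the standard leverage formula h_i = 1/n + (X_i − X̄)²/Σ(X_j − X̄)²; the function name and data are hypothetical.

```python
import numpy as np

def hat_elements_simple_regression(x):
    # Assumed form of Equation 33.38: h_i = 1/n + (x_i - xbar)^2 / sum (x_j - xbar)^2
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    return 1 / len(x) + deviations ** 2 / np.sum(deviations ** 2)

print(hat_elements_simple_regression([1, 2, 3, 4, 10]))   # the extreme X value has the largest leverage
```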
28 A comprehensive discussion of regression diagnostics can be found in Field (2005, Ch. 5).
29 The 1 at the upper left of the table under Model will be clarified in the discussion of Table 33.7.
30 The identical values obtained for the correlation and regression coefficients for the backward stepwise analysis in Table 33.8 are also obtained if a standard multiple regression analysis is conducted only employing the two predictor variables of agility and strength.
31 Citing Darlington (1990), Aiken and West (1991, pp. 92–93) note that in instances where predictor variables are highly correlated with one another it is difficult to distinguish between a regression equation involving an interaction versus a curvilinear regression function.
1 As will be noted in Section VI within the framework of the discussion of Test 34b: The use of the single-sample Hotelling's T² to evaluate a dependent samples design, Hotelling's T² can also be employed to evaluate a dependent samples design involving an independent variable with two or more levels and two or more dependent variables.
2 Stevens (2002, p. 193) notes it is not possible to state an alternative hypothesis directionally for a multivariate test. In the case of Hotelling's T², Stevens (2002, p. 175) notes the statement in the null hypothesis that the vectors of the two populations are equal "implies that the groups are equal on all p dependent variables." Consequently, if a significant result is obtained for Hotelling's T², it can be due to any of the following: a) All of the univariate t values are significant (i.e., t tests conducted on each of the p dependent variables yield significant results); b) One or more of the univariate t values is significant; or c) None of the univariate t values is significant, but instead the significant T² value is the result of one or more significant linear combinations of the two dependent variables (Stevens (2002, pp. 184–186)).
3 Marascuilo and Levin (1983, p. 282) note that when p = 1, F = T², which represents the univariate case where F = t² (i.e., the case of the F value computed for the single-factor between-subjects analysis of variance (Test 21) and the t value computed for the t test for two independent samples).
4 a) The reader should take note of the fact that the latter equations for computing degrees of freedom are only applicable when there are k = 2 groups and 2 or more dependent variables. They will not be applicable when there are three or more groups and 2 or more dependent variables; b) A more detailed description of Pillai's trace, Wilks' lambda, Hotelling's trace, and Roy's largest root can be found in Section IV of the multivariate analysis of variance.
5 The following SPSS command sequence was employed in this chapter in conducting a Hotelling's T² analysis: a) Click Analyze; b) Click General linear model; c) Click Multivariate; d) Highlight the dependent variables (i.e., anxiety and depression for Example 34.1) and move them to the Dependent Variables window; e) Highlight the independent variable (i.e., the variable indicating which group a subject is in, drug versus placebo in Example 34.1) and move it to the Fixed Factor(s) window; f) Click Options, check off desired information (e.g., Descriptive statistics, Estimates of effect size, etc.), and then click Continue; g) Click OK to obtain the output for the analysis.
6 Although it will not be necessary to consider the values printed out in the upper section of Table 34.1 labeled Intercept, the latter information is derived through use of a regression model to evaluate a multivariate analysis of variance. As discussed in greater detail in Section VII of the single-factor between-subjects analysis of variance under the subject of the general linear model and in Endnote 2 in the chapter on canonical correlation (Test 38), all parametric statistical procedures can be summarized in the form of one or more linear equations, and in the final analysis a correlational format can be employed to summarize all such procedures (or, to put it another way, all parametric procedures can be conceptualized as special cases of regression analysis). It is through use of the latter type of analysis that the values in the Intercept section of Table 34.1 are derived. For further clarification regarding the multivariate analysis of variance as a special case of regression analysis the reader should consult Stevens (2002, pp. 188–192).
7 a) SPSS evaluates the sphericity assumption with Mauchly's test of sphericity (Mauchly (1940)) (which is displayed in a separate table labeled Mauchly's Test of Sphericity). If the probability value for the latter test (which is displayed in a column labeled Sig.) is less than .05, the sphericity assumption is violated. Adjusted F values (based on Geisser and Greenhouse (1958) and Huynh and Feldt (1976)) (along with their associated probabilities) are printed out which should be employed when the sphericity assumption is violated; b) Various sources note that when the sphericity assumption underlying the single-factor within-subjects analysis of variance is not violated, the latter test provides a more powerful test of an alternative hypothesis than Hotelling's T²/the multivariate analysis of variance. However, with a large sample size, when the sphericity assumption is not violated the power of Hotelling's T²/the multivariate analysis of variance is comparable to that of the single-factor within-subjects analysis of variance. For further discussion of the relative merits and liabilities associated with Hotelling's T²/the multivariate analysis of variance versus the single-factor within-subjects analysis of variance in evaluating a repeated-measures design (i.e., dependent samples design), the reader should refer to the discussion of the sphericity assumption in Section VI of the single-factor within-subjects analysis of variance.
8 a) The same result for the analysis will be obtained if the first difference score for the ith subject (D_i1) is obtained by subtracting the ith subject's score in Condition 2 (X_i2) from his or her score in Condition 1 (X_i1), i.e., D_i1 = X_i1 − X_i2, and the second difference score for the ith subject (D_i2) is obtained by subtracting the ith subject's score in Condition 3 (X_i3) from his or her score in Condition 2 (X_i2), i.e., D_i2 = X_i2 − X_i3; b) An alternative protocol which will yield the same test result is to subtract each subject's score in a given condition from his or her score in a standard condition. Thus, in the case of Example 24.1, if Condition 1 is designated as a standard condition, the two difference scores for each subject will be D_i1 = X_i1 − X_i2 and D_i2 = X_i1 − X_i3; c) Myers and Well (2003, p. 360) note that in order to conduct a multivariate analysis on a dependent samples design, the value of n must be greater than the value of k − 1. The latter, however, is not a requirement for evaluating the data with a single-factor within-subjects analysis of variance.
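The difference-score protocols in points a) and b) are easy to express directly; the sketch below forms both sets of difference scores from a hypothetical n × 3 score matrix.

```python
import numpy as np

X = np.array([[9., 8., 4.],        # hypothetical scores: rows = subjects,
              [10., 7., 6.],       # columns = Conditions 1, 2, 3
              [8., 6., 5.]])

D1 = X[:, 0] - X[:, 1]             # D_i1 = X_i1 - X_i2
D2 = X[:, 1] - X[:, 2]             # D_i2 = X_i2 - X_i3

# Alternative protocol: Condition 1 treated as the standard condition.
D1_alt = X[:, 0] - X[:, 1]         # D_i1 = X_i1 - X_i2
D2_alt = X[:, 0] - X[:, 2]         # D_i2 = X_i1 - X_i3
print(D1, D2, D2_alt)
```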
9 The notation employed for the mean of the ith difference score can also be written in an equivalent form.
10Some sources (e.g., Tabachnick and Fidell (1996, Ch. 10; 2001, Ch. 10)) refer to a multivariate analysis of a
dependent samples design as profile analysis.
11 It should be noted that analysis of Example 24.1 with SPSS did not indicate violation of the sphericity assumption. SPSS summarizes the latter analysis in a table labeled Mauchly's test of sphericity, which in the case of Example 24.1 yielded a probability value above .05.
1 Sources are not in agreement with respect to the relationship between the magnitude of the correlation between the dependent variables and the power of the multivariate analysis of variance. In discussing the latter issue, Field (2005, p. 574), citing a study by Cole et al. (1994), notes that the power of the multivariate analysis of variance appears to depend on a combination of the correlation between the dependent variables and the magnitude of the effect size being measured.
2 The notation μ_k is employed by some sources to summarize the vector of the means of the kth group noted below, which is commonly referred to as the mean vector of the kth group.
3 Marascuilo and Levin (1983, pp. 179–181) state that eigenvalues (also referred to as roots) are most commonly computed for a variance-covariance matrix, and Harlow (2005, pp. 93–94) notes that eigenvalues are just a redistribution of the original variances of the variables. Marascuilo and Levin (1983, pp. 179–181) and Stevens (2002, p. 73) define the eigenvalues of Matrix A (which is always assumed to be a square matrix) as the solution to the determinantal equation |A − λI| = 0 (where λ, which is the lower case Greek letter lambda, represents the value of an eigenvalue). In the latter equation Matrix I is an identity matrix with the same number of rows and columns as Matrix A. As demonstrated below, in the above determinantal equation an eigenvalue is multiplied by the identity matrix and after the product is subtracted from Matrix A, the determinant of the resulting matrix must equal zero. The number of eigenvalues derived for Matrix A will equal the number of rows or columns (where r = c) in Matrix A. To compute the eigenvalues for Matrix A the determinantal equation above is solved as noted below, where a, b, c, and d represent the four elements (i.e., four numerical values) that comprise Matrix A and λ represents the unknown eigenvalues for which the equation is solved. Since the dimensions of Matrix A are r = c = 2, the determinantal equation will be reduced to a polynomial equation (in this case a quadratic equation) with two solutions that will represent the two eigenvalues, which are designated as λ₁ and λ₂.
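The 2 × 2 case described above can also be verified numerically; the sketch below (with arbitrary matrix elements) solves the quadratic implied by |A − λI| = 0 and compares the two roots with the eigenvalues returned by numpy.

```python
import numpy as np

a, b, c, d = 4.0, 2.0, 1.0, 3.0                 # hypothetical elements of Matrix A
A = np.array([[a, b], [c, d]])

# |A - lambda*I| = 0 reduces to lambda^2 - (a + d)*lambda + (ad - bc) = 0.
quadratic_roots = np.roots([1.0, -(a + d), a * d - b * c])
print(sorted(quadratic_roots))                  # [2.0, 5.0]
print(sorted(np.linalg.eigvals(A)))             # the same two eigenvalues
```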

4 The following SPSS command sequence was employed in this chapter in conducting a multivariate analysis of variance: a) Click Analyze; b) Click General linear model; c) Click Multivariate; d) Highlight the dependent variables (i.e., anxiety and depression in Example 35.1) and move them to the Dependent Variables window; e) Highlight the independent variable (i.e., the variable indicating which group a subject is in, drug versus placebo versus no treatment in Example 35.1) and move it to the Fixed Factor(s) window; f) Click Options, check off desired information (e.g., Descriptive statistics, Estimates of effect size, etc.), and then click Continue; g) If you wish to conduct comparisons between pairs of means, click Post Hoc, highlight the independent variable in the left window (i.e., the variable indicating which group a subject is in) and move it to the Post Hoc tests for window, and then click Continue; h) Click OK to obtain the output for the analysis.
5 a) Although it will not be necessary to consider the values printed in the upper section of Table 35.1 labeled Intercept, the latter information is derived through use of a regression model to evaluate a multivariate analysis of variance. As discussed in greater detail in Endnote 2 in the chapter on canonical correlation (Test 38), all parametric statistical procedures can be summarized in the form of one or more linear equations, and in the final analysis a correlational format can be employed to summarize all such procedures (or, to put it another way, all parametric procedures can be conceptualized as special cases of regression analysis). It is through use of the latter type of analysis that the values in the Intercept section of Table 35.1 (as well as Table 35.2) are derived; b) Field (2005) represents the best user-friendly reference for those who are interested in employing SPSS to conduct statistical analysis. Although nowhere near as comprehensive in their discussion of statistical tests as Field (2005), George and Mallery (2001) is another good reference on SPSS. The author also recommends Mertler and Vannatta (2005) for readers interested in using SPSS for multivariate analysis.
6 The relationship between F and Λ as described by Equation 34.6 is only applicable when there are k = 2 groups. It will not result in the correct value for F if the total sample size is substituted for n₁ + n₂ in the numerator of the latter equation along with the value computed for T².
7 The fact that the analysis of covariance does not provide for a more sensitive test of the null hypothesis for the dependent variable of depression than does a simple analysis of variance on the depression scores is primarily due to the fact that the error variance computed for the analysis of covariance was not substantially different from the error variance computed for the analysis of variance. Additionally, Stevens (2002, pp. 345–347) notes that if the covariate is measured after the treatments (as is the case in this analysis) and it is influenced by the treatments, the change in the covariate may be correlated with the dependent variable, and under the latter circumstances part of the treatment effect will be removed.
1 The following SPSS command sequence was employed in this chapter in conducting a multivariate analysis of covariance: a) Click Analyze; b) Click General linear model; c) Click Multivariate; d) Highlight the dependent variables (i.e., anxiety and depression in Example 36.1) and move them to the Dependent Variables window; e) Highlight the independent variable (i.e., the variable indicating which group a subject is in, drug versus placebo versus no treatment in Example 36.1) and move it to the Fixed Factor(s) window; f) Highlight the covariate (i.e., social chaos in Example 36.1) and move it to the Covariate(s) window; g) Click Options, check off desired information (e.g., Descriptive statistics, Estimates of effect size, etc.), and then click Continue; h) Click OK to obtain the output for the analysis.
2 Although it will not be necessary to consider the values printed in the sections of Tables 36.2 and 36.3 labeled Intercept, the latter information is derived through use of a regression model to evaluate a multivariate analysis of covariance. As discussed in greater detail in Endnote 2 in the chapter on canonical correlation (Test 38), all parametric statistical procedures can be summarized in the form of one or more linear equations, and in the final analysis a correlational format can be employed to summarize all such procedures (or, to put it another way, all parametric procedures can be conceptualized as special cases of regression analysis). It is through use of the latter type of analysis that the values in the Intercept section of Tables 36.2 and 36.3 are derived.
3 Stevens (2002, p. 346) notes that when a researcher elects to employ more than one covariate, Huitema (1980, p. 161) recommends limiting the number of covariates such that the ratio [Number of covariates + (Number of groups − 1)] / Total sample size is less than .10.
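Expressed as a quick check (the function name is hypothetical), the guideline reads as follows.

```python
def covariate_ratio(num_covariates, num_groups, total_n):
    # Huitema (1980) guideline cited above: [C + (k - 1)] / N should stay below .10.
    return (num_covariates + (num_groups - 1)) / total_n

print(covariate_ratio(num_covariates=1, num_groups=3, total_n=30))   # 0.10 -- right at the recommended limit
```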
1 Silva and Stam (1995, p. 284) note that the null hypothesis evaluated with the omnibus multivariate analysis of variance (i.e., H₀: Equality of the k composite treatment means) is equivalent to evaluating the null hypothesis that none of the discriminant functions makes a statistically significant contribution to group separation.
2 a) On a more technical level, Stevens (2002, p. 287) notes that the first discriminant function will have the largest eigenvalue in the W⁻¹B matrix (which is the definition of Roy's largest root), the second discriminant function will have the second largest eigenvalue in the latter matrix, and so on. Stevens (2002, pp. 286–287) notes that discriminant function analysis is often referred to as a maximization procedure, since it identifies the linear combinations which maximize the between to within association in the aforementioned matrices; b) Silva and Stam (1995, p. 283) note that in most applications of discriminant function analysis a small subset of discriminant functions will explain most of the differences between groups. It is not uncommon for just one or at most two discriminant functions to account for most of the between-groups variability in the data. Silva and Stam (1995, p. 283) also note that since in most cases (k − 1) is less than p, the number of predictor variables will usually be greater than the number of discriminant functions.
3 a) The first value of lambda computed in discriminant function analysis will always equal the value of lambda computed if a multivariate analysis of variance is employed to evaluate the same set of data; b) Hair et al. (1995, p. 198) and Tabachnick and Fidell (1996, p. 533) note that Pillai's trace and Hotelling's trace can also be used to evaluate the discriminatory power of all the discriminant functions, but that Roy's largest root can only evaluate the discriminatory power of the first discriminant function. The latter sources also state that if a stepwise discriminant function analysis is conducted, the Mahalanobis D² statistic (which is most useful when there are a large number of predictor variables) and a measure referred to as Rao's V can also be employed to evaluate the discriminatory power of the functions.
4 Some sources conceptualize discriminant function analysis as a form of canonical correlation (Test 38), which in contrast to discriminant function analysis evaluates the relationship between a set of predictor variables and a set of criterion variables. Canonical correlational analysis extracts two or more linear combinations involving the two sets of variables which are maximally correlated with one another. Each pair of linear combinations that is extracted can be represented by Equation 37.5. Note that the right side of the latter equation is similar (although not identical) to the right side of Equation 37.1. Within the framework of canonical correlational analysis, Equation 37.5 is referred to as a canonical function (or canonical root), and each side of the equation is labeled a canonical variable (also referred to as a canonical variate). Note that a canonical variable is a linear combination of a set of predictor or criterion variables.
5 The following SPSS command sequence was employed in this chapter in conducting a discriminant function analysis: a) Click Analyze; b) Click Classify; c) Click Discriminant; d) Highlight the criterion/grouping variable (i.e., the variable indicating whether a subject had a fatal heart attack versus a serious nonfatal heart attack versus a mild heart attack versus no heart attack in Example 37.2) and move it to the Grouping Variable window. Click Define Range, enter the lowest and highest group identification numbers in the Minimum and Maximum windows, and then click Continue; e) Highlight the predictor variables (i.e., the four risk factors in Example 37.2) and move them to the Independents window; f) Click Statistics, check off desired information (e.g., Means, Univariate ANOVAs, etc.), and then click Continue; g) Click Classify, check off desired information (e.g., Casewise results, Summary table, Leave-one-out Classification, etc.), and then click Continue; h) Click Save, check off desired information (e.g., Predicted group membership, Discriminant scores, Probabilities of group membership), and then click Continue; i) Indicate whether you wish to Enter independents together or conduct a Stepwise analysis; j) Click OK to obtain the output for the analysis.
6 a) The value Λ = .292 is identical to the value computed for Wilks' lambda in Table 37.1 for the omnibus multivariate analysis of variance. In point of fact, it will always be the case that the value computed in the first row of the Wilks' lambda table will equal the value of lambda computed for the omnibus multivariate analysis of variance; b) The computed chi-square value in the Wilks' lambda table is evaluated through use of the critical values in Table A4, employing the degrees of freedom value noted in Table 37.2. For the row labeled 1 through 2, the critical .05 and .01 values employed for df = 6 are χ²_.05 = 12.59 and χ²_.01 = 16.81. Since the computed value χ² = 13.556 is greater than χ²_.05 = 12.59, the result is significant at the .05 level. Since the value χ² = .221 computed for Function 2 is less than χ²_.05 = 5.99 (for df = 2), the latter result is not significant.
7 a) In Endnote 10 in the chapter on multiple regression it was noted that in multiple regression the absolute value of a standardized regression coefficient cannot be greater than 1 (and if it exceeds 1 it indicates a major problem with the analysis, e.g., collinearity, nonnormality, etc.). In discriminant function analysis, however, the absolute value of a standardized discriminant function coefficient can exceed 1 (Lisa Harlow, personal communication); b) The author is indebted to Lisa Harlow for clarifying the limiting values which can be assumed by coefficients in regression analysis versus discriminant function analysis.
8 a) Field (2005, p. 612) and Hair et al. (1995, p. 206) state that canonical loadings are analogous to the factor loadings derived in factor analysis (which is discussed later in this chapter); b) The value of a canonical loading must fall between −1 and +1, and if a value falls outside of that range it indicates there is a problem with the analysis (Lisa Harlow, personal communication).
9 Stevens (2002, p. 296) notes that if the canonical loading of a predictor variable has a negative sign, the groups that scored highest on that variable would score lower on the function in question.
11 Silva and Stam (1995, pp. 296–298) provide a good discussion of alternative methods for deriving classification rules.
12 The proportions of the full sample employed for the two subsamples can be values other than .5. For example, the discriminant functions can be derived using .6 of the subjects and cross-validated using the remaining .4 of the subjects.
13 a) Some sources employ the term U-method interchangeably with jackknifing. Hair et al. (1995, p. 210), however, make a distinction such that the U-method focuses on classification accuracy, while jackknifing focuses on the stability of the discriminant function coefficients; b) One method for assessing the utility of a classification matrix is the maximum chance criterion. The latter method compares the percentage of correct classifications in the matrix with the percentage of subjects which comprised the largest group (in the sample employed to derive the discriminant functions), since if the latter percentage is large the most accurate strategy might be to classify subjects whose group membership is unknown in the largest group; c) Tabachnick and Fidell (1996, p. 520) note that classification procedures for discriminant function analysis are extremely sensitive to violation of the homogeneity of variance-covariance matrices assumption. When the latter assumption is violated, subjects are more likely to be classified in the group with the greatest dispersion (i.e., the group with the largest determinant for its within-groups covariance matrix).
14 In this instance cross-validation means employing a new sample in order to determine whether the discriminant
function(s) obtained for the original sample can be employed to accurately categorize subjects in the new
sample whose group membership is known.
1. According to Webster's New Collegiate Dictionary (1981) the word canonical means "reduced to the simplest or clearest schema possible."
2. a) Because it employs multiple predictor variables and multiple criterion variables, canonical correlation is sometimes referred to as multiple-multiple correlation (Wuensch (2005)); b) Horst (1960) notes that although infrequently employed, canonical correlation can be generalized to situations involving more than two variable sets; c) Thompson (2000, p. 297) states that canonical correlation represents the most general case of the general linear model (discussed in Section VII of the single-factor between-subjects analysis of variance (Test 21)), and cites Knapp's (1978, p. 410) contention that essentially all commonly employed parametric tests of statistical significance can be viewed as special cases of canonical correlation (also see Campbell and Taylor (1996)). In order to clarify the latter, it is necessary to define the general linear model (often identified with the acronym GLM), which can be summarized (if for the moment one is not overly concerned with mathematical rigor) with Equation 38.5 (based on Trochim (2005); also see Howell (2002, pp. 604–607)).
3.The data for Example 38.1 are hypothetical and for demonstration purposes only. The sample size n = 40 is
smaller than the minimum most researchers would employ for a canonical correlation.
4.In order to conduct a canonical correlational analysis with SPSS it is necessary to employ the syntax editor.
The analysis is conducted within the framework of a multivariate analysis of variance. The syntax employed
for Example 38.1 is reproduced below.
5. The author is indebted to Karl Wuensch for clarifying what the eigenvalues printed out by SPSS represent. Many sources (e.g., Garson (2005), Hair et al. (1995, p. 333) and Tabachnick and Fidell (1996, p. 201)) define an eigenvalue as equivalent to the value of the squared canonical correlation (i.e., r_C²). (Tabachnick and Fidell (1996, p. 201), however, acknowledge this discrepancy in the use of the term eigenvalue in a footnote on page 201.)
6. Some computer software also prints out adjusted or shrunken values for the canonical correlations, which correct for bias resulting from a small sample size. Adjusted coefficients are discussed in Section IV/V of the chapter on multiple regression.
7. In the case of Roy's largest root, SPSS only prints out the value of the test statistic.
8. Tabachnick and Fidell (1996, p. 221) note it is theoretically possible for the first canonical function by itself to not be significant, but rather only to achieve significance when considered in combination with the remaining canonical functions. In such an instance, the probability in the row 1 TO 3 will be greater than .05. The latter authors note there is no acceptable test to evaluate each of the canonical functions separately.
10. An excellent illustration of the geometrical representation of a canonical function can be found in Thompson (2000, pp. 292–295), who refers to subjects' scores on the canonical variates as synthetic-latent variable scores (which reflects the fact that such a score is assumed to represent a subject's standing on some underlying dimension).
12. Stevens (2002, pp. 476–477) states that if the absolute value of the structure coefficient for a given variable which comprises a canonical variate is high, yet its standardized canonical coefficient is low, such a variable would be viewed as redundant. Inspection of Table 38.4 suggests the latter logic could be applied to the Y variable of Pull ups and the X variable of Weight. The concept of redundancy is discussed later in this section.
13. Garson (2005) notes that a unique solution for canonical correlation cannot be obtained if variables are redundant, i.e., have a perfect or near perfect correlation with one another. A correlation matrix with redundancy is said to be singular or ill-conditioned, which falls within the general rubric of the multicollinearity problem.
14. a) Although in the case of Example 38.1 the two values computed for the redundancy coefficients (.2284 and .2278) are almost identical, Thompson (1984) and Wuensch (2005) note the redundancy coefficient is an asymmetric index, i.e., the redundancy coefficient computed for one variate will rarely equal that computed for the other; b) Thompson (2000, p. 309) notes the only time a redundancy coefficient will equal 1 (which would be highly unusual) is when a canonical variate accounts for all of the variance on every variable which comprises the variate (which is commensurate with all the squared structure coefficients for the variate being equal to 1) and the squared canonical correlation for the relevant canonical function equals 1; c) Although Miller (1975) developed a test of significance for redundancy coefficients, there are no generally accepted guidelines with respect to when a redundancy coefficient should be viewed as excessive.
15. a) Harlow (2005, p. 184) notes that although it is desirable for variables to have a correlation greater than .30 (once again, in the case of a large sample) with the opposite variate, as a general rule it would be expected that variables will be more highly correlated with their own canonical variate; b) SPSS will not print out the cross-loadings for canonical correlation unless additional instructions are included in the syntax editor.
16. Hair et al. (1995, pp. 334–335) note a statistically significant canonical correlation (even a large value) for a canonical function does not necessarily indicate that such a function is of practical significance. Although the latter authors state that no generally accepted guidelines have been established with respect to a minimum acceptable value for a redundancy coefficient, they attach a lot more importance than most sources to using redundancy coefficients in determining whether or not a canonical function is of practical significance (Hair et al. (1995, p. 336)). Although the values .2278 and .2284 computed in Example 38.1 for the redundancy coefficients might be considered low by some researchers, the information derived from the structure coefficients suggests that Function 1 may be of practical significance.
17. Wuensch (2005) notes the value of the canonical correlation for the first canonical function will always be equal to or greater than the largest of the multiple correlations. The latter is the case for Example 38.1 where r_C = .646 is greater than the largest multiple correlation .607 (for Sit ups).
1. a) The term multinomial logistic regression is commonly employed when there are three or more categories on the dependent variable; b) The terms multinomial and polychotomous (or polytomous) are used in identifying a categorical independent or dependent variable which is comprised of more than two categories.
2. a) Harlow (2005, p. 154) and Tabachnick and Fidell (1996, p. 580) note, however, that, as is the case for standard linear regression, the reliability of logistic regression can also be compromised by multicollinearity, i.e., extremely high intercorrelations between the predictor variables; b) Some sources note that when the dependent variable is dichotomous, standard linear regression may work fairly well in predicting a probability when the value of a probability is not extreme, i.e., when the probability falls between .20 and .80 (since a logistic function (e.g., Figure 39.1) is to a large degree linear at its center).
3. a) Hosmer and Lemeshow (1989; 2000, p. 5) demonstrate that if you compute the average score on the dependent variable for each value of the independent variable, the latter relationship yields the same S-shaped function. To illustrate in reference to Figure 39.1, assume that the values 0 and 1 are respectively employed as scores on the dependent variable for subjects who do not have a stroke versus subjects who have a stroke. Since the overwhelming majority of subjects with a systolic blood pressure under 180 will not have a stroke (with the lower the blood pressure, the more likely someone will not have a stroke), the average score on the dependent variable for each blood pressure value under 180 will be close to 0 (where a subject's score is 0 or 1, and the average score for a given blood pressure value can be viewed as representing the proportion of subjects with that blood pressure who have a stroke). In the same respect, since most of the subjects with a systolic blood pressure over 200 will have a stroke (with the higher the blood pressure, the more likely someone will have a stroke), the average score on the dependent variable for each blood pressure value above 200 will be close to 1. In the case of blood pressure values between 180 and 200 there will be more variability among subjects with respect to whether or not a person will have a stroke, and thus (as is the case in Figure 39.1) the average score for pressure values in the latter range will fall between .25 (for lower pressures in that range) and .75 (for higher pressures in that range); b) Note that a logistic function, such as that depicted in Figure 39.1, closely resembles the plot of a cumulative frequency distribution (see Figure 1.6 in the Introduction), which also conforms to a sigmoidal curve.
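To make the shape of such a function concrete, the sketch below evaluates a logistic curve with invented coefficients, chosen only so that the predicted probability of a stroke behaves roughly as described above (near 0 below 180, near 1 above 200).

```python
import numpy as np

b0, b1 = -20.9, 0.11        # hypothetical coefficients, for illustration only

def predicted_probability(systolic_bp):
    # Logistic function: predicted probabilities follow an S-shaped (sigmoidal) curve.
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * systolic_bp)))

for bp in (160, 180, 190, 200, 220):
    print(bp, round(float(predicted_probability(bp)), 3))
```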
4. a) The data for Example 39.1 (as well as that depicted in Figure 39.1) are hypothetical and not based on the
results of an actual study; b) In order to employ computer software for logistic regression, it is necessary to
employ dummy coding for the dichotomous predictor/independent variables and the dichotomous dependent
variable. Dummy coding (which is also discussed in Section IX (the Addendum) of the Pearson product-
moment correlation coefficient) assigns the scores 0 and 1 to subjects who are at different levels of a
dichotomous predictor variable and different groups representing a dichotomous dependent variable. In
employing SPSS for logistic regression with Example 39.1, the following guidelines were employed in
dummy coding dichotomous variables. In the case of the dichotomous independent/predictor variables the
following protocol was employed. A code of 0 was assigned to the category/group of primary interest
(sometimes referred to as the baseline or reference group), and a code of 1 was assigned to the other
category/group. In the case of Example 39.1, the categories smokers and not having social support were
identified as the reference groups (since the researcher hypothesized that smokers and people who did not
have social support would be more likely to have a stroke than nonsmokers and people who did have social
support). Consequently, any subject who was a smoker and any subject who did not have social support was
assigned a score of 0 on the appropriate predictor variable, and any subject who was a nonsmoker and any
subject who did have social support was assigned a score of 1 on the appropriate predictor variable; c) In the
case of a dichotomous dependent variable, subjects who are in the category/group of primary interest are
generally assigned a score of 1, and subjects in the other category/group are assigned a score of 0. (In medical
research it is common for people who are afflicted with a specific condition, such as having a stroke or who
die, to be designated as the group of primary interest.) Since for Example 39.1 stroke was the category of
primary interest on the dependent variable, subjects who had a stroke were assigned a score of 1, and subjects
who did not have a stroke were assigned a score of 0. Although intuitively it would seem that if smokers and
people who did not have social support are predicted to be more likely to have a stroke, it would be more
logical to assign the latter categories of people a score of 1 on the relevant predictor variables, use of the latter
coding with SPSS yields results which suggest the opposite of what actually occurs. Although reversing the
codes employed for categories will yield the same absolute values for some of the relevant statistics, the signs
of the latter values may make the analysis more confusing to interpret. Readers, however, should be aware of
the fact that the coding protocol for dummy variables resulting in the most straightforward interpretation may
differ depending upon the software one employs; d) The use of dummy variable coding can be extended to
predictor variables with more than two categories. In the latter case, the predictor variable in question is
converted into a set of dummy variables. The number of dummy variables will always be one less than the
number of categories on a predictor variable. To illustrate, assume that a predictor variable is comprised of the
three categories A, B, and C, and that Category A is designated as the reference or baseline category. The
following two dummy variables can be derived: Dummy variable 1: A = 0, B = 0, C = 1 (which breaks the
predictor variable into a dichotomous variable comprised of A & B (members of both of which are assigned a
score of 0) versus C (members of which are assigned a score of 1)); Dummy variable 2: A = 0, B = 1, C = 0
(which breaks the predictor variable into a dichotomous variable comprised of A & C (members of both of
which are assigned a score of 0) versus B (members of which are assigned a score of 1)). Note that on each
dummy variable, subjects in any category grouped with the reference category are assigned the score 0, while
subjects in the remaining category in the dichotomization are assigned a score of 1. (A brief illustrative sketch
of this coding scheme follows this note.) A more detailed description of the latter procedure can be found in
Field (2005, pp. 208–210) and Hair et al. (1995, pp. 109–110); e) In multinomial logistic regression (i.e., the
use of a categorical dependent variable involving three or more categories) the category of primary interest
should have the highest code number. For example, if three categories were employed with reference to a
subject having a heart attack, the following codes might be used: 0: Did not have a heart attack; 1: Had a
nonfatal heart attack; 2: Had a fatal heart attack. Since the last category received the highest code number, it
would be assumed to represent the category of primary interest; f) SPSS refers to the predictor variables as
covariates.
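The following is a minimal illustrative sketch (in Python, using the pandas library) of the dummy coding scheme
described in part d) of Note 4 above; the variable and category names are hypothetical, and the code is not part of
the SPSS protocol described in this chapter.

    import pandas as pd

    # Hypothetical three-category predictor with Category A as the reference/baseline category
    predictor = pd.Series(["A", "B", "C", "B", "A", "C"], name="category")

    # drop_first=True drops the (alphabetically) first category, here A, so that two dummy
    # variables remain: one for B and one for C. A subject in the reference category A
    # receives a score of 0 on both dummy variables.
    dummies = pd.get_dummies(predictor, prefix="dummy", drop_first=True).astype(int)
    print(pd.concat([predictor, dummies], axis=1))
    #   category  dummy_B  dummy_C
    # 0        A        0        0
    # 1        B        1        0
    # 2        C        0        1
    # ...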
5. The following SPSS command sequence was employed in this chapter in conducting a logistic regression
analysis: a) Click Analyze; b) Click Regression; c) Click Binary Logistic; d) Highlight the criterion variable
(i.e., the variable indicating whether or not a person has a stroke in Example 39.1) and move it to the
Dependent window; e) Highlight the predictor variables (i.e., blood pressure, smoking, and social support in
Example 39.1) and move them to the Covariates window; f) Click Categorical if any of the predictor
variables are categorical. Highlight the categorical variables (i.e., smoking and social support in Example
39.1) and move them to the Categorical Covariates window, and then click Continue; g) Click Options,
check off desired information (e.g., Classification plots, Hosmer-Lemeshow goodness-of-fit, etc.), and then
click Continue; h) Click Save, check off desired information (e.g., Predicted values, Influence, and
Residuals options), and then click Continue; i) In the Methods window select the type of logistic regression
analysis to be conducted. The default option of Enter represents standard logistic regression in which all of
the predictor variables are entered simultaneously; j) Click OK to obtain the output for the analysis.
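For readers working outside of SPSS, the following is a minimal sketch of an equivalent standard (simultaneous-
entry) logistic regression in Python using the statsmodels package; the data frame, variable names, and coefficient
values are hypothetical stand-ins for the variables of Example 39.1, not the data analyzed in this chapter.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 80
    df = pd.DataFrame({
        "systolic_bp": rng.normal(190, 15, n),
        "smoker": rng.integers(0, 2, n),          # dummy coded 0/1
        "social_support": rng.integers(0, 2, n),  # dummy coded 0/1
    })
    # hypothetical outcome: higher blood pressure and smoking raise the probability of stroke
    true_logit = -20 + 0.10 * df["systolic_bp"] + 1.0 * df["smoker"]
    df["stroke"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

    X = sm.add_constant(df[["systolic_bp", "smoker", "social_support"]])
    model = sm.Logit(df["stroke"], X).fit()   # all predictors entered simultaneously ("Enter")
    print(model.summary())                    # coefficients, Wald tests, log-likelihood
    print(np.exp(model.params))               # odds ratios (the Exp(B) column in SPSS output)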
6. a) The reader may want to review the concept of natural logarithms, which is discussed in Endnote 15 in the
Introduction, as well as the concept of odds, which is discussed in Section VI of the chi-square test for r
× c tables (Test 16); b) Log odds is sometimes written as logit (p); c) Although some sources may refer to
logistic regression as logit analysis, the latter term is more commonly employed to refer to an analysis
involving multiple discrete/categorical variables, with one of the variables designated as the dependent
variable (Garson (2006), Spicer (2005, p. 204), and Tabachnick and Fidell (1996, p. 281)); d) The logit
distribution is approximately normal with a mean of 0 and a standard deviation of 1.83 (Lipsey and Wilson
(2001, p. 40)).
7. ln[p(Y)/(1 - p(Y))] can also be written as log_e[p(Y)/(1 - p(Y))], where e represents the base value of a
natural logarithm which is equal to 2.71828.... If we let z represent the exponent that e must be raised to in
order to compute the value [p(Y)/(1 - p(Y))], then e^z = [p(Y)/(1 - p(Y))]. Solving the latter equation for
p(Y) yields Equation 39.3, which can be algebraically transformed into Equation 39.4.
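As a brief illustration of the algebra referred to above (under the assumption that Equation 39.3 is the standard
logistic form p(Y) = e^z/(1 + e^z)), the solution of e^z = p(Y)/(1 - p(Y)) for p(Y) proceeds as follows:

    \begin{aligned}
    e^{z} &= \frac{p(Y)}{1 - p(Y)}\\
    e^{z}\bigl(1 - p(Y)\bigr) &= p(Y)\\
    e^{z} &= p(Y)\bigl(1 + e^{z}\bigr)\\
    p(Y) &= \frac{e^{z}}{1 + e^{z}} \;=\; \frac{1}{1 + e^{-z}}
    \end{aligned}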
8. a) Tabachnick and Fidell (1996, p. 579) note that although not a requirement of logistic regression, multivariate
normality and linearity among two or more predictor variables may increase the power of an analysis, since a
linear combination of the predictor variables is employed to form the exponent in Equations 39.3 and 39.4; b)
Garson (2006) notes that one assumption underlying logistic regression is that the predictor variable(s) are
linearly related to the logit of the dependent variable.
9. a) An iterative procedure employs successive approximations to arrive at an optimal solution for an analysis. In
other words, a set of operations (referred to as an iteration) is sequentially repeated until the best possible
approximation is computed for the value(s) in question. The latter will be illustrated later in Tables 39.2 and
39.3 (in Section V) in a table labeled Iteration History. Each row of an Iteration History table displays the
result for one iteration, with the last row containing the result of the final iteration. The value displayed in the
last row will represent the best approximation of the likelihood ratio (see Equation 39.5) computed in logistic
regression; b) Tabachnick and Fidell (1996, p. 579) note the maximum likelihood procedure for estimating
regression coefficients will not be able to derive a solution for logistic regression when there is perfect
separation for a sample of n subjects with respect to category placement; for example, if all subjects in a
sample with a systolic blood pressure below a specific value do not have a stroke, while all subjects whose
systolic pressure is above that value do have a stroke. The latter authors note the latter situation, which
constitutes what is sometimes referred to as overfitting the data, is unlikely to occur except perhaps in the case
of a very small sample size. As noted in the discussion of multiple regression, more generally, overfitting occurs
when the information derived from a sample is tailored to the idiosyncrasies of the sample, yet will not provide
a good fit for the target population to which a researcher wishes to generalize.
10. a) The likelihood ratio test can be employed to compare any two models with one another. Thus, if there are p
predictor variables, the simple model can be contrasted with a model that is comprised of one or more of the p
predictors, or alternatively, two models comprised of a different number of predictor variables can be
contrasted with one another; b) Some sources employ Equation 39.10 as an alternative way of expressing
Equation 39.7.
11. Hosmer and Lemeshow (2000, pp. 16–17) also describe the Scores test (which is not available with most
computer software) as another alternative for evaluating the significance of each of the predictor variables.
12. As noted in Section IX (the Addendum) of the binomial sign test for a single sample, the baserate of an
event is the proportion of times the event occurs within a population (or, in this instance, a sample).
13. It was previously determined that by just employing the baserates in the Classification table (i.e., not taking
into account the predictor variable), the probability of having a stroke is 31/80 = .3875 and the probability of
not having a stroke is 49/80 = .6125. Using the latter values, Equation 39.1 can be employed to compute that
the odds of having a stroke are .3875/.6125 = .633, which, in fact, is the value of Exp(B) in Column 7. Note
that the natural logarithm of .633 is -.458, which is the value recorded for the constant in Column 2 of the
Variables in the Equation table. The latter value is derived for the constant since when the logistic regression
function (as defined by Equations 39.3 and 39.4) omits a predictor variable, the value computed for the
exponent z is z = a = -.458. Thus, the value -.458 represents the log odds of having a stroke (i.e., it is
equivalent to the value computed with Equation 39.2 if all of the predictor coefficients equal zero). Note that
since the odds of having a stroke are less than 1, the log odds have a negative sign.
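The arithmetic in the note above can be verified with a few lines of Python; the counts 31 and 80 are those reported
for the Classification table in this chapter, and the code is only an illustrative check.

    import math

    n_stroke, n_total = 31, 80
    p_stroke = n_stroke / n_total        # .3875
    p_no_stroke = 1 - p_stroke           # .6125
    odds = p_stroke / p_no_stroke        # ~.633, the Exp(B) value for the constant
    constant = math.log(odds)            # ~-.458, the log odds (the intercept a)
    print(round(odds, 3), round(constant, 3))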
14. a) Garson (2006) and Wuensch (2005) note that since the maximum value Cox & Snell R² can achieve will
generally be less than 1, Nagelkerke R² is obtained by dividing the value computed for Cox & Snell R² by the
maximum value it can achieve, in order to allow for the possibility that an effect size can equal 1; b) Additional
measures of effect size for logistic regression which may be computed with software other than SPSS are
Somers' delta, gamma, tau-a, c, and McFadden's ρ². McFadden's ρ² is the most conservative of the latter
measures, while c usually results in the highest value (Harlow (2005, p. 157)). The value McFadden's ρ² = .437
is computed below with Equation 39.11.
15. A test statistic for the Hosmer and Lemeshow goodness-of-fit test cannot be computed when there is only one
predictor variable and the latter is a dichotomous variable (Field (2005, p. 254)).
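In reference to Note 14 above, the following sketch illustrates how the pseudo-R² measures mentioned there (Cox &
Snell, Nagelkerke, and McFadden) are conventionally defined in terms of the log-likelihoods of the intercept-only
and fitted models; the log-likelihood values below are hypothetical, and the formulas are the standard ones rather
than a transcription of the chapter's equations.

    import math

    n = 80          # sample size (as in Example 39.1)
    LL0 = -53.4     # hypothetical log-likelihood of the intercept-only (simple) model
    LL1 = -30.1     # hypothetical log-likelihood of the fitted model

    cox_snell = 1 - math.exp((2 / n) * (LL0 - LL1))   # Cox & Snell R-squared
    max_cs = 1 - math.exp((2 / n) * LL0)              # maximum value Cox & Snell can attain
    nagelkerke = cox_snell / max_cs                   # Nagelkerke R-squared
    mcfadden = 1 - (LL1 / LL0)                        # McFadden's rho-squared
    print(round(cox_snell, 3), round(nagelkerke, 3), round(mcfadden, 3))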
16. The chi-square test for r × c tables can also be employed to evaluate the data in the 2 × 2 Classification
Table. When it is employed for the table summarizing the predicted model, it yields the significant result
χ²(1) = 46.43, p < .01. The test, however, cannot be employed to evaluate the classification table for the simple
model, because the latter model yields expected frequencies of 0 which render the test insoluble.
17. The odds ratio is also discussed in a different context in Section VI of the chi-square test for r × c tables.
18. a) Odds ratios and predictor coefficients for logistic regression computed by most software are in reference to
the category on the dichotomous dependent variable coded 1 (i.e., the category of primary interest), which
in the case of Example 39.1 is subjects who have a stroke (Tabachnick and Fidell (1996, p. 605)); b) Wright
(1995, pp. 223–224) notes the reason why the logistic regression coefficient is used to determine an odds
ratio associated with an increment of one or more units on the predictor variable, as opposed to determining
the increase in probability associated with the latter increment, is that a change in probability will not only be a
function of the predictor coefficient but also of the magnitude of the predictor variable. To be more specific, a
one unit increment on the predictor variable at one point on the range of values for the latter variable may be
associated with a minimal increase in the probability of a person being a member of the category of primary
interest, yet at another point on the range of values for the predictor variable a one unit increment may be
associated with a substantial increase in the probability of a person being a member of the category of primary
interest. The change in odds, on the other hand, will be the same for a one unit increase on the predictor
variable at any point on the scale of values for the latter variable; c) Wright (1995, p. 223) notes that although
the value of b (which is a measure of the change in the natural logarithm of the odds ratio) is in and of itself
difficult to interpret, a positive coefficient indicates that a predicted odds ratio will increase as the value of the
predictor variable increases, a negative coefficient indicates a predicted odds ratio will decrease as the value
of the predictor variable increases, and a coefficient of 0 indicates the predicted odds ratio will be the same for
all values of the predictor variable; d) The steepness and direction of a logistic regression curve (e.g., Figure
39.1) will be a function of the value of the coefficient (i.e., b). The larger the absolute value of the coefficient,
the steeper the curve (i.e., the more perpendicular the central element of the curve will be in reference to the X-
axis). When the value of the coefficient is positive, the logistic regression curve will ascend from left to
right as in Figure 39.1 (indicating that membership in the category of primary interest is associated with high
values on the predictor variable, and membership in the other group with low values on the predictor variable).
A negative b value, on the other hand, will yield a mirror image curve which descends from left to right. A
coefficient of zero (which yields the same probability of being a member of the category of primary interest for
all values of the predictor variable) results in a horizontal line parallel to the X-axis. The constant in the
equation determines the location of the logistic curve in reference to the X-axis, with the curve shifting to the
left as the constant increases (assuming the value for the coefficient remains unchanged) (Wright (1995, pp.
224–225)).
19. Wright (1995, p. 228) notes the following with respect to a confidence interval for an odds ratio: a) Although
not the case for the example under discussion, the confidence interval will often be skewed (with the lower
limit being much closer to the sample odds ratio than the upper limit), since the smallest possible
value for an odds ratio is zero, while there is no boundary for the upper limit; b) If the value 1 is included in a
95% confidence interval for an odds ratio, the corresponding predictor coefficient will not be significant at the .05 level.
20. a) It is recommended that a 95% confidence interval be computed (or a 99% interval if one wants to be more
conservative) for the probability obtained with Equations 39.3 or 39.4. In order to predict that a subject will fall
in the category of having a stroke, both the lower and upper limit of the confidence interval should be above .5.
Since computation of the latter confidence interval involves matrix algebra, it generally requires the use of a
computer (Hosmer and Lemeshow (2000, p. 20) and Rosner (1995, p. 535)); b) In the case of two or more
predictor variables, the identical procedure is employed, except for the fact that the value of z is obtained
through use of the equation z = a + b1X1 + b2X2 + ... + bpXp, where bj is the coefficient for each of the p
predictor variables and Xj is the score of the subject in question on each of the predictor variables.
21. Garson (2006) notes Rao's efficient score statistic is primarily employed in forward stepwise logistic
regression as a criterion for determining whether or not a variable should be included in a model.
22. The fact that the absolute value .050 for the coefficient of systolic blood pressure is smaller than the absolute
value .059 obtained earlier when systolic blood pressure was the only predictor employed indicates that
systolic blood pressure has slightly less of an explanatory role in predicting group membership in the model
containing three predictor variables.
23. a) As noted, even though smokers and people without social support were coded as 0 in the SPSS data editor, in
order to compute the correct probability it is necessary to reverse the coding in Equation 39.3; b) Although it
was noted earlier that social support should be eliminated from the model, it will be employed here to illustrate
how to compute a probability for a model containing all three predictor variables; c) A separate logistic
regression employing systolic blood pressure and smoking as the predictor variables obtained the values
a = -8.915, b1 = .050 (for systolic blood pressure), and b2 = 2.241 (for smoking). When X1 = 198 and X2 = 1,
z = -8.915 + .050(198) + 2.241(1) = 3.226. When the latter value is substituted in Equation 39.3 it yields the
value p(Y) = .962. Thus, the likelihood of the subject having a stroke is .962.
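The computation in part c) of the note above can be reproduced with a short Python sketch; the coefficient values
are those quoted in the note, and the logistic form p(Y) = e^z/(1 + e^z) is assumed to correspond to Equation 39.3.

    import math

    a, b1, b2 = -8.915, .050, 2.241   # intercept and coefficients from the two-predictor model
    x1, x2 = 198, 1                   # systolic blood pressure of 198; smoker coded 1 in the equation
    z = a + b1 * x1 + b2 * x2         # 3.226
    p = math.exp(z) / (1 + math.exp(z))
    print(round(z, 3), round(p, 3))   # 3.226 0.962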
1. Readers should take note of the fact that the use of the term factor within the framework of factor analysis is
not the same as its usage within the context of the between-subjects factorial analysis of variance (Test 27)
(as well as within other factorial designs), where the term factor is employed to represent each of the
independent variables.
2. a) Principal axis factor analysis is also referred to as principal factor analysis, principal axis factoring,
common factor analysis, and principal components factor analysis; b) The term confirmatory factor
analysis (CFA) is employed when factor analysis is used within the context of hypothesis testing, more
specifically, for evaluating goodness-of-fit with respect to the most appropriate factorial model for describing
the dimensions underlying a set of observed variables. The major difference between confirmatory factor
analysis (CFA) and exploratory factor analysis (EFA) (which is the factor analytic methodology discussed
in this chapter) is that in CFA the researcher a priori stipulates that specific factors will load on specific measured
variables, whereas in EFA all of the derived factors can load on any of the measured variables (Klem (1995, p.
247)). In implementing confirmatory factor analysis a researcher initially specifies a factor analytic model to
be evaluated (i.e., the researcher stipulates the specific factors which constitute a given model along with
relevant factor loadings predicted for specific measured variables), and then evaluates the data with respect to
goodness-of-fit for the hypothesized model. Additional discussion on the differences between confirmatory
and exploratory factor analysis can be found in the chapters on Path Analysis (Test 41) and Structural
Equation Modeling (Test 42) (also see Bryant and Yarnold (1995) and Pedhazur and Schmelkin (1991)); c)
Image factor extraction, maximum likelihood factor extraction, unweighted and weighted least squares
factoring, and alpha factoring are other examples of factor analytic procedures (some or all of which are
available with different computer software packages). Since a discussion of the latter procedures is beyond the
scope of this book, interested readers are referred to sources such as Field (2005, pp. 628–630) and
Tabachnick and Fidell (1996, pp. 664–665).
3. a) Tabachnick and Fidell (1996, p. 684) note that principal components analysis is recommended as a first step
when a researcher wants to obtain an overall picture of the likely number as well as the nature of the factors
which comprise a body of data. Hair et al. (1995, p. 376) state that principal components analysis is
appropriate to employ when the primary intent of a researcher is to optimize prediction or to identify a small
number of factors which account for most of the variability in a set of p variables. The latter authors also note
that, as a general rule, the use of principal components analysis implies (although in reality this is not the
case) the researcher has prior knowledge indicating that only a small proportion of the total variance can be
accounted for by specific or error variance. If the latter is, in fact, true, Hair et al. (1995, p. 375) note the
derived components will not contain enough specific or error variance to distort the overall factorial structure
of the data; b) Stevens (2002, p. 386) notes that principal components analysis has some similarity to
discriminant function analysis (Test 37), in that both methodologies derive uncorrelated linear combinations.
More specifically, principal components analysis derives uncorrelated linear combinations of the original
variables which additively partition the variance for the original set of variables. Discriminant function
analysis, on the other hand, derives uncorrelated linear combinations which are employed to additively
partition the association between the grouping variable and the set of predictor variables.
4. a) Hair et al. (1995, p. 376) note that when the major goal of a researcher is to identify latent dimensions
underlying a set of p variables, and the researcher is ignorant with respect to the amount of specific and
error variance in the overall data, principal axis factor analysis is a more appropriate choice; b) Tabachnick
and Fidell (1996, p. 663) note that since principal axis factor analysis omits specific variance, it is less likely
than principal components analysis to mirror the relationships between the variables depicted in the original
correlation matrix.
6. Bartlett's test of sphericity (Bartlett (1954)) evaluates a different null hypothesis than Mauchly's test
of sphericity discussed in Endnotes 15 and 7, respectively, of the single-factor within-subjects analysis of
variance (Test 24) and Hotelling's T² (Test 34).
7. a) In spite of the acceptable value obtained for the Kaiser-Meyer-Olkin statistic, most researchers would
consider a sample size of 80 to be too small for a factor analysis; b) On a more technical level, Tabachnick and
Fidell (1996, p. 642) note the Kaiser-Meyer-Olkin statistic (which is attributed to Kaiser (1970, 1974)) is a
ratio of the sum of squared correlations to the sum of squared correlations plus the sum of squared partial
correlations. The rationale underlying the statistic is that the partial correlations in a correlation matrix should
not be large if the sample size is adequate; c) In the SPSS Factor Analysis: Descriptives menu, the researcher
is offered the option of displaying an Anti-Image Matrix (which is comprised of an Anti-Image Correlation
Matrix and an Anti-Image Covariance Matrix) which will display a Kaiser-Meyer-Olkin statistic for
each of the variables. The Anti-Image Correlation Matrix provides information regarding the sampling
adequacy of each of the variables. Field (2005, p. 642) notes that the rule of thumb with respect to employing
the Anti-Image Correlation Matrix is that all of the values in the main diagonal should be high (i.e., .5 or
greater), and all of the values in the off-diagonal cells should be close to zero. The researcher should consider
deleting from the analysis any variable with a main diagonal value less than .5. In the case of Example 40.1,
acceptable values were obtained for the Anti-Image Correlation Matrix.
8. The SPSS Factor Analysis Extraction menu offers the researcher the choice of analyzing the correlation
matrix or the covariance matrix (the result of the analysis will often differ depending upon which matrix is
selected). Field (2005, p. 643) notes that the correlation matrix (which is the default option) is the standardized
version of the covariance matrix, and is the matrix which should be selected under most circumstances. More
specifically, the correlation matrix should be selected when the p variables are not all measured on the same
scale (which, as in Example 40.1, is usually the case). If, on the other hand, the variables are all measured on
the same scale, it may be more prudent to employ the covariance matrix for the analysis.
9. The following SPSS command sequence was employed in this chapter in conducting a principal components
analysis: a) Click Analyze; b) Click Data Reduction; c) Click Factor; d) Highlight the variables to be factor
analyzed (i.e., anxiety, somatic complaints, guilt, friendliness, sensation seeking, and dominance in
Example 40.1) and move them to the Variables window; e) Click Descriptives, check off desired information,
and then click Continue; f) Click Extraction and select the type of analysis to be conducted. Since a principal
components analysis was conducted, the default option of Principal components was employed. (When a
principal axis factor analysis was employed for the same data, the Principal axis factoring option was
selected in this window.) After checking off any desired information available in the window, click Continue;
g) Click Rotation, check off the type of rotation desired, and then click Continue. For the example under
discussion a Varimax rotation was selected; h) If factor scores are desired for subjects, click Scores, check
Save as variables (the default option of Regression is generally recommended), and then click Continue; i) If
any data are missing, click on Options and check the method you elect, and then click Continue; j) Click OK
to obtain the output for the analysis.
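For readers working outside of SPSS, the following is a minimal sketch of the core computation underlying a
principal components analysis of a correlation matrix (eigenvalues and unrotated loadings) using numpy; the data
are randomly generated stand-ins for the six variables of Example 40.1, and no rotation is applied.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(80, 6))               # hypothetical scores: 80 subjects, 6 variables
    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the 6 variables

    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]      # order components from largest to smallest eigenvalue
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    loadings = eigenvectors * np.sqrt(eigenvalues)    # unrotated component loadings
    print(eigenvalues)                                # Kaiser's rule retains components with eigenvalues > 1
    print(eigenvalues / eigenvalues.sum())            # proportion of total variance per component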
10. If a principal axis factor analysis is conducted, only the variance which each of the variables shares with the
other variables (i.e., common variance) is available for analysis, since the latter analysis excludes the unique
variance associated with each of the variables. In point of fact, a principal axis factor analysis on the same set
of data is described in Section VI, which yields two factors that are very similar to the two factors documented
in Table 40.3.
11. Although a principal components analysis was employed for Example 40.1, in the interest of accuracy it
should be noted that when SPSS is employed to conduct the latter type of analysis, the term Components is
used in the summary table in lieu of Factors. (The latter is illustrated in Table 40.5 as well as in Table 40.7
(which summarizes the results of Example 40.2) in Section VIII.) When SPSS is used for a principal axis
factor analysis, the term Factors is displayed in the summary table. The term Factors is used in Table 40.3
and throughout this section, since it is more commonly employed to refer to both the components derived in
principal components analysis, as well as the factors derived with other factor analytic procedures.
12. Jolliffe (1972, 1986) employs a more liberal criterion value of .7 or larger. Although researchers are not in
agreement with respect to which criterion should be used (since use of different criteria could result in different
conclusions), Kaiser's rule is most commonly employed.
13. a) A more simplistic but less accurate method for computing factor scores would be to employ the
unstandardized scores of subjects and factor loadings for each variable in place of the standardized scores and
weighting coefficients; b) If requested, SPSS will print out the standardized factor scores (which can range
from approximately -3 to +3) in the data editor (i.e., the original spreadsheet in which the data are recorded)
for each subject as new variables. Although SPSS has three options for computing factor scores, the default
method labeled Regression Method (which yields the highest correlation between the derived factor scores
and their factors, although the former can also correlate with factor scores on other factors) was employed
for the analysis of Example 40.1. The other methods are the Bartlett method (which derives unbiased factor
scores that only correlate with their own factors) and the Anderson-Rubin method (which derives
standardized factor scores that are uncorrelated with each other even if the factors are correlated; this
method is best to employ if one wishes to minimize multicollinearity) (Field (2005, p. 628)). Further discussion
of computing factor scores can be found in Diekhoff (1992, Ch. 16), Field (2005, pp. 625–628) and
Tabachnick and Fidell (1996, pp. 678–679); c) The SPSS output includes a table labeled Component Score
Coefficient Matrix (which is displayed in Table 40.5) that contains the coefficients which are employed to
compute factor scores for each subject (which as noted above can be displayed in the data editor at the end of
the analysis). The component score coefficients in the latter matrix express the principal components in terms of the
variables (i.e., each principal component can be expressed as a linear combination of the original variables,
which is what Equation 40.1 represents), while the factor loadings in Table 40.3 express the variables in terms
of the principal components.
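The computation described in part c) of the note above can be sketched in a few lines of numpy: a subject's factor
(component) scores are obtained by multiplying the matrix of standardized scores by the component score
coefficient matrix. The matrices below are small hypothetical stand-ins, not the values from Table 40.5.

    import numpy as np

    # standardized (z) scores for 3 hypothetical subjects on 4 variables
    Z = np.array([[ 0.5, -1.2,  0.3,  0.8],
                  [-0.7,  0.4,  1.1, -0.2],
                  [ 1.3,  0.9, -0.6, -1.0]])

    # hypothetical component score coefficient matrix (4 variables x 2 components)
    W = np.array([[ 0.40,  0.05],
                  [ 0.35, -0.10],
                  [-0.05,  0.45],
                  [ 0.10,  0.50]])

    factor_scores = Z @ W        # one row per subject, one column per component
    print(factor_scores)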
14. a) Alternatively, one could elect to have Factor I represented on the ordinate and Factor II on the abscissa; b)
If three factors are derived in a factor analytic procedure, three-dimensional space is necessary in order to
geometrically represent the result. If there are more than three factors, the appropriate multi-dimensional space
is required (yet cannot be depicted visually, since geometrical space beyond three dimensions cannot be
represented graphically).
15. Rotation can only be employed when two or more factors are derived. If the rotation option is employed in
SPSS when only one factor has been derived, the same result will be obtained as that for the unrotated factor.
16. a) In oblique rotation a researcher can set a value for a variable (referred to as delta in SPSS) which establishes
the maximum degree to which factors may be correlated with one another. The default value for delta is zero,
and as its value is decreased, the correlation between factors becomes lower. Higher values increase the degree of
correlation between the factors (for further details see Field (2005, p. 637) and Tabachnick and Fidell (1996, p.
668)). Tabachnick and Fidell (1996, p. 668) note, however, that use of an oblique rotation does not guarantee
the derived factors will be correlated with one another; b) When an oblique rotation is employed, the SPSS
output will include the following three tables: 1) A table labeled Pattern Matrix contains factor loadings for
the obliquely rotated factors; 2) A table labeled Structure Matrix contains the correlations between the factors
and the variables. This will not be the same as the Pattern Matrix, since the factors may be correlated with
one another; 3) A table labeled Component Correlation Matrix contains the correlations between the rotated
factors. (When an orthogonal rotation is employed, the SPSS output will include a table labeled Components
Transformation Matrix which shows the correlation of the factors before and after rotation.); c) Oblique
factors can themselves be treated as variables, and factor analyzed into lower order factors. A possible situation
in which oblique rotation might be employed is within the context of confirmatory factor analysis, when a
specific theoretical model a researcher is trying to confirm factorially is comprised of factors that are
conceptualized as being correlated with one another.
17. With the exception of Test D: Emotional Maturity, the other tests are elements of commonly administered
intelligence tests. Three of the tests (Test A: Vocabulary (which involves defining words); Test B:
Information (which tests age appropriate knowledge); and Test C: Verbal Comprehension (which involves
explaining the meaning of sentences such as proverbs)) are often categorized as components of what is referred
to as verbal intelligence, while the remaining three tests (Test E: Object Assembly (which involves
assembling puzzles); Test F: Block Design (which involves arranging blocks imprinted with designs to match
designs printed on cards); and Test G: Mazes (which involves using a pencil to traverse a maze)) are
categorized as components of nonverbal or performance intelligence.
1. Example 41.1 employs the same data values employed in an example presented in Klem (1995, pp. 82–87)
which is based on a study by Romney et al. (1992). The variables employed in Example 41.1 are, however,
different from those employed in the original study.
2. Although, as noted, specialized computer software is commonly employed for path analysis, path coefficients
(which are represented by the standardized regression coefficients) can be computed with any linear regression
program that yields standardized coefficients, with the latter being equal to the path coefficients. Such
programs also compute multiple correlation coefficients.
3. Values for the disturbance terms (i.e., variance of the residuals (which are presented in Figure 41.3) or the
residual path coefficients) may be omitted from a path diagram in order to make the diagram easier to follow.
4. a) Klem (1995, p. 75) notes that the term effect coefficient is employed in some sources for the total causal
effect; b) A detailed discussion of decomposition of correlations can be found in Chapter 18 of Pedhazur
(1997).
6. a) The author is indebted to Dr. Laura Klem for her input on the analysis of effect values; b) A more detailed
summary of Sewell Wright's multiplication and tracing rules follows: a) To find the correlation between
Xi and Xj, where Xj appears further to the right in the model, begin at Xj and read backward toward Xi along
each distinct direct and indirect (compound) path, and compute the product of the coefficients along that path.
This will provide the correlation between Xi and Xj that is due to the direct and indirect effects of Xi on Xj; b)
After reading backward, if necessary read forward, but only one reversal from back to forward is permitted.
This will provide the correlation that is due to common causes (i.e., spurious effects); c) A double-headed arrow
may be read either forward or backward, but you can only pass through one double-headed arrow on each
transit. This will provide the correlation that is due to correlated causes (which can be either spurious or
unanalyzed effects); d) If you pass through a variable, you may not return to it on that transit; e) The sum of the
products obtained for all the linkages between Xi and Xj represents the total correlation between the two
variables. The key in employing Wright's rules is to make sure that no linkages are missed or counted twice, and
that there are no illegal double reversals.
7. The term residual matrix is employed to describe a matrix that contains the absolute values of the differences
between the implied correlations and the observed correlations.
8. Along with the values of the path coefficients, most computer software employed for path analysis prints out
the standard error for each coefficient and a t value that is obtained by dividing the value of a path coefficient
by its standard error. Any path coefficient that yields a t value (which is sometimes referred to as the critical
ratio) of 1.96 or greater is significant at the .05 level, the latter allowing a researcher to conclude that the
value of the path coefficient in the underlying population is some value other than zero. The t test involved in
evaluating a path coefficient is identical to that employed to evaluate the same hypothesis in reference to a
regression coefficient (which is described in Section V of the chapter on multiple regression and by Equation
28.27 in the case of bivariate regression).
9. a) One measure of goodness-of-fit of a path model that will not be discussed in this section is the Q statistic
developed by Specht (1975). The latter statistic yields a value between 0 and 1, with the closer the value of Q
to 1, the better the fit of a model. Q is generally not computed by statistical software commonly employed for
path analysis, suggesting that many researchers consider it an outdated method for assessing fit; b) Additional
discussion of the indices discussed in this section can be found in Section V in Test 42 on structural equation
modeling.
11. Pedhazur (1997, p. 820) notes that although it is theoretically possible for both a GFI and AGFI to assume a
negative value, the latter should not occur since according to Jöreskog and Sörbom (1993, p. 123) a
negative value means a model fits worse than no model at all.
12. a) The concept of standardized residuals is discussed in Section VI of the chi-square goodness-of-fit test and
the chi-square test for r × c tables (Test 16); b) A less rigorous criterion for evaluating a standardized
residual would be a value of 1.96 or greater.
1. Kline (2005, p. 14) and Thompson (2000, p. 263) note that all of the parametric statistical methodologies can be
conceptualized within the general linear model (GLM) (which is discussed in Section VII of the chapter on
the single-factor between-subjects analysis of variance (Test 21) as well as in Endnote 2 in the chapter on
canonical correlation (Test 38)). They further note that just as all univariate parametric procedures (e.g., t
tests, analysis of variance, etc.) can be conceptualized within the GLM as special cases of multiple
regression, and the latter along with all other multivariate procedures can be subsumed as special cases of
canonical correlation, all multivariate procedures including canonical correlation can be viewed as special
cases of SEM. Raykov and Marcoulides (2006) note that SEM can be differentiated from other methods of
classical linear modeling (which can utilize methodologies such as regression analysis, analyses of variance
and covariance, and the multivariate procedures discussed in previous chapters). Although SEM is also
based on linear modeling, it allows a researcher to take measurement error into account on all variables, most
specifically, for predictor/independent variables. In contrast, classical linear modeling does not assume
measurement error on independent variables, and its failure in this regard can result in the derivation of incorrect
models. Although the other methodologies associated with the GLM can also be used for conceptualizing
models, SEM represents a more flexible approach to model building.
2. Kline (2005, p. 13) notes that SEM is actually a more flexible analytic technique than will be described in this
chapter. Specifically, he discusses how SEM can, in fact, be applied with experimental as well as
nonexperimental data, and how it can be employed to test for such things as differences between group means.
3. Within the framework of SEM it is also possible for a model to include one or more "standalone"
measured variables, i.e., measured variables that are not conceptualized as measures of any of the latent
variables specified in the model. When such measured variables are present in a model, the analysis will
evaluate the hypothesized relationships between the latent variables as well as those that involve the
relationship of such standalone measured variables to one another and/or the latent variables.
4. Inferential statistical tests are employed to determine whether or not a CFA adequately fits the data. Among
others, Kline (2005, Ch. 7) discusses such tests.
5. The reader should take note of the fact that in Test 41 on path analysis, Model (e) in Figure 41.2 depicts a
nonrecursive model, and within the framework of the latter the feedback loops between Variables A, B, and C
with Variable D are represented with straight lines with arrows at both ends connecting Variables A, B, and C
with Variable D. As noted in the latter chapter, feedback loops indicate it is hypothesized that simultaneously
the two variables have a direct effect on each other (i.e., the variables in question are both the cause and
effect of one another). When a feedback loop exists between two variables in a nonrecursive path model in
SEM, the latter is indicated through the use of two lines connecting the two variables; specifically, by
having each of the variables having a straight line emanating from it with an arrow at its end directed toward
the other variable (i.e., Variable A ⇄ Variable B).
8. a) Example 42.1 displays data values employed in an example presented in Ho (2006, pp. 299–303) (with
some modifications made in the data by the author). The variables employed in Example 42.1 are different
from those employed by Ho, and, as such, any conclusions drawn from the example should be viewed as
fictitious; b) The concept of the authoritarian personality was introduced in 1950 in a book written by Theodor
Adorno and his associates at the University of California. The authoritarian personality is characterized by,
among other things, an unquestioning obedience to authority figures and behavior involving hostility and
scapegoating of individuals who are nontraditional in their ideology or members of minority groups. It is most
commonly measured with the California F-scale (F standing for fascism), which is a personality test developed
by Adorno and his associates.
9. A description of the underlying mathematical procedures involved in SEM is beyond the scope of this chapter.
Readers interested in the latter should consult sources such as Pedhazur (1997, Ch. 19) and Ullman (2001).
10. a) A covariance between two variables is obtained by dividing the sum of products by (n - 1). A full
description of the covariance, which can be computed with Equation 28.34 or 28.35, can be found in Section
VII of the Pearson product-moment correlation coefficient (Test 28); b) Since covXY = (rXY)(sX)(sY), it
follows that rXY = covXY/[(sX)(sY)]. A variance-covariance matrix is a table containing the variances of all the
measured variables and the covariances between all pairs of measured variables; c) Hair et al. (1995, p. 636)
note that the use of covariances (as opposed to correlations) allows for valid comparisons between different
populations or samples, with the latter not being possible if the correlation matrix represents the input data. The
correlation matrix represents a standardized variance-covariance matrix, since the scale of measurement
employed for each of the variables has been removed by virtue of dividing the variances or covariances by the
product of the standard deviations. Although use of correlation coefficients optimizes one's ability to
understand the pattern of relationships among the latent variables/factors, it does not provide enough
information to explain the total variance for a latent variable. Use of correlations is also appropriate for making
comparisons across different variables, since unlike covariances, which are affected by scale of measurement,
correlation coefficients are not. In summary, the variance-covariance matrix should be used to conduct a true
test of the theory underlying the hypothesized model, yet if one is only interested in the pattern of relationships
among the variables, without requiring a total explanation of variability, the correlation matrix is acceptable.
Results based on the correlation matrix should be interpreted with caution and should be viewed cautiously
with regard to generalizability to different situations. With regard to tests of significance, a correlation matrix
yields more conservative estimates of significance than a variance-covariance matrix.
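The relationship covXY = (rXY)(sX)(sY) described in part b) of the note above, and the sense in which a
correlation matrix is a standardized variance-covariance matrix, can be illustrated with a short numpy sketch (the
data are randomly generated, hypothetical values):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 3))               # hypothetical scores: 100 cases, 3 measured variables

    S = np.cov(X, rowvar=False)                 # variance-covariance matrix (divisor n - 1)
    sd = np.sqrt(np.diag(S))                    # standard deviations of the 3 variables
    D_inv = np.diag(1 / sd)
    R = D_inv @ S @ D_inv                       # standardizing S yields the correlation matrix
    print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # True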
11. Raykov and Marcoulides (2006, p. 42) note that most of the alternative goodness-of-fit indices are functions of
the chi-square value computed for the chi-square goodness-of-fit index.
12. According to Raykov and Marcoulides (2006, p. 42), the logic behind this is that since inferential testing is
structured such that one can never confirm a hypothesis/model but only disconfirm (i.e., reject) it, it is
desirable to optimize the testing procedure such that it is most likely to reject a model which, in fact, is false.
13. a) Clarification of the noncentral chi-square distribution can be found in Section VI of the chi-square test
for r × c tables (Test 16) under the discussion of the contingency coefficient (Test 16f); b) An explanation
of the theoretical basis underlying the RMSEA can be found in Raykov and Marcoulides (2006, pp. 45–47).
14. Ho (2006, pp. 286–287) notes that the underlying rationale for model comparison indices is that if a
hypothesized model yields a model comparison index value above .90, it represents more than a 90%
improvement over the null model (and consequently the only possible improvement to the model is less than
10%). However, the latter interpretation must be qualified, since the researcher also has to take into
consideration other substantive and theoretical considerations relevant to the analysis of the model in question.
15. a) Computer programs for SEM typically print out standardized residuals for each pair of variables contained
within a model. Standardized residuals with an absolute value larger than 2.58 (some sources employ 1.96) are
considered large, and would be indicative of an element within a model that has a poor fit; b) A Heywood case
is the term employed for an illogical parameter value (such as a negative variance or an estimated correlation
with an absolute value greater than 1). Heywood cases (which are also observed in factor analysis) reflect an
attempt by an analysis to force the model on a set of data. Heywood cases are most often caused by model
misspecification or nonidentification, presence of outliers in the data, or a small sample coupled with two or
fewer (or poorly measured) indicator variables for the latent variables.
16. This test is described in Section VI (under heterogeneity chi-square analysis) of both the chi-square
goodness-of-fit test and the chi-square test for r × c tables.
17. Some sources note that the chi-square difference test may be overly sensitive to small differences. Because of
the latter, it may result in a statistically significant difference between the two models when, in fact, the actual
difference is trivial in magnitude or of little or no theoretical and/or practical consequence.
18. Ullman (1996, p. 753) notes that since the Lagrange multiplier asks what parameters can be added to a model
in order to improve it, the procedure is analogous to forward stepwise regression (discussed in Section I in
the chapter on multiple regression), while the Wald test, which asks what parameters can be deleted from a
model, is analogous to backward stepwise regression (Ullman (1996, p. 758)).
19. a) Ho (2006, p. 289) and Pedhazur (1997, pp. 879–880) note that one questionable form of model
modification is the addition of correlated errors to a model to improve fit, specifically, correlated errors
between error terms of latent variables or between error terms of measured variables. Although the addition of
correlated errors to a model can improve fit by accounting for unwanted covariation, often it may do so at the
cost of meaning and substantive conclusions that can be drawn from the model. Without strong theoretical
justification for modifying a model through use of correlated error terms, a researcher may just capitalize on
chance (i.e., because of idiosyncrasies in the observed data) and obtain an improved model which, in fact, does
not accurately reflect the actual relationship between the modeled variables in the underlying population; b)
Schumacker and Lomax (1996, pp. 109–114) provide an example where through use of a correlated error
term the fit of an original model is substantially improved. The modified model employs an additional free
parameter represented by a correlated error term for two of the measured variables. The selection of the latter
correlated error term as an additional free parameter to be estimated in the model was based on a modification
index that is calculated for all nonestimated relationships in the model. The value of the modification index for
each nonestimated relationship indicates the approximate decrease in the value of the chi-square goodness-of-
fit index that will result if the parameter in question is added to the model.
20. Because the same value for a covariance between any pair of variables will appear both above and below the
diagonal, only one of the values for the covariance between that pair of variables is usually displayed in the
matrix and subsequently employed in the analysis.
21. Note that the t values (and standard errors) computed in Table 42.2 are based on unstandardized regression
weights, and that t values are only computed for 4 of the 6 measured variables. The latter is the case since t
values are only computed for measured variables whose value has not been fixed to equal 1. Consequently, t
values are not computed for MV1 and MV4. The latter two variables (which previously, through use of CFA,
would have been found to load significantly on their latent variables) would also yield significant t values if, in
order to establish the scale of measurement for the relevant latent variable, an alternative measured variable had
been fixed to equal 1.
22. Given that the notation R² (or r² in the case of two variables) represents the squared value of a standardized
regression weight (which is equivalent in meaning to a path coefficient or a factor loading), the reader should
take note of the following: In this chapter (as well as in Test 41 on path analysis) the equation 1 - R²
(which as noted in Section IV in Test 41 yields what is referred to as a variance of the residuals) is employed
to compute the error term for a variable, and consequently (as is the case in Figure 42.4) for any
endogenous/dependent measured variable, the square of its factor loading plus the value computed for its
error term will sum to 1. Other sources (e.g., Ullman (2001)), however, use the equation √(1 - R²) (which as
noted in Section IV in Test 41 yields what is referred to as a residual path coefficient) in computing the
error term for a variable. If the latter equation is used, for any endogenous/dependent measured variable, the
square of its factor loading plus the square of the error term will sum to 1.
23. Kline (2005, p. 6) notes that although there is no standardized statistical notation for SEM, the notation
employed in LISREL is most commonly employed in books and journals.
24. The data employed for Example 42.2 are not based on an actual study, and, as such, any conclusions drawn
from the example should be viewed as fictitious.
1. Various sources (e.g., Kline (2004, p. 251), Hunter and Schmidt (2004, p. 21), and Schulze (2004, p. 9)) note
that a meta-analysis can evaluate summary statistics which are the result of primary or secondary analyses of
research. A primary analysis is the analysis of data by the original researcher who conducts a study, whereas a
secondary analysis involves reanalysis of data reported in the original study through use of a different
methodology and/or additional analysis of the published data by a second party.
2. For example, Hunter and Schmidt (1990; 2004, p. 62) note that if, in fact, a population correlation is some value
other than zero (i.e., ρ ≠ 0) and the statistical power of each of the m correlation coefficients computed is
less than .50, then the greater the number of studies conducted, the greater the likelihood the vote-counting
method will lead a researcher attempting to synthesize the results of all the studies to conclude that ρ = 0.
3. Cohen (1977, 1988) also discusses additional effect size indices that are employed for computing the power of
various multivariate procedures.
4. Lipsey and Wilson (2001, p. 147) note that Cohen (1977, 1988) did not base the values he stipulated for small,
medium, and large effect sizes on a systematic review of the literature in psychology or any other academic
discipline. Given the large amount of empirical results available at this point in time, Lipsey and Wilson (2001,
p. 147) suggest it might be prudent to reconsider Cohen's values in reference to actual empirical
distributions of effect size values obtained in specific areas of research, and to redefine the boundary values
based on the latter type of empirical information (assuming it is available). As an example, Lipsey and Wilson
(1993) evaluated effect size values computed for over 300 meta-analyses involving psychological, behavioral,
and educational intervention research. They divided the effect size values into quartiles, with the bottom
quartile yielding an effect size value of .30, the median (i.e., 2nd quartile) an effect size of .50, and the 3rd
quartile an effect size of .67. If one employed the latter values with respect to the research area in question, the
values .30, .50, and .67 could be employed in designating the lower bounds for small, medium, and large
effect sizes.
5. a) The values that Cohen (1977; 1988, pp. 24–27) employs for identifying a small versus medium versus large
effect size for the d index and other indices to be described in this section were developed in reference to
behavioral science research. Although these values can be employed for research in areas other than the
behavioral sciences, it is conceivable that practitioners in other disciplines may elect to employ different values
which they deem more appropriate for their area of specialization; b) Equation 43.2 can also be employed to
convert an omega squared value (ω²) (discussed in Section VI of the t test for two independent
samples) into a d value. If the values .0099, .0588, and .1379 (which are Cohen's lower limits for omega
squared for a small, medium, and large effect size) are employed to represent r² in Equation 43.2, they yield
the following corresponding d values: .2, .5, and .8; c) Alternative equations for converting a d value into an r
value (yielding a slightly different value than that obtained with Equation 43.3) were suggested by Aaron et al.
(1998) (also found in Schulze (2004, p. 31)); d) Further clarification of the relationship between r and
Cohen's d index can be found in Cohen (1977; 1988, pp. 81–83); e) Schulze (2004, p. 29) notes that
Hedges and Olkin (1985, p. 86) first noted that the estimated population variance for a d value is computed
with Equation 43.64.
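Assuming that Equation 43.2 is the conventional conversion d = 2r/√(1 - r²) (and Equation 43.3 its inverse,
r = d/√(d² + 4)), the correspondence noted in part b) of the note above can be checked with a few lines of Python:

    import math

    for r_squared in (.0099, .0588, .1379):     # Cohen's lower limits for omega squared
        r = math.sqrt(r_squared)
        d = 2 * r / math.sqrt(1 - r_squared)    # convert r to d
        r_back = d / math.sqrt(d ** 2 + 4)      # convert d back to r
        print(round(d, 2), round(r_back, 3))    # d values of .2, .5, and .8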
6. Although Cohen's d index is also used as a measure of effect size within the framework of meta-analysis,
Rosenthal (1991, pp. 17–18) argues that the use of an r value is preferable to the use of a d value as a
measure of effect size.
7. The author is indebted to Robert Rosenthal for clarifying some of the issues discussed in this section.
8. It is important to emphasize that researchers are often not in agreement with regard to the most appropriate
estimate of effect size to employ. Hopefully, if an effect of some magnitude is present which has theoretical or
practical implications, regardless of which measure of effect size one employs, a reasonably accurate estimate
of the effect size will emerge from an analysis.
9. a) The same test result will be obtained if the z value for Study E is assigned a positive sign, and the z values for
Studies A, B, C, and D are assigned negative signs; b) If in a given study the means of the two groups are equal,
the p value for that study will equal .50, and the corresponding z value will equal 0.
10. Rosenthal (1991) presents a modified form of Equation 43.15 that allows a researcher to differentially weight
the k studies employed in a meta-analysis. A weighting system is employed within this context to reflect the
relative quality of each of the studies. The magnitude of the weights (which is assigned by a panel of judges) is
supposed to be a direct function of the quality of a study. Lipsey and Wilson (2001, p. 127), however, do not
recommend the latter type of weighting.
11. It is interesting to note that the average z value for the five studies is z̄ = .95, and that the latter z value in itself is
not statistically significant. It is quite common for Equation 43.15 to yield a significant combined p value when
the average z value is not statistically significant. (A short numerical illustration follows this note.)
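A minimal Python sketch of the point made in note 11, using hypothetical z values rather than the data of Example 43.1, and assuming Equation 43.15 is the familiar Stouffer-type combination z_combined = Σz/√k (an assumption on my part):

import math

z_values = [0.95, 0.95, 0.95, 0.95, 0.95]    # hypothetical z values; their average is .95, as in the note
k = len(z_values)

z_mean = sum(z_values) / k                   # 0.95 -- not significant on its own (< 1.96)
z_combined = sum(z_values) / math.sqrt(k)    # 0.95 * sqrt(5) = 2.12 -- exceeds 1.96

print(round(z_mean, 2), round(z_combined, 2))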
12. a) Later in Section IV/V a Q statistic (based on effect sizes expressed in standard deviation units as opposed to
correlation coefficients (computed with Equations 43.59 and 43.60)) developed by Cochran (1954) (described
in Hedges and Olkin (1985, p. 123), Huedo-Medina et al. (2006), and Lipsey and Wilson (2001, pp.
115–116)) is presented which is commonly employed as an alternative to another equation presented later in
this section (specifically, Equation 43.18) to evaluate the null hypothesis that the population effect sizes for a
set of k studies are homogeneous. Computation of the Q statistic to be described later with Equation 43.59 is
based on summing the squared deviation of the effect size estimate for each study from the weighted mean
estimate of the overall effect size, and in doing so, weighting the contribution of each study by the inverse of
its variance. The larger the value of Q (which is distributed as chi-square with (k − 1) degrees of freedom), the
more likely the null hypothesis will be rejected. The limitations of the latter Q statistic (e.g., its low power in
detecting heterogeneity of effect size when a small number of studies is employed in a meta-analysis), which
are also associated with Equation 43.18, as well as alternative statistics for assessing homogeneity of effect
size, are discussed in sources such as Hedges and Olkin (1985) and Huedo-Medina et al. (2006) (the general
form of this computation is sketched after this note); b) In Section
VI of the chi-square test for r × c tables, Test 161: The Mantel-Haenszel analysis (Test 161-a: Test of
homogeneity of odds ratios for Mantel-Haenszel analysis, Test 161-b: Summary odds ratio for Mantel-
Haenszel analysis, and Test 161-c: Mantel-Haenszel test of association) is described as an alternative (and
what most sources consider to be a more reliable) procedure for evaluating and pooling the results obtained for
multiple 2 × 2 contingency tables which evaluate the same hypothesis.
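The following Python sketch illustrates the general form of the Q statistic described in note 12a, using hypothetical effect sizes and sampling variances (it is not intended to reproduce Equation 43.59 exactly): each study's squared deviation from the inverse-variance-weighted mean effect size is weighted by the reciprocal of its variance, and the sum is referred to the chi-square distribution with k − 1 degrees of freedom.

effect_sizes = [0.30, 0.55, 0.10, 0.45]   # hypothetical standardized mean differences
variances    = [0.04, 0.05, 0.03, 0.06]   # hypothetical within-study sampling variances

weights = [1.0 / v for v in variances]    # inverse-variance weights
es_bar = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)

Q = sum(w * (es - es_bar) ** 2 for w, es in zip(weights, effect_sizes))
df = len(effect_sizes) - 1                # compare Q to the chi-square critical value for df degrees of freedom

print(round(es_bar, 3), round(Q, 3), df)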
13. a) One strategy for dealing with sampling bias discussed by Lipsey and Wilson (2001, p. 165) is to compare the
mean effect size computed for published versus unpublished studies (assuming the latter can be obtained). The
degree of difference between the mean values can be employed to assess the impact of sampling bias due to the
omission of unpublished studies; b) In discussing how meta-analysts deal with studies that have insufficient
information to compute an effect size value, Lipsey and Wilson (2001, p. 70) note that probably the most
common approach is to omit such studies from a meta-analysis (and thus relegate them to the file drawer). The
latter authors note an alternative strategy that would allow a result from a study with insufficient information to
be included in a meta-analysis involves imputing (i.e., estimating) a value for a missing effect size. For
example, if a study just stated that a result failed to achieve statistical significance, the latter study could be
assigned an effect size of zero (a strategy that admittedly would result in a downward bias in estimating the
overall mean effect size). Alternatively, if a study just stated that a result was statistically significant but did not
list the exact probability, the latter could be set equal to .05, and if the sample size were known, a minimum
effect size value associated with such a result could be computed; c) Another approach for dealing with
publication bias was put forth by Kraemer et al. (1998), who suggested that a meta-analysis should only include
high power (i.e., large sample size) studies. Lipsey and Wilson (2001, p. 71), however, note that the latter
strategy defeats one of the advantages associated with meta-analysis – notably, its ability to attain high
statistical power from a set of studies, many of which may have low power.
14. The same test result will be obtained if, instead, we assign a negative sign to the Fisher transformed zr values
for Studies A, B, C, and D, and a positive sign for the Fisher transformed zr value for Study E.
15. The logit distribution is approximately normal with a mean of 0 and a standard deviation of 1.83 (Lipsey and
Wilson (2001, p. 40)).
16. Lipsey and Wilson (2001, p. 49) note that in some situations the variability associated with a treatment may be
affected by the treatment. In such cases it is recommended that, in lieu of the pooled standard deviation value,
the standard deviation computed for the control group be used in estimating the effect size.
17. Lipsey and Wilson (2001, p. 166) note that Equation 43.65 (which is the analog of Equation 43.16), developed
by Orwin (1983), employs the standardized mean difference effect size in determining the number of studies
with an effect size of zero that would be required to reduce the mean effect size to a specified level. (A sketch of
this calculation follows.)
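A hedged Python sketch of the idea in note 17, assuming Equation 43.65 takes Orwin's (1983) usual form N_fs = k(d̄ − d_c)/d_c, where d_c is the criterion (target) mean effect size; the numbers below are hypothetical.

def orwin_fail_safe_n(k, mean_d, d_criterion):
    # Number of unretrieved studies with an effect size of zero that would be needed
    # to pull the observed mean effect size down to the criterion level.
    return k * (mean_d - d_criterion) / d_criterion

# Hypothetical: 12 studies with a mean d of .50; reducing the mean to .20 would
# require 18 additional zero-effect studies.
print(orwin_fail_safe_n(k=12, mean_d=0.50, d_criterion=0.20))   # 18.0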
18. a) Lipsey and Wilson (2001, pp. 161–162) note that the meta-analyst should not give too much attention to a
mean effect size value without considering the variance. When the evidence indicates homogeneity of effect
sizes across studies, the mean may serve as a good summary statistic for effect size, yet when heterogeneity is
present the mean will not be very useful. Under the latter circumstances the meta-analyst should be more
concerned with explaining heterogeneity – specifically, identifying variables or methodological issues
associated with different studies that could be responsible for variability with respect to effect size; b) Equation
43.59/43.60 may fail to reject the null hypothesis when, in fact, there is considerable variability among effect
sizes. As noted in Endnote 12, the latter is most likely to occur when the power of the test is compromised
when the value of k is small (i.e., when only a small number of effect sizes are evaluated in the meta-analysis);
c) The analysis described in this section assumes a fixed-effects model. Lipsey and Wilson (2001, pp.
116–126) note that in the latter model random error is assumed to only stem from chance factors associated
with subject-level sampling error in a given study. If one has reason to doubt the latter assumption and believes
that other factors may be responsible for error variability (e.g., differences between studies that are attributable
to procedural or environmental variation), the results obtained from use of Equation 43.59/43.60 may be
challenged. Although from a statistical perspective it can be argued that a significant Q value suggests
sampling error is due to factors other than subject-level sampling, while a nonsignificant Q value supports the
assumption of a fixed-effects model, the latter is not necessarily the case. Lipsey and Wilson (2001, pp.
116–126) discuss options available to the researcher who rejects the assumption of a fixed-effects model for
either conceptual or statistical reasons. Within the framework of the latter discussion they describe: 1) A
random-effects model, which assumes that in addition to subject-level sampling there are other sources of
variability that are randomly distributed; and 2) A mixed-effects model, which assumes subject-level sampling
error, between-study differences, and an additional random component. The differences between a fixed- and
random-effects model are discussed in greater detail later in this section.
19. The values for v and w in Tables 43.3 and 43.4 will be employed later in this section for the analysis of a
random-effects model.
20. Note that the definition of a fixed- versus random-effects model within the framework of a meta-analysis is
different from the definition of the two models in reference to the analysis of variance (the latter definition is
presented in Section VII of the single-factor between-subjects analysis of variance).
21. a) Borenstein et al. (2009, p. 72 and p. 91) represent vbs, the between-studies variance, with the notation T²,
and note that the latter value (keep in mind T² = vbs) can be computed with the equation T² = (Q − df)/C. In the
latter equation C = Σwi − [Σwi²/Σwi]. The equation T² = (Q − df)/C, in fact, is equivalent to Equation 43.61.
Borenstein et al. (2009, p. 72) note that this method for computing between-studies variability is sometimes
referred to as the method of moments or the DerSimonian and Laird method; b) If we employ the notation
T² to represent between-studies variability, within the framework of the random-effects model the weight
assigned to each study is w = 1/(vwithin-studies + T²); c) The variance of the true effect size (i.e., between-
studies variability) in the underlying population of studies can be represented by the notation tau squared (τ²)
(the best estimate of which is provided by T² = vbs). While the actual value of τ² can never be less than zero,
the value computed for T² can be less than zero if the observed variance (i.e., Q) is less than would be expected
on the basis of within-studies variability. When a negative value is obtained for (Q − df), the value of T² is set
equal to zero (Borenstein et al. (2009, p. 114)). The value T, which is the square root of T², represents the
standard deviation of the true effect sizes (i.e., the standard deviation representing between-studies variability);
d) Borenstein et al. (2009, pp. 117–119) note that Higgins et al. (2003) proposed the statistic I², with I² = [(Q
− df)/Q]100%. The value of I² represents the proportion of true heterogeneity to total variance across the
observed effect estimates. The I² statistic allows for description of variability on a relative rather than absolute
scale. Higgins et al. (2003) suggested that I² values of 25%, 50%, and 75% can be employed to represent low,
moderate, and high levels of between-studies heterogeneity. The higher the value of I², the larger the between-
studies variability, and consequently the more obliged a meta-analyst will be to account for such between-
studies effect size variability; e) Borenstein et al. (2009, Chapters 16 & 18) describe how to compute
confidence intervals for τ² and I². (These quantities are illustrated in the sketch following this note.)
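The quantities in note 21 can be computed directly. The Python sketch below uses hypothetical within-study variances and a hypothetical Q value; it follows the formulas quoted above (T² = (Q − df)/C with C = Σwi − Σwi²/Σwi, random-effects weights 1/(vi + T²), and I² = [(Q − df)/Q]100%), truncating negative values at zero.

v_within = [0.04, 0.05, 0.03, 0.06, 0.05]   # hypothetical within-study variances for k = 5 studies
Q = 9.5                                     # hypothetical observed Q value for these studies
df = len(v_within) - 1

w = [1.0 / v for v in v_within]             # fixed-effects (inverse-variance) weights
C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

T_sq = max(0.0, (Q - df) / C)               # method-of-moments (DerSimonian-Laird) estimate of between-studies variance
w_random = [1.0 / (v + T_sq) for v in v_within]   # weights under the random-effects model
I_sq = max(0.0, (Q - df) / Q) * 100         # percent of total variability that is between-studies

print(round(T_sq, 4), round(I_sq, 1))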
22. The reader should take note of the following with respect to Q values computed throughout this section: a) It
was noted that Q represents the observed variability in the data, and if (as is the case in the fixed-effects model)
it is assumed that all studies share a common effect size (i.e., any variability in the data is due to within-studies
variability), the value of df will represent the expected variability in the data. Consequently, the value resulting
from (Q − df) represents any excess variability – i.e., between-studies variability; b) Borenstein et al. (2009,
p. 113) note that although a significant Q value indicates evidence for heterogeneity, a nonsignificant Q value
should not be taken as evidence for homogeneity, since a nonsignificant result can easily be due to low
statistical power. The latter authors note that when there is substantial between-studies variability in a meta-
analysis, a nonsignificant Q value may nevertheless be obtained if there is a small number of studies
involved in the meta-analysis and/or large within-studies variability. Furthermore, Borenstein et al. (2009, p.
113) note that the value computed for Q only addresses whether or not the test of homogeneity is statistically
significant, and is not an indicator of the amount of variability with respect to the true effect sizes in the
underlying population. Borenstein et al. (2009, p. 121) emphasize that Q in and of itself provides limited
information, since the latter statistic only addresses the viability of the homogeneity hypothesis and not the
amount of excess variability. Additionally, Q is sensitive to relative variance (which is measured by the I²
statistic discussed in Endnote 21) and not absolute variance (which is measured by vbs = T²).
23. a) In the Introduction it was noted that Hunter and Schmidt (1990, 2004) are among the methodologists who
argue that the classical hypothesis testing model should no longer be employed (i.e., that tests of statistical
significance should no longer be used for hypothesis testing), and that decision making in individual studies
should be based in large part or entirely on the values computed for confidence intervals; b) Predictive
validity refers to the degree to which performance on a test or some other predictor variable is able to predict
the performance of subjects on some measure of behavior. Among other things, Hunter and Schmidt (2004)
contend that by taking into account the impact of sampling error, psychometric meta-analysis corrects for
attenuation (i.e., underestimation) in correlations which are employed for predictive purposes; c) Schulze's
(2004) book was published prior to publication of Hunter and Schmidt's (2004) most recent book, in which
they describe the use of psychometric meta-analysis for situations other than evaluating predictive validity for
personnel selection.
24. To illustrate a simple example of Hunter and Schmidt's methodology (2004; p. 8; pp. 59–64; p. 81; pp.
88–92), a meta-analysis which corrects for just sampling error (but not any other artifacts) will be
demonstrated in reference to Example 43.1. Hunter and Schmidt (2004, p. 81) use the term bare-bones meta-
analysis for an analysis that only corrects for sampling error. Hunter and Schmidt (2004, p. 81) employ
Equation 43.66 to compute the best estimate of a population correlation (which will be designated with the
notation r̄) based on k studies. (As noted in Endnote 17 of the Pearson product-moment correlation
coefficient, Hunter and Schmidt (1987; 2004, pp. 82–83) and Hunter, Schmidt, and Coggin (1996) contend
that in contrast to Equation 28.25 (presented in Section VI of the Pearson product-moment correlation
coefficient), which they state is positively biased, Equation 43.66 provides the optimal weighted estimate of a
population correlation. Yet Schulze (2004, p. 65) states that Equation 43.66 is negatively biased – i.e., it
underestimates the population correlation.) (The weighted mean correlation is sketched after this note.)
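A hedged Python sketch of the bare-bones step described in note 24, using hypothetical correlations and sample sizes; it assumes the weighted estimate is the familiar sample-size-weighted mean correlation (the handbook's Equation 43.66 may differ in detail).

r_values = [0.28, 0.35, 0.19, 0.41, 0.30]   # hypothetical observed correlations from k = 5 studies
n_values = [60, 120, 45, 200, 80]           # hypothetical sample sizes

# Sample-size-weighted mean correlation (bare-bones estimate of the population correlation).
r_bar = sum(n * r for n, r in zip(n_values, r_values)) / sum(n_values)

print(round(r_bar, 3))   # 0.343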
25. As noted under the discussion of the odds ratio in Section VI of the chi-square test for r × c tables, the
values of the odds ratio and the relative risk will be very close together when the event in question (in this
case a heart attack) has a low probability of occurring. The likelihood of someone in the placebo group having
a heart attack is 189/11,034 = .01713, while the likelihood of someone in the aspirin group having a heart
attack is 104/11,037 = .00942 (note that .01713/.00942 = 1.82, which is the value of the relative risk). Thus, the
values computed for the odds ratio and the relative risk are almost identical. (The arithmetic is sketched below.)
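The comparison in note 25 can be verified with the cell frequencies quoted there (189 heart attacks among 11,034 placebo subjects and 104 among 11,037 aspirin subjects); the Python sketch below simply carries out the arithmetic.

placebo_events, placebo_n = 189, 11034
aspirin_events, aspirin_n = 104, 11037

relative_risk = (placebo_events / placebo_n) / (aspirin_events / aspirin_n)
odds_ratio = (placebo_events / (placebo_n - placebo_events)) / (aspirin_events / (aspirin_n - aspirin_events))

print(round(relative_risk, 2), round(odds_ratio, 2))   # 1.82 and 1.83 -- nearly identical for a rare event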
26. It is of interest to note, however, that if the chi-square test for r × c tables is employed to evaluate the data
in Table 43.13, Equation 16.2 yields the value χ² = 18. Employing Table A4, we determine that the computed
value χ² = 18 is greater than the tabled critical values (for df = 1) χ².05 = 3.84 and χ².01 = 6.63. Thus, the null
hypothesis can be rejected at both the .05 and .01 levels. Substituting χ² = 18 in Equation 16.21 yields the
value φ = √(χ²/n) = √(18/200) = .30, which corresponds to the value of the correlation noted for the problem
under discussion. (This arithmetic is also sketched below.)
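For completeness, the arithmetic in note 26, assuming Equation 16.21 is the usual phi coefficient φ = √(χ²/n) (an assumption on my part):

import math

chi_square, n = 18, 200
phi = math.sqrt(chi_square / n)   # sqrt(18/200) = sqrt(.09) = .30
print(round(phi, 2))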
27. The noncentral F distribution was alluded to previously in Section VI of the single-factor between-subjects
analysis of variance under the discussion of power. Further clarification of the noncentrality parameter
underlying the noncentral F distribution can be found in Endnote 9 of the single-sample t test.
