You are on page 1of 6

34 C ONTRIBUTED R ESEARCH A RTICLES

Nonparametric Goodness-of-Fit Tests for


Discrete Null Distributions
by Taylor B. Arnold and John W. Emerson Kolmogorov-Smirnov test
Abstract Methodology extending nonparamet- Overview
ric goodness-of-fit tests to discrete null distribu- The most popular nonparametric goodness-of-fit test
tions has existed for several decades. However, is the Kolmogorov-Smirnov test. Given the cumula-
modern statistical software has generally failed tive distribution function F0 ( x ) of the hypothesized
to provide this methodology to users. We offer distribution and the empirical distribution function
a revision of Rs ks.test() function and a new Fdata ( x ) of the observed data, the test statistic is given
cvm.test() function that fill this need in the R by
language for two of the most popular nonpara-
metric goodness-of-fit tests. This paper describes D = sup | F0 ( x ) Fdata ( x )| (1)
x
these contributions and provides examples of
their usage. Particular attention is given to vari- When F0 is continuous, the distribution of D does not
ous numerical issues that arise in their implemen- depend on the hypothesized distribution, making this
tation. a computationally attractive method. Slakter (1965)
offers a standard presentation of the test and its perfor-
mance relative to other algorithms. The test statistic is
easily adapted for one-sided tests. For these, the abso-
lute value in (1) is discarded and the tests are based on
either the supremum of the remaining difference (the
Introduction greater testing alternative) or by replacing the supre-
mum with a negative infimum (the lesser hypothesis
Goodness-of-fit tests are used to assess whether data alternative). Tabulated p-values have been available
are consistent with a hypothesized null distribution. for these tests since 1933 (Kolmogorov, 1933).
The 2 test is the best-known parametric goodness- The extension of the Kolmogorov-Smirnov test
of-fit test, while the most popular nonparametric to non-continuous null distributions is not straight-
tests are the classic test proposed by Kolmogorov forward. The formula of the test statistic D remains
and Smirnov followed closely by several variants on unchanged, but its distribution is much more diffi-
Cramr-von Mises tests. cult to obtain; unlike the continuous case, it depends
on the null model. Use of the tables associated with
In their most basic forms, these nonparametric
continuous hypothesized distributions results in con-
goodness-of-fit tests are intended for continuous hy-
servative p-values when the null distribution is dis-
pothesized distributions, but they have also been
continuous (see Slakter (1965), Goodman (1954), and
adapted for discrete distributions. Unfortunately,
Massey (1951)). In the early 1970s, Conover (1972)
most modern statistical software packages and pro-
developed the method implemented here for comput-
gramming environments have failed to incorporate
ing exact one-sided p-values in the case of discrete
these discrete versions. As a result, researchers would
null distributions. The method developed in Gleser
typically rely upon the 2 test or a nonparametric
(1985) is used to provide exact p-values for two-sided
test designed for a continuous null distribution. For
tests.
smaller sample sizes, in particular, both of these
choices can produce misleading inferences.
Implementation
This paper presents a revision of Rs ks.test()
function and a new cvm.test() function to fill this The implementation of the discrete Kolmogorov-
void for researchers and practitioners in the R environ- Smirnov test involves two steps. First, the particular
ment. This work was motivated by the need for such test statistic is calculated (corresponding to the de-
goodness-of-fit testing in a study of Olympic figure sired one-sided or two-sided test). Then, the p-value
skating scoring (Emerson and Arnold, 2011). We first for that particular test statistic may be computed.
present overviews of the theory and general imple- The form of the test statistic is the same as in the
mentation of the discrete Kolmogorov-Smirnov and continuous case; it would seem that no additional
Cramr-von Mises tests. We discuss the particular im- work would be required for the implementation, but
plementation of the tests in R and provide examples. this is not the case. Consider two non-decreasing func-
We conclude with a short discussion, including the tions f and g, where the function f is a step function
state of existing continuous and two-sample Cramr- with jumps on the set { x1 , . . . x N } and g is continu-
von Mises testing in R. ous (the classical Kolmogorov-Smirnov situation). In

The R Journal Vol. 3/2, December 2011 ISSN 2073-4859


C ONTRIBUTED R ESEARCH A RTICLES 35

order to determine the supremum of the difference Cramr-von Mises tests


between these two functions, notice that
Overview
sup | f ( x ) g( x )| While the Kolmogorov-Smirnov test may be the most
x popular of the nonparametric goodness-of-fit tests,
Cramr-von Mises tests have been shown to be more
 
= max max | g( xi ) f ( xi )| , powerful against a large class of alternatives hypothe-
i
 ses. The original test was developed by Harald
lim | g( x ) f ( xi1 )| (2) Cramr and Richard von Mises (Cramr, 1928; von
x xi
  Mises, 1928) and further adapted by Anderson and
Darling (1952), and Watson (1961). The original test
= max max | g( xi ) f ( xi )| ,
i statistic, W 2 , Andersons A2 , and Watsons U 2 are:
 Z
| g( xi ) f ( xi1 )| (3) W2 = n [ Fdata ( x ) F0 ( x )]2 dF0 ( x ) (5)

[ Fdata ( x ) F0 ( x )]2
Z
A2 = n dF0 ( x ) (6)
Computing the maximum over these 2N values (with F0 ( x ) F0 ( x )2
Z h i2
f equal to Fdata ( x ) and g equal to F0 ( x ) as defined
U2 = n Fdata ( x ) F0 ( x ) W 2 dF0 ( x ) (7)
above) is clearly the most efficient way to compute
the Kolmogorov-Smirnov test statistic for a contin-
As with the original Kolmogorov-Smirnov test statis-
uous null distribution. When the function g is not
tic, these all have test statistic null distributions which
continuous, however, equality (3) does not hold in
are independent of the hypothesized continuous mod-
general because we cannot replace limx xi g( x ) with
els. The W 2 statistic was the original test statistic. The
the value g( xi ).
A2 statistic was developed by Anderson in the pro-
If it is known that g is a step function, it follows cess of generalizing the test for the two-sample case.
that for some small e, Watsons U 2 statistic was developed for distributions
which are cyclic (with an ordering to the support but
no natural starting point); it is invariant to cyclic re-
sup | f ( x ) g( x )| = ordering of the support. For example, a distribution
x
on the months of the year could be considered cyclic.
max (| g( xi ) f ( xi )| , | g( xi e) f ( xi1 )|) (4) It has been shown that these tests can be more
i
powerful than Kolmogorov-Smirnov tests to certain
deviations from the hypothesized distribution. They
where the discontinuities in g are more than some dis- all involve integration over the whole range of data,
tance e apart. This, however, requires knowledge that rather than use of a supremum, so they are best-suited
g is a step function as well as of the nature of its sup- for situations where the true alternative distribution
port (specifically, the break-points). As a result, we deviates a little over the whole support rather than
implement the Kolmogorov-Smirnov test statistic for having large deviations over a small section of the
discrete null distributions by requiring the complete support. Stephens (1974) offers a comprehensive anal-
specification of the null distribution. ysis of the relative powers of these tests.
Generalizations of the Cramr-von Mises tests to
Having obtained the test statistic, the p-value must
discrete distributions were developed in Choulakian
then be calculated. When an exact p-value is required
et al. (1994). As with the Kolmogorov-Smirnov test,
for smaller sample sizes, the methodology in Conover
the forms of the test statistics are unchanged, and
(1972) is used in for one-sided tests. For two-sided
the null distributions of the test statistics are again
tests, the methods presented in Gleser (1985) lead to
hypothesis-dependent. Choulakian et al. (1994) does
exact two-sided p-values. This requires the calcula-
not offer finite-sample results, but rather shows that
tion of rectangular probabilities for uniform order
the asymptotic distributions of the test statistics un-
statistics as discussed by Niederhausen (1981). Full
der the null hypothesis each involve consideration of
details of the calculations are contained in source code
a weighted sum of independent chi-squared variables
of our revised function ks.test() and in the papers
(with the weights depending on the particular null
of Conover and Gleser.
distribution).
For larger sample sizes (or when requested for
smaller sample sizes), the classical Kolmogorov- Implementation
Smirnov test is used and is known to produce conser-
vative p-values for discrete distributions; the revised Calculation of the three test statistics is done using
ks.test() supports estimation of p-values via simu- the matrix algebra given by Choulakian et al. (1994).
lation if desired. The only notable difficulty in the implementation of

The R Journal Vol. 3/2, December 2011 ISSN 2073-4859


36 C ONTRIBUTED R ESEARCH A RTICLES

the discrete form of the tests involves calculating the cal instability. First, consider the following inequality:
percentiles of the weighted sum of chi-squares,
p
!
p
!

p P i 21 x P max 21 x (10)
Q= i 2i,1d f (8) i =1

i =1

i =1 x
= P 2p (11)
p max
where p is the number of elements in the support of
the hypothesized distribution. Imhof (1961) provides The values for the weighted sum can be bounded
a method for obtaining the distribution of Q, easily using a simple transformation and a chi-squared dis-
adapted for our case because the chi-squared vari- tribution of a higher degree of freedom. Second, con-
ables have only one degree of freedom. The exact sider the Markov inequality:
formula given for the distribution function of Q is p
!
given by P i 21 x
i =1
1 1 sin [ (u, x )]
Z
 p
P{ Q x } =
 
+ du (9)
2 0 u(u) E exp t i Zi2 exp(tx ) (12)
i =1
for continuous functions (, x ) and () depending exp(tx )
=q (13)
on the weights i . p
i=1 (1 2ti )
There is no analytic solution to the integral in (9),
so the integration is accomplished numerically. This where the bound can be minimized over t
seems fine in most situations we considered, but nu- (0, 1/2max ). The upper bounds for the p-value given
merical issues appear in the regime of large test statis- by (11) and (13) are both calculated and the smaller
tics x (or, equivalently, small p-values). The function is used in cases where the numerical instability of (9)
(, x ) is linear in x; as the test statistic grows the cor- may be a concern.
responding periodicity of the integrand decreases and The original formulation, numerical integration of
the approximation becomes unstable. As an example (9), is preferable for most p-values, while the upper
of this numerical instability, the red plotted in Figure bound described above is used for smaller p-values
1 shows the non-monotonicity of the numerical eval- (smaller than 0.001, based on our observations of the
uation of equation (9) for a null distribution that is numerical instability of the original formulation). Fig-
uniform on the set {1, 2, 3}. ure 1 shows the bounds with the blue and green
dashed lines; values in red exceeding the bounds
are a result of the numerical instability. Although
Unstable asymptotic pvalue from (9) it would be preferable to determine the use of the
Bound given by (11)
bound based on values of the test statistic rather than
1e03

Bound given by (13)

the p-value, the range of extreme values of the test


statistic varies with the hypothesized distribution.
5e04
pvalue

Kolmogorov-Smirnov and Cramr-


0e+00

von Mises tests in R


Functions ks.test() and cvm.test() are provided
1.5 2.0 2.5 3.0 for convenience in package dgof, available on
test statistic CRAN. Function ks.test() offers a revision of Rs
Kolmogorov-Smirnov function ks.test() from rec-
ommended package stats; cvm.test() is a new func-
Figure 1: Plot of calculated p-values for given test tion for Cramr-von Mises tests.
statistics using numerical integration (red) compared The revised ks.test() function supports one-
to the conservative chi-squared bound (dashed blue) sample tests for discrete null distributions by allowing
and the Markov inequality bound (dashed green). the second argument, y, to be an empirical cumula-
The null distribution is uniform on the set {1, 2, 3} in tive distribution function (an R function with class
this example. The sharp variations in the calculated "ecdf") or an object of class "stepfun" specifying a
p-values are a result of numerical instabilities, and discrete distribution. As in the original version of
the true p-values are bounded by the dashed curves. ks.test(), the presence of ties in the data (the first
argument, x) generates a warning unless y describes a
We resolve this problem by using a combination of discrete distribution. If the sample size is less than or
two conservative approximations to avoid the numeri- equal to 30, or when exact=TRUE, exact p-values are

The R Journal Vol. 3/2, December 2011 ISSN 2073-4859


C ONTRIBUTED R ESEARCH A RTICLES 37

provided (a warning is issued when the sample size and show usage of the new ks.test() implementa-
is greater than 30 due to possible numerical instabili- tion. The first is the default two-sided test, where the
ties). When exact = FALSE (or when exact is unspec- exact p-value is obtained using the methods of Gleser
ified and the sample size is greater than 30) the classi- (1985).
cal Kolmogorov-Smirnov null distribution of the test
> set.seed(1)
statistic is used and resulting p-values are known to
> x <- sample(1:10, 25, replace = TRUE)
be conservative though imprecise (see Conover (1972) > x
for details). In such cases, simulated p-values may
be desired, produced by the simulate.p.value=TRUE [1] 3 4 6 10 3 9 10 7 7 1 3 2 7
option. [14] 4 8 5 8 10 4 8 10 3 7 2 3
The function cvm.test() is similar in design to
> dgof::ks.test(x, ecdf(1:10))
ks.test(). Its first two arguments specify the data
and null distribution; the only extra option, type, One-sample Kolmogorov-Smirnov test
specifies the variant of the Cramr-von Mises test:
data: x
x a numerical vector of data values. D = 0.08, p-value = 0.9354
alternative hypothesis: two-sided
y an ecdf or step-function (stepfun) for specifying
the null model Next, we conduct the default one-sided test, where
Conovers method provides the exact p-value (up to
type the variant of the Cramr-von Mises test; W2 is
the numerical precision of the implementation):
the default and most common method, U2 is for
cyclical data, and A2 is the Anderson-Darling > dgof::ks.test(x, ecdf(1:10),
alternative. + alternative = "g")

As with ks.test(), cvm.test() returns an object of One-sample Kolmogorov-Smirnov test


class "htest".
data: x
D^+ = 0.04, p-value = 0.7731
Examples alternative hypothesis:
the CDF of x lies above the null hypothesis
Consider a toy example with observed data of length
In contrast, the option exact=FALSE results in the p-
2 (specifically, the values 0 and 1) and a hypothesized
value obtained by applying the classical Kolmogorov-
null distribution that places equal probability on the
Smirnov test, resulting in a conservative p-value:
values 0 and 1. With the current ks.test() function in
R (which, admittedly, doesnt claim to handle discrete > dgof::ks.test(x, ecdf(1:10),
distributions), the reported p-value, 0.5, is clearly in- + alternative = "g", exact = FALSE)
correct:
One-sample Kolmogorov-Smirnov test
> stats::ks.test(c(0, 1), ecdf(c(0, 1)))
data: x
One-sample Kolmogorov-Smirnov test D^+ = 0.04, p-value = 0.9231
alternative hypothesis:
data: c(0, 1) the CDF of x lies above the null hypothesis
D = 0.5, p-value = 0.5
alternative hypothesis: two-sided The p-value may also be estimated via a Monte Carlo
simulation:
Instead, the value of D given in equation (1) should > dgof::ks.test(x, ecdf(1:10),
be 0 and the associated p-value should be 1. Our re- + alternative = "g",
vision of ks.test() fixes this problem when the user + simulate.p.value = TRUE, B = 10000)
provides a discrete distribution:
One-sample Kolmogorov-Smirnov test
> library(dgof)
> dgof::ks.test(c(0, 1), ecdf(c(0, 1))) data: x
D^+ = 0.04, p-value = 0.7717
One-sample Kolmogorov-Smirnov test alternative hypothesis:
the CDF of x lies above the null hypothesis
data: c(0, 1)
D = 0, p-value = 1 A different toy example shows the dangers of
alternative hypothesis: two-sided using Rs existing ks.test() function with discrete
data:
Next, we simulate a sample of size 25 from the dis-
crete uniform distribution on the integers {1, 2, . . . , 10} > dgof::ks.test(rep(1, 3), ecdf(1:3))

The R Journal Vol. 3/2, December 2011 ISSN 2073-4859


38 C ONTRIBUTED R ESEARCH A RTICLES

One-sample Kolmogorov-Smirnov test be exact. In others, conservativeness in special cases


with small p-values has been established. Although
data: rep(1, 3) we provide for Monte Carlo simulated p-values with
D = 0.6667, p-value = 0.07407 the new ks.test(), no simulations may be neces-
alternative hypothesis: two-sided sary necessary for these methods; they were gener-
ally developed during an era when extensive simula-
If, instead, either exact=FALSE is used with
tions may have been prohibitively expensive or time-
the new ks.test() function, or if the original
consuming. However, this does raise the possibility
stats::ks.test() is used, the reported p-value is
that alternative tests relying upon modern computa-
0.1389 even though the test statistic is the same.
tional abilities could provide even greater power in
We demonstrate the Cramr-von Mises tests with
certain situations, a possible avenue for future work.
the same simulated data.
In the continuous setting, both of the Kolmogorov-
> cvm.test(x, ecdf(1:10)) Smirnov and the Cramr-von Mises tests have two-
sample analogues. When data are observed from two
Cramer-von Mises - W2 processes or sampled from two populations, the hy-
pothesis tested is whether they came from the same
data: x
(unspecified) distribution. With the discrete case,
W2 = 0.057, p-value = 0.8114
alternative hypothesis: Two.sided however, the null distribution of the test statistic de-
pends on the underlying probability model, as dis-
> cvm.test(x, ecdf(1:10), type = "A2") cussed by Walsh (1963). Such an extension would
require the specification of a null distribution, which
Cramer-von Mises - A2 generally goes unstated in two-sample goodness-of-
fit tests. We note that Dufour and Farhat (2001) ex-
data: x
plored two-sample goodness-of-fit tests for discrete
A2 = 0.3969, p-value = 0.75
alternative hypothesis: Two.sided
distributions using a permutation test approach.
Further generalizations of goodness-of-fit tests for
We conclude with a toy cyclical example showing discrete distributions are described in the extended
that the test is invariant to cyclic reordering of the study of de Wet and Venter (1994). There are exist-
support. ing R packages for certain type of Cramr-von Mises
goodness-of-fit tests for continuous distributions.
> set.seed(1) Functions implemented in package nortest (Gross,
> y <- sample(1:4, 20, replace = TRUE) 2006) focus on the composite hypothesis of normality,
> cvm.test(y, ecdf(1:4), type = 'U2')
while package ADGofTest (Bellosta, 2009) provides
Cramer-von Mises - U2 the Anderson-Darling variant of the test for general
continuous distributions. Packages CvM2SL1Test
data: y (Xiao and Cui, 2009a) and CvM2SL2Test (Xiao and
U2 = 0.0094, p-value = 0.945 Cui, 2009b) provide two-sample Cramr-von Mises
alternative hypothesis: Two.sided tests with continuous distributions. Package cramer
(Franz, 2006) offers a multivariate Cramr test for the
> z <- y%%4 + 1 two-sample problem. Finally, we note that the dis-
> cvm.test(z, ecdf(1:4), type = 'U2')
crete goodness-of-fit tests discussed in this paper do
Cramer-von Mises - U2 not allow the estimation of parameters of the hypoth-
esized null distributions (see Lockhart et al. (2007) for
data: z example).
U2 = 0.0094, p-value = 0.945
alternative hypothesis: Two.sided
Acknowledgement
In contrast, the Kolmogorov-Smirnov or the standard
Cramr-von Mises tests produce different results after We are grateful to Le-Minh Ho for his implementation
such a reordering. For example, the default Cramr- of the algorithm described in Niederhausen (1981) for
von Mises test yields p-values of 0.8237 and 0.9577 calculating rectangular probabilities of uniform order
with the original and transformed data y and z, re- statistics. We also appreciate the feedback from two
spectively. reviewers of this paper as well as from a reviewer of
Emerson and Arnold (2011).

Discussion
Bibliography
This paper presents the implementation of several
nonparametric goodness-of-fit tests for discrete null T. W. Anderson and D. A. Darling. Asymptotic the-
distributions. In some cases the p-values are known to ory of certain goodness of fit criteria based on

The R Journal Vol. 3/2, December 2011 ISSN 2073-4859


C ONTRIBUTED R ESEARCH A RTICLES 39

stochastic processes. Annals of Mathematical Statis- R. Lockhart, J. Spinelli, and M. Stephens. Cramr-von
tics, 23:193212, 1952. mises statistics for discrete distributions with un-
known parameters. Canadian Journal of Statistics, 35
C. J. G. Bellosta. ADGofTest: Anderson-Darling GoF test, (1):125133, 2007.
2009. URL http://CRAN.R-project.org/package=
ADGofTest. R package version 0.1. F. Massey. The Kolmogorov-Smirnov test for good-
ness of fit. Journal of the American Statistical Associa-
V. Choulakian, R. A. Lockhart, and M. A. Stephens.
tion, 46:6878, 1951.
Cramr-von Mises statistics for discrete distribu-
tions. The Canadian Journal of Statistics, 22(1):125 H. Niederhausen. Sheffer polynomials for computing
137, 1994. exact Kolmogorov-Smirnov and Rnyi type distri-
W. J. Conover. A Kolmogorov goodness-of-fit test for butions. The Annals of Statistics, 9(5):923944, 1981.
discontinuous distributions. Journal of the American ISSN 0090-5364.
Statistical Association, 67(339):591596, 1972.
M. J. Slakter. A comparison of the Pearson chi-square
H. Cramr. On the composition of elementary errors. and Kolmogorov goodness-of-fit tests with respect
Skand. Akt., 11:141180, 1928. to validity. Journal of the American Statistical Associa-
tion, 60(311):854858, 1965.
T. de Wet and J. Venter. Asymptotic distributions
for quadratic forms with applications to tests of fit. M. A. Stephens. Edf statistics for goodness of fit and
Annals of Statistics, 2:380387, 1994. some comparisons. Journal of the American Statistical
Association, 69(347):730737, 1974.
J.-M. Dufour and A. Farhat. Exact nonparametric
two-sample homogeneity tests for possibly discrete R. E. von Mises. Wahrscheinlichkeit, Statistik und
distributions. Unpublished manuscript, October Wahrheit. Julius Springer, Vienna, Austria, 1928.
2001. URL http://hdl.handle.net/1866/362.
J. E. Walsh. Bounded probability properties for
J. W. Emerson and T. B. Arnold. Statistical sleuthing Kolmogorov-Smirnov and similar statistics for dis-
by leveraging human nature: A study of olympic crete data. Annals of the Institute of Statistical Mathe-
figure skating. The American Statistician, 2011. To matics, 15:153158, 1963.
appear.
G. S. Watson. Goodness of fit tests on the circle.
C. Franz. cramer: Multivariate nonparametric Cramer-
Biometrika, 48:109114, 1961.
Test for the two-sample-problem, 2006. R package ver-
sion 0.8-1. Y. Xiao and Y. Cui. CvM2SL1Test: L1-version of Cramer-
L. J. Gleser. Exact power of goodness-of-fit tests of von Mises Two Sample Tests, 2009a. R package ver-
kolmogorov type for discontinuous distributions. sion 0.0-2.
Journal of American Statistical Association, 80(392):
Y. Xiao and Y. Cui. CvM2SL2Test: Cramer-von Mises
954958, 1985.
Two Sample Tests, 2009b. R package version 0.0-2.
L. A. Goodman. Kolmogorov-Smirnov tests for psy-
chological research. Psychological Bulletin, 51:160
168, 1954. Taylor B. Arnold
Yale University
J. Gross. nortest: Tests for Normality, 2006. R package 24 Hillhouse Ave.
version 1.0. New Haven, CT 06511 USA
J. P. Imhof. Computing the distribution of quadratic taylor.arnold@yale.edu
forms in normal variables. Biometrika, 48:419426,
1961. John W. Emerson
Yale University
A. Kolmogorov. Sulla determinazione empirica di una 24 Hillhouse Ave.
legge di distribuzione. Giornale dell Instuto Italiano New Haven, CT 06511 USA
degli Attuari, 4:8391, 1933. john.emerson@yale.edu

The R Journal Vol. 3/2, December 2011 ISSN 2073-4859