
Comparison of the Goodness-of-Fit Tests: the Pearson Chi-square and Kolmogorov-Smirnov Tests
Hsiao-Mei Wang
Department of Information Management,
Ling Tung University
No. 1, Ling Tung Rd, Taichung City, 407
Taiwan
TEL:+886-4-23892088#9820
E-mail: hmwang@teamail.ltu.edu.tw

Abstract - A test for goodness of fit usually involves examining a random sample from some unknown distribution in order to test the null hypothesis that the unknown distribution function is in fact a known, specified function. The Chi-square test can be applied to any univariate distribution for which the cumulative distribution function can be calculated. However, the Chi-square test does not have good properties (power and type I error rate) for small sample sizes. The Kolmogorov-Smirnov goodness-of-fit test (KS test) is an alternative to the Chi-square test. This paper summarizes the results of a Monte-Carlo investigation of the type I error rate and power of both tests. In every case examined where a 0.05 significance level was targeted, the simulated type I error rate of the KS test is smaller than that of the Chi-square test. The power of the Chi-square test is smaller than that of the KS test except when the distribution in the null hypothesis has a given mean (regardless of the spread of the distribution). Therefore, the KS test is in general more valid than the Chi-square test when a 0.05 significance level is targeted.

Keywords: Chi-square test, Goodness-of-fit test, Kolmogorov-Smirnov test.

1. Introduction

The statistical procedure used to test whether an assumed distribution is correct is called the goodness-of-fit test. Many related techniques have been developed, based on either the cumulative distribution function (CDF) or the probability density function (PDF). Tests based on the CDF are called "distance tests" while tests based on the PDF are called "area tests" [1, 2]. The Chi-square (χ²) test is an area test and the Kolmogorov-Smirnov test (KS test) is a distance test. Note that after the data have been grouped, the Chi-square test ignores the ordering of the different groups. This is also true for other area tests.

The oldest and best-known goodness-of-fit test, presented by Pearson, is the Chi-square test for goodness of fit [3]. In 1933, Kolmogorov introduced the KS test, which furnishes an alternative to the Chi-square test for goodness of fit and enables us to form a "confidence band" for the unknown distribution function [4]. However, the Chi-square test can perform poorly for small sample sizes because its test statistic then lacks an approximate Chi-square distribution under the null hypothesis. Many articles have discussed the criteria needed to apply the asymptotic Chi-square distribution [5, 6, 7].
have discussed the criteria needed to apply the


The KS test may be preferred over the Chi-square test for goodness of fit when the sample size is small. An exact version of the KS test can be applied for small samples [8]. For testing hypotheses based on the selected grouping, an exact test using the test statistic of the Chi-square test can also readily be applied for small sample sizes.

Slakter [9] compared these two tests for small sample sizes, n, and a small number of groups of data, k (i.e., when n and k are both ≤ 50). Slakter found that the Chi-square test maintained a type I error rate closer to the nominal level than the KS test. The KS test will err in the "safe direction" when k is finite [10, 11], that is, it maintains a type I error rate smaller than desired. Generally, the power of the Chi-square test is not known [12]. However, Massey [11] suggests that the KS test may always be more powerful than the Chi-square test. In fact, no research has compared the two tests with respect to their type I error rate and power at the same time.

The goal of this paper is to determine which test is more valid with respect to controlling the desired type I error rate and power.

1.1 The Chi-square (χ²) Goodness-of-Fit Test

Let F(x) denote the distribution function of the continuous random variable X. The null hypothesis of the goodness-of-fit test is given as

    H0: F(x) = F0(x),

where F0(x) is some specified cumulative distribution function. In order to apply the χ² test we divide the data range of X into k subintervals and count the number (Oi) of data points in each subinterval, with endpoints xi, xi+1 for the ith interval. Hence, the expected number (Ei) for the ith interval when the null hypothesis is true is nPi, where n is the sample size and Pi equals F0(xi+1) − F0(xi). The χ² statistic is

    χ² = Σ_{i=1}^{k} (Oi − Ei)² / Ei,

which is assumed to have a Chi-square distribution with v degrees of freedom (χ²_v), where v = k − 1 − (number of estimated parameters). The critical region for the test is χ² ≥ χ²_{v,α}, where χ²_{v,α} is selected so that the asymptotic probability that χ² ≥ χ²_{v,α} is α under the null hypothesis. The power of the test is the probability of rejecting H0: F(x) = F0(x) when F(x) ≠ F0(x).
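To make the computation concrete, the following Python sketch (not from the original paper; the function name chi_square_gof and all variable names are illustrative) computes the χ² statistic for k equal-probability intervals under a fully specified null distribution and compares it with the critical value χ²_{v,α}:

```python
import numpy as np
from scipy import stats

def chi_square_gof(data, cdf, k=5, n_estimated_params=0, alpha=0.05):
    """Pearson chi-square goodness-of-fit test with k equal-probability intervals.
    Illustrative sketch, not the authors' code."""
    n = len(data)
    # Equal-probability grouping: each interval has probability P_i = 1/k under H0,
    # so the expected count is E_i = n * P_i = n / k.
    expected = np.full(k, n / k)
    # Observed counts O_i: transform the data by the hypothesized CDF and bin on [0, 1].
    u = cdf(np.asarray(data))
    observed, _ = np.histogram(u, bins=np.linspace(0.0, 1.0, k + 1))
    chi2_stat = np.sum((observed - expected) ** 2 / expected)
    v = k - 1 - n_estimated_params            # degrees of freedom
    critical = stats.chi2.ppf(1.0 - alpha, v)  # chi^2_{v, alpha}
    return chi2_stat, critical, chi2_stat >= critical

# Example: N(0, 1) data tested against the fully specified null N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100)
stat, cv, reject = chi_square_gof(x, stats.norm(loc=0.0, scale=1.0).cdf, k=5)
print(stat, cv, reject)
```

With equal-probability grouping every expected count is n/k, which matches the grouping scheme used in the simulations of Section 2.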

1.2 The Kolmogorov-Smirnov Test (KS Test)

The Kolmogorov statistic is defined as

    d = max |F(x) − E(x)|,

where F(x) and E(x) are the theoretical and empirical distribution functions evaluated at x, respectively:

    F(xi) = P(X ≤ xi)   and   E(xi) = (# of X's ≤ xi) / n = i/n,   i = 1, 2, …, n.

If the observed maximum departure d is small, then the assumed F(x) may reasonably be the distribution that generated the data. But if d is "large", then it is unlikely that F(x) is the underlying data distribution.

The critical region for the KS test is d ≥ CV(α, n), and the probability that d ≥ CV(α, n) is α. The critical values CV(α, n) shown in Table 1 were calculated by the algorithm provided in several texts [8, 13, 14].


These critical values are valid when the distribution parameters are known. When the parameters are estimated from the data, the critical values are only approximate. Since the parameters of the distribution are usually unknown (and need to be estimated), one adaptive procedure when implementing the KS test is to use CV(4α, n) as the critical value when α is very small [15].

Table 1 Critical values, CV(α, n), of the KS test with sample size n at different levels of α.

                        Level of significance (α)
  n       0.40     0.20     0.10     0.05     0.04     0.01
  5       0.369    0.447    0.509    0.562    0.580    0.667
  10      0.268    0.322    0.368    0.409    0.422    0.487
  20      0.192    0.232    0.264    0.294    0.304    0.352
  30      0.158    0.190    0.217    0.242    0.250    0.290
  50      0.123    0.149    0.169    0.189    0.194    0.225
  >50     0.87/√n  1.07/√n  1.22/√n  1.36/√n  1.37/√n  1.63/√n
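As a minimal sketch (again illustrative, not the authors' code), the statistic d can be evaluated at the jumps of the empirical distribution function and compared with a critical value such as the large-sample approximation 1.36/√n for α = 0.05 from Table 1:

```python
import numpy as np
from scipy import stats

def ks_statistic(data, cdf):
    """Kolmogorov statistic d = max |F(x) - E(x)| for a fully specified CDF F.
    Illustrative sketch; both sides of each jump of E(x) are checked."""
    x = np.sort(np.asarray(data))
    n = len(x)
    f = cdf(x)                       # theoretical CDF F(x_i)
    e_hi = np.arange(1, n + 1) / n   # empirical CDF just after each jump, i/n
    e_lo = np.arange(0, n) / n       # empirical CDF just before each jump, (i-1)/n
    # The supremum of |F - E| is attained at a jump of the empirical CDF.
    return max(np.max(np.abs(f - e_hi)), np.max(np.abs(f - e_lo)))

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100)
d = ks_statistic(x, stats.norm(loc=0.0, scale=1.0).cdf)
cv = 1.36 / np.sqrt(len(x))          # approximate CV(0.05, n) for n > 50 (Table 1)
print(d, cv, d >= cv)
```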
2. Simulation Studies

Simulation studies show that the null distributions of the test statistics are independent of the parameter values. For example, when testing the goodness of fit of a given Weibull distribution, the type I error rate of the KS test does not depend on the values of the Weibull parameters. In order for the Chi-square test statistic to have an approximate Chi-square distribution under the null hypothesis, a large sample size is required.

For each evaluation of the type I error rate and the power for the Chi-square test, ten thousand simulations were conducted for each combination of sample size (n = 20, 30, 50, 100, 200, 300, 400, 500) and number of intervals (k = 5, 10). Simulations used either the Normal distribution (mean = 0 and standard deviation = 1), the Exponential distribution (rate 5; mean = 0.2), or the Weibull distribution (10, 2; mean = 1.903 and standard deviation = 0.2289) as the true underlying distribution. In different cases, these distributions were used as the null distribution or the alternative distribution (with the null parameter(s) being estimated). The alternative distribution for calculating the power of both tests is the Normal distribution having the same mean and standard deviation as the simulated data from the Weibull and Exponential distributions, respectively. When the data are generated from a Normal distribution, the null distribution is normal with the same variance but a different mean. In the simulation, equal-probability groups are used for the Chi-square test only. For the KS test, the parameter(s) are assumed unknown and need to be estimated. The target type I error rate of 0.05 is used for all simulations.
3. Results

In Table 2, the type I error rate of the KS test is always smaller than 0.06 for data generated from the Exponential distribution. The values of the type I error rate are even smaller (less than 0.01) when the data are generated from either the Normal or the Weibull distribution. In general, for each test and underlying distribution, the values of the type I error rate varied only slightly across sample sizes. In every case, the KS test had a smaller simulated type I error rate than the Chi-square test. For the Chi-square test, the type I error rates are between 0.079 and 0.138 when the number of intervals is five, and between 0.062 and 0.083 when the number of intervals is ten. There was thus a noticeable improvement in the simulated type I error rates for the Chi-square test when increasing the number of intervals from 5 to 10.


In Table 3, data were generated from different distributions under the alternative hypothesis (H1) and tested for normality (H0) by both the KS and Chi-square tests. The power of the Chi-square test depends on both the number of intervals and H1. For small sample sizes (n ≤ 50), the power of the Chi-square test is larger when the symmetric or nearly symmetric data (e.g., Normal(0,1) or Weibull(10,2)) are grouped into five intervals rather than ten. For the larger sample sizes (n > 50), the power of the Chi-square test with k = 10 is always equal to or larger than that with k = 5, regardless of the shape of the underlying distribution. When the data are from a Normal(0,1) distribution and tested against the same distribution but with the mean shifted to 0.5 + x̄, the power of the KS test at n = 50 is 100%, while the power of the Chi-square test is 99.8% when k = 5 and 89.7% when k = 10. When the data are from an Exponential(5) distribution with sample size fifty, the power of the KS test is 82.0% and the power of the Chi-square test is 72.7% when k = 5 and 98.3% when k = 10; however, the KS test performs better than or close to the Chi-square test (k = 10) when n is greater than or equal to 100. When the data are from a Weibull(10,2) distribution with sample size fifty, the power of the KS test is 5.99% and the power of the Chi-square test is 13.87% when k = 5 and 12.21% when k = 10. That is, when the sample size is small (n ≤ 50), the Chi-square test has greater power than the KS test, while for sample sizes greater than 200 the KS test has greater power than the Chi-square test. Since the Weibull(10,2) distribution is very similar to a normal distribution, a larger sample size is needed to achieve power greater than 80% (n = 500).

For the cases studied in Fig. 1 (n = 200), the KS test always has better power than the Chi-square test when the mean of the null distribution differs from the mean of the alternative distribution. That is, under the following three combinations of test hypotheses: (1) H0: Normal(0+shift, 4) versus H1: Normal(0, 4); (2) H0: Exponential(5+shift) versus H1: Exponential(5); and (3) H0: Weibull(10, 2+shift) versus H1: Weibull(10, 2), the KS test has greater power. However, when the data are from the Weibull distribution with shifted values of the shape parameter and a fixed scale value (i.e., H0: Weibull(10+shift, 2) versus H1: Weibull(10, 2)), the KS test has less power than the Chi-square tests (k = 5, 10).
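The power entries in Table 3 can be estimated in the same way as the type I error rates, except that the data are generated from the alternative distribution. The sketch below (illustrative, under the same assumptions as the earlier sketch; the result will not exactly reproduce the tabled 82.0% because the critical value used here is only approximate) generates Exponential(5) data and tests a normal null with mean and standard deviation estimated from the sample:

```python
import numpy as np
from scipy import stats

def ks_power_exponential_vs_normal(n=50, n_sims=10_000, seed=2):
    """Monte-Carlo estimate of KS power: data from Exponential(5), H0 a fitted normal.
    Illustrative sketch, not the authors' code."""
    rng = np.random.default_rng(seed)
    cv = 1.36 / np.sqrt(n)  # approximate CV(0.05, n); Table 1 gives 0.189 for n = 50
    rejections = 0
    for _ in range(n_sims):
        x = rng.exponential(scale=0.2, size=n)   # data from H1: Exponential(5), mean 0.2
        # H0: normal with the same mean and standard deviation as the sample.
        null_cdf = stats.norm(loc=x.mean(), scale=x.std(ddof=1)).cdf
        rejections += stats.kstest(x, null_cdf).statistic >= cv
    return rejections / n_sims

print(ks_power_exponential_vs_normal(n=50))
```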
4. Conclusions

In the real world, the parameters of the distribution are usually unknown and need to be estimated. When the parameters are estimated from the data, this affects the power of the KS test. Moreover, the KS test does not perform well in terms of power when the data are tested against the same type of distribution with a similar mean; in that case it has less power than the Chi-square test. In addition, the KS test has power superior to the Chi-square test when the sample size is large, which is consistent with Massey's notes [16]. In all cases studied, the KS test always has a smaller type I error rate than the Chi-square test.

For the Chi-square test, as the number of intervals increased, the type I error rate and the power decreased. In general, the computing time of the KS test is longer than that of the Chi-square test. Finally, a limitation of this study is that it is empirical rather than mathematical.


Acknowledgement

This research was supported by National Science Council grant NSC 96-2118-M-275-001.

References

[1] J. L. Romeu and C. Grethlein, A Practical Guide to Statistical Analysis of Material Property Data, AMPTIAC, 2000.
[2] R. Walpole, R. Myers, S. Myers, and K. Ye, Probability and Statistics for Engineers and Scientists, 8th Edition, Prentice Hall, NJ, 2007.
[3] K. Pearson, "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen from random sampling," Philosophical Magazine (5), 50, pp. 157-175, 1900.
[4] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, 4, pp. 83-91, 1933.
[5] W. G. Cochran, "The χ² test of goodness of fit," Annals of Mathematical Statistics, 23, pp. 315-345, 1952.
[6] J. K. Yarnold, "The minimum expectations in χ² goodness of fit tests and the accuracy of approximations for the null distribution," Journal of the American Statistical Association, 65, pp. 865-886, 1970.
[7] M. J. Slakter, "Large values for the number of groups with the Pearson Chi-squared goodness-of-fit test," Biometrika, 60, pp. 420-421, 1973.
[8] W. J. Conover, Practical Nonparametric Statistics, Second Edition, John Wiley & Sons, New York, 1980.
[9] M. J. Slakter, "A comparison of the Pearson Chi-square and Kolmogorov goodness-of-fit tests with respect to validity," Journal of the American Statistical Association, 60, pp. 854-858, 1965.
[10] L. A. Goodman, "Kolmogorov-Smirnov tests for psychological research," Psychological Bulletin, 51, pp. 160-168, 1954.
[11] F. J. Massey, Jr., "The Kolmogorov-Smirnov test for goodness of fit," Journal of the American Statistical Association, 46, pp. 68-78, 1951.
[12] H. B. Mann and A. Wald, "On the choice of the number of intervals in the application of the chi-square test," Annals of Mathematical Statistics, 13, pp. 306-317, 1942.
[13] V. K. Rohatgi, An Introduction to Probability Theory and Mathematical Statistics, Wiley, NY, 1976.
[14] N. R. Mann, R. E. Schafer, and N. D. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data, John Wiley, NY, 1974.
[15] J. L. Romeu, "Kolmogorov-Smirnov: A Goodness-of-Fit Test for Small Samples," RAC START, volume 10, number 6, 2003.
[16] F. J. Massey, Jr., "A note on the power of a non-parametric test," Annals of Mathematical Statistics, 21, pp. 440-443, 1950.


Table 2 The type I error rate for both tests (χ² and KS) at α = 0.05. Data are generated from the null hypothesis (H0) with different sample sizes (n) and numbers of intervals (k) for 10,000 simulations.

              H0: Normal(0,1)            H0: Exponential(5)          H0: Weibull(10,2)
  n     χ²(k=5)  χ²(k=10)   KS      χ²(k=5)  χ²(k=10)   KS      χ²(k=5)  χ²(k=10)   KS
  20       -        -      0.006       -        -      0.054       -        -      0.005
  30    0.079       -      0.007    0.138       -      0.057    0.077       -      0.006
  50    0.070     0.052    0.006    0.128     0.075    0.054    0.078     0.052    0.005
  100   0.073     0.054    0.007    0.126     0.080    0.050    0.077     0.051    0.005
  200   0.079     0.062    0.006    0.123     0.083    0.052    0.075     0.055    0.006

Table 3 The power (%) for both tests (χ² and KS) under the null hypothesis (H0) at α = 0.05. Data are generated from the alternative hypothesis (H1) with different sample sizes (n) and numbers of intervals (k) for 10,000 simulations.

              H0: Normal(0.5,1)          H0: Normal distribution with the same mean and variance as in H1
              H1: Normal(0,1)            H1: Exponential(5)           H1: Weibull(10,2)
  n     χ²(k=5)  χ²(k=10)   KS      χ²(k=5)  χ²(k=10)   KS      χ²(k=5)  χ²(k=10)   KS
  20       -        -      88.25       -        -      27.56       -        -       2.09
  30    88.87       -      98.60    56.78       -      50.25    10.91       -       3.13
  50    99.78     89.70   100.00    72.69     98.32    82.01    13.87     12.21     5.99
  100  100.00    100.00   100.00    93.98    100.00    99.72    19.23     20.07    13.38
  200  100.00    100.00   100.00    99.87    100.00   100.00    30.32     37.48    34.24
  300  100.00    100.00   100.00   100.00    100.00   100.00    41.97     53.56    56.29
  400  100.00    100.00   100.00   100.00    100.00   100.00    54.36     67.57    73.33
  500  100.00    100.00   100.00   100.00    100.00   100.00    66.00     79.27    85.69


Fig. 1. Power for both the Chi-square (k = 5, 10) and KS tests at α = 0.05 and n = 200. Data are generated from the same distribution as H0 but without the shifted value, under 10,000 simulations.


Comparison of the Goodness-of-Fit Tests: the Pearson Chi-square and Kolmogorov-Smirnov Tests

Hsiao-Mei Wang
Department of Information Management, Ling Tung University
No. 1, Ling Tung Rd., Taichung City
TEL: 04-23892088 ext. 9820
E-mail: hmwang@teamail.ltu.edu.tw

Abstract

A goodness-of-fit test is usually used to examine a random sample from an unknown distribution in order to check whether it follows the known, specified distribution function stated in the null hypothesis. Among goodness-of-fit tests, the Chi-square test can be applied to any univariate distribution for which the cumulative distribution function can be calculated; for small samples, however, the Chi-square test does not have good properties (power and type I error rate). The Kolmogorov-Smirnov test (KS test) is an alternative to the Chi-square test. This study compares the type I error rate and power of the two tests through Monte-Carlo computer simulations. The results show that, at a significance level of 0.05, the simulated type I error rate of the KS test is smaller than that of the Chi-square test in all cases. In the power comparison, except when the null hypothesis is a distribution with a given mean (regardless of the spread of the distribution), the power of the Chi-square test is smaller than that of the KS test. In general, at a significance level of 0.05, the KS test is more valid than the Chi-square test.

Keywords: Chi-square test, Goodness-of-fit test, Kolmogorov-Smirnov test.

