Comparison of the Goodness-of-Fit Tests: the Pearson Chi-square
and Kolmogorov-Smirnov Tests
ISSN 1812-8572 (2009)計量管理期刊 vol. 6, no. 1, page 57~64
asymptotic Chi-square distribution [5, 6, 7]. The KS test may be preferred over the Chi-square test for goodness of fit when the sample size is small. An exact version of the KS test can be applied for small samples [8]. For testing hypotheses based on the selected grouping, an exact test using the test statistic of the Chi-square test can also readily be applied for small sample sizes.

Slakter [9] compared these two tests for small sample sizes, n, and a small number of groups of data, k (i.e., when n and k are both ≤ 50). Slakter found that the Chi-square test maintained a type I error rate closer to the nominal level than the KS test. The KS test errs in the "safe direction" when k is finite [10, 11]; that is, it maintains a type I error rate smaller than desired. Generally, the power of the Chi-square test is not known [12]. However, Massey [11] suggests that the KS test may always be more powerful than the Chi-square test. In fact, no previous research has compared the two tests with respect to both their type I error rate and their power at the same time.

The goal of this paper is to determine which test is more valid with respect to controlling the desired type I error rate, and to compare the power of the two tests at the same time.

1.1 The Chi-square (χ²) Goodness-of-Fit Test

Let F(x) denote the distribution function of the continuous random variable X. The null hypothesis of the "goodness-of-fit" test is given as

    H0: F(x) = F0(x),

where F0(x) is some specified cumulative distribution function. In order to apply the χ² test we have to divide the data range of X into k subintervals and then count the number (O_i) of data points in each subinterval with endpoints x_i, x_{i+1} for the ith interval. Hence, the expected number (E_i) for the ith interval when the null hypothesis is true is nP_i, where n is the sample size and P_i equals F0(x_{i+1}) − F0(x_i). The χ² statistic is

    χ² = Σ_{i=1}^{k} (O_i − E_i)² / E_i,

which is assumed to have a Chi-square distribution with v degrees of freedom (χ²_v), where v = k − 1 − (the number of estimated parameters). The critical region for the test is χ² ≥ χ²_{v,α}, where χ²_{v,α} is selected so that the asymptotic probability that χ² ≥ χ²_{v,α} is α under the null hypothesis. The power of the test is the probability of rejecting H0: F(x) = F0(x) when F(x) ≠ F0(x).

1.2 The Kolmogorov-Smirnov Test (KS Test)

The Kolmogorov statistic is defined as

    d = max_x |F(x) − E(x)|,

where F(x) and E(x) are the theoretical and empirical distribution functions evaluated at x, respectively. Evaluated at the ordered observations x_i, these two functions are F(x_i) and

    E(x_i) = (# of X's ≤ x_i) / n = i / n,   i = 1, 2, …, n.

If the observed maximum departure d is small, then the assumed F(x) may be reasonable as the distribution that generated the data. But if this d is "large", then it is unlikely that F(x) is the underlying data distribution.

The critical region for the KS test is d ≥ CV(α, n), and the probability that d ≥ CV(α, n) is α. The critical values, CV(α, n), shown in Table 1 were calculated by the algorithm provided in several texts [8, 13, 14].
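The two statistics defined above can be illustrated with a short sketch. This is not code from the paper: the sample data, the helper names, and the interval endpoints (approximately equal-probability cut points under the standard normal) are illustrative assumptions.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # F(x) for the Normal(mu, sigma) distribution, via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    # d = max |F(x) - E(x)|. Since the empirical CDF jumps at each ordered
    # observation, both E(x_i) = i/n and the left limit (i-1)/n are checked.
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, abs(f - i / n), abs(f - (i - 1) / n))
    return d

def chi_square_statistic(data, cdf, edges):
    # chi2 = sum of (O_i - E_i)^2 / E_i over the k intervals, with
    # E_i = n * P_i and P_i = F0(x_{i+1}) - F0(x_i) for interval (x_i, x_{i+1}].
    n = len(data)
    stat = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        o = sum(1 for x in data if lo < x <= hi)
        p = cdf(hi) - cdf(lo)
        stat += (o - n * p) ** 2 / (n * p)
    return stat

sample = [-1.2, -0.4, -0.1, 0.3, 0.5, 0.8, 1.1, 1.6, -0.7, 0.0]
d = ks_statistic(sample, normal_cdf)
edges = [-math.inf, -0.84, -0.25, 0.25, 0.84, math.inf]  # k = 5 intervals
x2 = chi_square_statistic(sample, normal_cdf, edges)
print(d, x2)
```

Either statistic would then be compared against its critical value (χ²_{v,α}, or CV(α, n) from Table 1) to decide whether to reject H0.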
These critical values are valid when the distribution parameters are known. When the parameters are estimated from the data, these critical values are approximate. Since the parameters of the distribution are usually unknown (and need to be estimated), one adaptive procedure when implementing the KS test is to use CV(4α, n) as the critical value when α is very small [15].

Table 1. Critical values, CV(α, n), of the KS test with sample size n at different levels of significance (α).

  n     α=0.40   α=0.20   α=0.10   α=0.05   α=0.04   α=0.01
  5     0.369    0.447    0.509    0.562    0.580    0.667
 10     0.268    0.322    0.368    0.409    0.422    0.487
 20     0.192    0.232    0.264    0.294    0.304    0.352
 30     0.158    0.190    0.217    0.242    0.250    0.290
 50     0.123    0.149    0.169    0.189    0.194    0.225
>50    0.87/√n  1.07/√n  1.22/√n  1.36/√n  1.37/√n  1.63/√n

2. Simulation Studies

Simulations were run for the sample sizes (n = 20, 30, 50, 100, 200, 300, 400, 500) and numbers of intervals (k = 5, 10). Simulations used either the Normal distribution (mean = 0 and standard deviation = 1), the Exponential distribution (5; mean = 0.2), or the Weibull distribution (10, 2; mean = 1.903 and standard deviation = 0.2289) as the true underlying distribution. In different cases, these distributions were used as the null distribution or the alternative distribution (with the null parameter(s) being estimated). The alternative distribution for calculating the power of both tests is the Normal distribution having the same mean and standard deviation as the simulated data from the Weibull and Exponential distributions, respectively. When the data are generated from a Normal distribution, the null distribution is normal with the same variance but a different mean.

In the simulation, equal-probability groups are used for the Chi-square test only. For the KS test, the parameter(s) are assumed unknown and need to be estimated. The target type I error rate of 0.05 is used for all simulations.
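The equal-probability grouping used for the Chi-square test can be sketched as follows. This is a minimal reconstruction, not the paper's code: the function names are mine, a standard normal null is assumed, and `statistics.NormalDist` from the Python standard library supplies the inverse CDF.

```python
from statistics import NormalDist

def equal_probability_edges(k, dist=NormalDist(0.0, 1.0)):
    # Interior endpoints x_1 < ... < x_{k-1} chosen so that each of the k
    # intervals has probability P_i = 1/k under the null distribution,
    # making every expected count E_i = n * P_i = n / k.
    return [dist.inv_cdf(j / k) for j in range(1, k)]

def observed_counts(data, inner_edges):
    # O_i: number of observations falling in each of the k intervals.
    edges = [float("-inf")] + inner_edges + [float("inf")]
    return [sum(1 for x in data if lo < x <= hi)
            for lo, hi in zip(edges[:-1], edges[1:])]

edges = equal_probability_edges(5)
print([round(e, 4) for e in edges])
counts = observed_counts([-2.0, -0.5, 0.0, 0.5, 2.0], edges)
print(counts)  # -> [1, 1, 1, 1, 1]
```

With equal-probability groups, the expected counts are identical across intervals, which keeps every E_i away from zero regardless of the shape of the null distribution.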
3. Results

… 0.062 and 0.083. There was a noticeable improvement in the simulated type I error rates for the Chi-square test when the number of intervals was increased from 5 to 10.

In Table 3, data were generated from different distributions under the alternative hypothesis (H1) and tested for normality (H0) by both the KS and Chi-square tests. The power of the Chi-square test depends on both the number of intervals and H1. The power of the Chi-square test is larger when symmetric or almost symmetric data (e.g., Normal(0,1) or Weibull(10,2)) are grouped into five intervals rather than ten for small sample sizes (n ≤ 50). For larger sample sizes (n > 50), the power of the Chi-square test with k = 10 is always equal to or larger than that with k = 5, regardless of the shape of the underlying distribution. When the data are from a Normal(0,1) distribution and tested against the same distribution with shifted mean 0.5 + x̄, the power of the KS test at n = 50 is 100%, while the powers of the Chi-square tests are 99.8% when k = 5 and 89.7% when k = 10. When the data are from an Exponential(5) distribution with sample size fifty, the power of the KS test is 82.0%, while the power of the Chi-square test is 72.7% when k = 5 and 98.3% when k = 10. However, the KS test performs better than, or close to, the Chi-square test (k = 10) when n is greater than or equal to 100. When the data are from a Weibull(10,2) distribution with sample size fifty, the power of the KS test is 5.99%, while the powers of the Chi-square tests are 13.87% when k = 5 and 12.21% when k = 10. That is, when the sample size is small (n ≤ 50), the Chi-square test has greater power than the KS test; for sample sizes greater than 200, the KS test has greater power than the Chi-square test. Since the Weibull(10,2) distribution is very similar to a normal distribution, a larger sample size is needed to attain power greater than 80% (n = 500).

For the cases studied in Fig. 1 (n = 200), the KS test always has better power than the Chi-square test when the mean of the null distribution differs from the mean of the alternative distribution. That is, under the following three combinations of test hypotheses: (1) H0: Normal(0 + shift, 4) versus H1: Normal(0, 4); (2) H0: Exponential(5 + shift) versus H1: Exponential(5); and (3) H0: Weibull(10, 2 + shift) versus H1: Weibull(10, 2), the KS test has greater power. However, when the data are from the Weibull distribution with shifted values of the shape parameter and a fixed scale value (i.e., H0: Weibull(10 + shift, 2) versus H1: Weibull(10, 2)), the KS test has less power than the Chi-square tests (k = 5, 10).

4. Conclusions

In the real world, the parameters of the distribution are usually unknown and need to be estimated. When the parameters are estimated from the data, the power of the KS test is affected. Moreover, the KS test does not perform well in terms of power when the data are tested against the same type of distribution with a similar mean; in that case it has less power than the Chi-square test. In addition, the KS test has power superior to that of the Chi-square test when the sample size is large, which is consistent with Massey's note [16]. In all cases studied, the KS test always has a smaller type I error rate than the Chi-square test.

For the Chi-square test, as the number of intervals increased, the type I error rate and the power decreased. In general, the computing time of the KS test is longer than that of the Chi-square test.
Besides, this study is limited in being empirical rather than mathematical.

Acknowledgement

This research was supported by National Science Council grant NSC 96-2118-M-275-001.

References

[1] J. L. Romeu and C. Grethlein, A Practical Guide to Statistical Analysis of Material Property Data, AMPTIAC, 2000.
[2] R. Walpole, R. Myers, S. Myers, and K. Ye, Probability and Statistics for Engineers and Scientists, 8th Edition, Prentice Hall, NJ, 2007.
[3] K. Pearson, "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen from random sampling," Philosophical Magazine (5), 50, pp. 157-175, 1900.
[4] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giornale dell'Istituto Italiano degli Attuari, 4, pp. 83-91, 1933.
[5] W. G. Cochran, "The χ² test of goodness of fit," Annals of Mathematical Statistics, 23, pp. 315-345, 1952.
[6] J. K. Yarnold, "The minimum expectations in χ² goodness of fit tests and the accuracy of approximations for the null distribution," Journal of the American Statistical Association, 65, pp. 865-886, 1970.
[7] M. J. Slakter, "Large values for the number of groups with the Pearson Chi-squared goodness-of-fit test," Biometrika, 60, pp. 420-421, 1973.
[8] W. J. Conover, Practical Nonparametric Statistics, Second Edition, John Wiley & Sons, New York, 1980.
[9] M. J. Slakter, "A comparison of the Pearson Chi-square and Kolmogorov goodness-of-fit tests with respect to validity," Journal of the American Statistical Association, 60, pp. 854-858, 1965.
[10] L. A. Goodman, "Kolmogorov-Smirnov tests for psychological research," Psychological Bulletin, 51, pp. 160-168, 1954.
[11] F. J. Massey, Jr., "The Kolmogorov-Smirnov test for goodness of fit," Journal of the American Statistical Association, 46, pp. 68-78, 1951.
[12] H. B. Mann and A. Wald, "On the choice of the number of intervals in the application of the chi-square test," Annals of Mathematical Statistics, 13, pp. 306-317, 1942.
[13] V. K. Rohatgi, An Introduction to Probability Theory and Mathematical Statistics, Wiley, NY, 1976.
[14] N. Mann, R. Schafer, and N. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data, John Wiley, NY, 1974.
[15] J. L. Romeu, Kolmogorov-Smirnov: A Goodness-of-Fit Test for Small Samples, RAC START, volume 10, number 6, 2003.
[16] F. J. Massey, Jr., "A note on the power of a non-parametric test," Annals of Mathematical Statistics, 21, pp. 440-443, 1950.
Table 2. Type I error rate for both tests (χ² and KS) at α = 0.05. Data are generated from the null hypothesis (H0) with different sample sizes (n) and numbers of intervals (k), 10,000 simulations.

              Normal(0,1)              Exponential(5)            Weibull(10,2)
  n    χ²(k=5) χ²(k=10)   KS     χ²(k=5) χ²(k=10)   KS     χ²(k=5) χ²(k=10)   KS
  20      -       -     0.006       -       -     0.054       -       -     0.005
  30   0.079      -     0.007    0.138      -     0.057    0.077      -     0.006
  50   0.070   0.052    0.006    0.128   0.075    0.054    0.078   0.052    0.005
 100   0.073   0.054    0.007    0.126   0.080    0.050    0.077   0.051    0.005
 200   0.079   0.062    0.006    0.123   0.083    0.052    0.075   0.055    0.006
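Simulated type I error rates like those in the KS columns of Table 2 can be approximated with a Monte Carlo sketch. This is a reconstruction, not the paper's code: it uses Python's random module, estimates the normal parameters from each sample as described in Section 2, and approximates CV(0.05, n) by the asymptotic 1.36/√n from Table 1, whereas the paper uses exact small-sample critical values.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    # Normal(mu, sigma) CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    # d = max |F(x) - E(x)| over the ordered sample, checking both sides
    # of each jump of the empirical CDF.
    xs = sorted(data)
    n = len(xs)
    return max(max(abs(cdf(x) - i / n), abs(cdf(x) - (i - 1) / n))
               for i, x in enumerate(xs, start=1))

def simulated_type1_rate(n=50, reps=1000, seed=1):
    # Data are Normal(0, 1); the null parameters are re-estimated from each
    # sample, as in Section 2. Reject when d >= CV(0.05, n), approximated
    # here by the asymptotic 1.36 / sqrt(n) from Table 1.
    rng = random.Random(seed)
    cv = 1.36 / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mu = sum(data) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
        d = ks_statistic(data, lambda x: normal_cdf(x, mu, sigma))
        if d >= cv:
            rejections += 1
    return rejections / reps

print(simulated_type1_rate())
```

Because μ and σ are estimated from the data, the realized rejection rate falls far below the nominal 0.05, which matches the conservative KS behavior reported in Table 2.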
Table 3. Power (%) for both tests (χ² and KS) at α = 0.05. Data are generated from the alternative hypothesis (H1) with different sample sizes (n) and numbers of intervals (k), 10,000 simulations.

        H0: Normal(0.5,1)            H0: Normal distribution having the same mean and
        H1: Normal(0,1)              variance as in H1
                                     H1: Exponential(5)         H1: Weibull(10,2)
  n    χ²(k=5) χ²(k=10)   KS     χ²(k=5) χ²(k=10)   KS     χ²(k=5) χ²(k=10)   KS
  20      -       -     88.25       -       -     27.56       -       -      2.09
  30   88.87      -     98.60    56.78      -     50.25    10.91      -      3.13
  50   99.78   89.70   100.00    72.69   98.32    82.01    13.87   12.21     5.99
 100  100.00  100.00   100.00    93.98  100.00    99.72    19.23   20.07    13.38
 200  100.00  100.00   100.00    99.87  100.00   100.00    30.32   37.48    34.24
 300  100.00  100.00   100.00   100.00  100.00   100.00    41.97   53.56    56.29
 400  100.00  100.00   100.00   100.00  100.00   100.00    54.36   67.57    73.33
 500  100.00  100.00   100.00   100.00  100.00   100.00    66.00   79.27    85.69
Fig. 1. Power for both the Chi-square (k = 5, 10) and KS tests at α = 0.05 and n = 200. Data are generated from the same distribution as H0 but without the shifted value, 10,000 simulations.
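The mean-shift pattern in Fig. 1 can be probed with a small power sketch. This is again an illustrative reconstruction, not the paper's code: for simplicity the null parameters are held fixed at Normal(shift, 1) while the data come from Normal(0, 1), whereas the paper estimates the null parameters and uses a Normal(0, 4) case; the asymptotic 1.36/√n stands in for CV(0.05, n).

```python
import math
import random

def normal_cdf(x, mu, sigma):
    # Normal(mu, sigma) CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    # d = max |F(x) - E(x)| over the ordered sample, checking both sides
    # of each jump of the empirical CDF.
    xs = sorted(data)
    n = len(xs)
    return max(max(abs(cdf(x) - i / n), abs(cdf(x) - (i - 1) / n))
               for i, x in enumerate(xs, start=1))

def ks_power(shift, n=200, reps=500, seed=7):
    # Rejection rate of H0: Normal(shift, 1) when the data are Normal(0, 1).
    # CV(0.05, n) is approximated by 1.36 / sqrt(n) (Table 1, n > 50).
    rng = random.Random(seed)
    cv = 1.36 / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d = ks_statistic(data, lambda x: normal_cdf(x, shift, 1.0))
        if d >= cv:
            rejections += 1
    return rejections / reps

print(ks_power(0.0), ks_power(0.5))
```

At shift = 0 this recovers a rejection rate near the nominal α; as the shift grows, the maximum gap between the null and true CDFs quickly exceeds CV(0.05, n), so the power at n = 200 rises toward 1.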
A Comparative Study of Goodness-of-Fit Tests: the Chi-square and Kolmogorov-Smirnov Tests

王曉玫
Department of Information Management, Ling Tung University
No. 1, Lingtung Rd., Taichung
TEL: 04-23892088 ext. 9820
E-mail: hmwang@teamail.ltu.edu.tw

Abstract

Goodness-of-fit tests are commonly used to test whether a random sample from an unknown distribution follows the known, specific distribution function assumed under the null hypothesis. Among goodness-of-fit tests, the Chi-square test can be applied to any univariate distribution for which the cumulative distribution function can be calculated; when small … is always smaller than that of the Chi-square test. In the comparison of power, except when the null hypothesis is a distribution with a known mean (regardless of the dispersion of the distribution) …

Keywords: Chi-square test, goodness-of-fit test, Kolmogorov-Smirnov test.