To cite this article: Hadi Alizadeh Noughabi & Naser Reza Arghami (2011) Monte Carlo comparison
of seven normality tests, Journal of Statistical Computation and Simulation, 81:8, 965-972, DOI:
10.1080/00949650903580047
This article studies seven different tests of normality. The tests in question are Kolmogorov–Smirnov,
Anderson–Darling, Kuiper, Jarque–Bera, Cramer von Mises, Shapiro–Wilk, and Vasicek. Each test is
described and power comparisons are made by using Monte Carlo computations under various alternatives.
The results are discussed and interpreted separately.
1. Introduction
To make a statistical inference, several assumptions about the data must be fulfilled. Most statistical methods assume an underlying distribution in the derivation of their results. However, when
we assume that our data follow a specific distribution, we take a serious risk. If our assumption is
wrong, then the results obtained may be invalid. For example, the confidence levels of the confidence intervals or the error probabilities of the hypothesis tests implemented may be completely
off. The consequences of misspecifying the distribution may prove very costly. One way to deal
with this problem is to check the distribution assumptions carefully.
The goodness-of-fit tests have been discussed by many authors including D’Agostino and
Stephens [1], Huber-Carol et al. [2], Li and Papadopoulos [3], Thode [4], Zhang and Cheng [5],
Steele and Chaseling [6], Jager and Wellner [7], Raschke [8], Zhao et al. [9], etc.
The normality assumption is indispensable in many statistical procedures, some of which may be quite sensitive to any departure from normality. Testing normality is therefore one of the most studied goodness-of-fit problems.
Many normality tests have been developed by different authors. Since the invention of the
chi-squared goodness-of-fit test for normality by Pearson in 1900, considerable attention has
been given to the problem of testing normality and a fair number of tests can be found in
the literature. The Cramer von Mises test appeared in 1931 [21] and the Kolmogorov–Smirnov test in 1933 [22]. Almost two decades later, Anderson and Darling suggested their test [23]. Afterwards, Kuiper's test [24] and Shapiro and Wilk's test [25] of normality were introduced. Vasicek's entropy-based test of normality was suggested in 1976 [12], and Jarque and Bera proposed their test in 1987 [26].
Detailed discussions on these tests may be found in D’Agostino and Stephens [1], Mardia [10],
and references therein.
Comparison of the normality tests has received attention in the literature. Stephens [11], by
Monte Carlo simulation, presents comparisons of some normality tests (Kolmogorov–Smirnov,
Cramer von Mises, Kuiper, Watson, Anderson–Darling, and Shapiro-Wilk). Vasicek’s [12] test
of normality based on entropy is not, of course, included in Stephens’s study, but Vasicek [12]
compared his test with the other tests under some alternatives [exponential, gamma(2), uniform,
beta(2,1), and Cauchy] and showed that for some alternatives his test is most powerful. Bera and Ng
[13] present a graphical alternative to the Q − Q plot for detecting departures from normality using
the score function. They concluded that the estimated score function is informative in performing
exploratory data analysis. Dufour et al. [14] compared different normality tests (Kolmogorov–Smirnov, Cramer von Mises, Jarque–Bera, Anderson–Darling, Shapiro–Wilk, and D’Agostino) for the residuals of linear regression models. Their study did not include Vasicek’s [12] test.
Esteban et al. [15] proposed three new tests of normality based on the three improved or modified
versions of the Vasicek entropy estimator. They computed critical values of the corresponding test
statistics for sample size 5 ≤ n ≤ 50 by using Monte Carlo experiments. They concluded that the
power of tests depends on alternatives; therefore, they divided the alternatives into four groups,
depending on the support and shape of their densities.
Goria et al. [16] and Choi [17] improved the normality tests based on entropy and compared the
proposed tests with other entropy-based tests, under some alternatives. Farrell and Rogers-Stewart
[18] compared some normality and symmetry tests. Also, Yazici and Yolacan [19] compared the
power of the normality tests for the populations from distributions (beta, gamma, log-normal,
Weibull, and t) and with different sample sizes (n = 20, 30, 40, and 50). Meintanis [20] compared classical normality tests with tests based on the characteristic function, but did not include entropy-based tests of normality in his comparisons.
In all of the above studies (with the exception of Esteban et al. [15], who compared only four
entropy-based normality tests, using different entropy estimates, under classified alternatives),
alternatives were not classified and authors considered only some alternatives. In this paper, we
consider seven normality tests and compare them with each other, by Monte Carlo simulations,
under classified alternatives. Our choice of the seven tests is based on popularity (e.g. Kolmogorov–Smirnov and Anderson–Darling) and power (e.g. Shapiro–Wilk and Vasicek).
It turns out (Section 3) that no single test procedure is uniformly more powerful than the others: some tests are more powerful for some alternatives, and others are better for other alternatives. We have therefore classified the alternatives into groups under which particular tests are most powerful.
In this paper, the methodologies of the tests mentioned earlier are given in Section 2. All the
tests are compared with each other by Monte Carlo simulation in Section 3. The last section
includes some conclusions.
2. Tests of normality
Table 1. Test statistics of the normality tests.

Shapiro–Wilk:
W = [Σ_{i=1}^{[n/2]} a(n−i+1) (X(n−i+1) − X(i))]² / Σ_{i=1}^{n} (X(i) − X̄)²,
where the coefficients ai are tabulated in Pearson and Hartley [27].

Anderson–Darling:
A2 = −n − (1/n) Σ_{i=1}^{n} (2i − 1){ln(Zi) + ln(1 − Z(n−i+1))},
where Zi = Φ((X(i) − X̄)/SX) and Φ is the cdf of the standard normal distribution.

Vasicek:
KLmn = exp(H(m, n))/SX,
where H(m, n) = (1/n) Σ_{i=1}^{n} log[(n/(2m))(X(i+m) − X(i−m))],
SX² is the sample variance, and m is a positive integer with m ≤ n/2 (with X(i) = X(1) for i < 1 and X(i) = X(n) for i > n).

Jarque–Bera:
JB = n[c²/6 + (k − 3)²/24],
where c is the sample skewness and k is the sample kurtosis.
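Of the statistics in Table 1, Vasicek's KLmn is the least familiar; it can be computed directly from the spacings in H(m, n). A minimal sketch in Python (the function name and the use of the divide-by-n standard deviation for SX are my assumptions, not taken from the paper):

```python
import numpy as np

def vasicek_KLmn(x, m):
    """KLmn = exp(H(m, n)) / SX, with X(i) = X(1) for i < 1 and
    X(i) = X(n) for i > n; small values speak against normality."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    idx = np.arange(n)
    # m-spacings X(i+m) - X(i-m), clamped at the sample extremes
    spacing = x[np.minimum(idx + m, n - 1)] - x[np.maximum(idx - m, 0)]
    H = np.mean(np.log(n / (2.0 * m) * spacing))
    return np.exp(H) / x.std()  # divide-by-n std (assumption)
```

Under normality, exp(H(m, n))/SX approaches √(2πe) ≈ 4.13 for large n, so the test rejects for small values of KLmn.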
Among the tests in Table 1, the Vasicek, Shapiro–Wilk, and Jarque–Bera tests are specific in the sense that the null hypothesis is normality, while the rest are suitable for any null family of distributions. Also, although the Vasicek, Shapiro–Wilk, and Jarque–Bera tests are exact, the other four tests are approximate in the sense that the actual size of the test is only approximately equal to the nominal size. For further study of these tests, see the references.
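To make one of the omnibus statistics concrete, the Anderson–Darling A2 of Table 1 can be computed directly from its formula; the sketch below (Python with NumPy/SciPy, my choice of tooling) cross-checks it against scipy.stats.anderson, which uses the same estimated-parameter form:

```python
import numpy as np
from scipy import stats

def anderson_darling_a2(x):
    """A2 = -n - (1/n) * sum_i (2i-1)[ln Z_i + ln(1 - Z_{n-i+1})],
    with Z_i = Phi((X(i) - Xbar)/SX), as in Table 1."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = stats.norm.cdf((x - x.mean()) / x.std(ddof=1))  # Z_i for sorted data
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1])))

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)
print(anderson_darling_a2(x))
print(stats.anderson(x, dist='norm').statistic)  # agrees closely
```

The sample-std convention (ddof=1 here) matches scipy's implementation; the paper's SX may use the divide-by-n form instead.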
3. Simulation study
In this section, power comparisons of the tests of normality are made by Monte Carlo computation. Table 2 gives the Type I error probabilities (the actual sizes of the tests), which (except for the exact Vasicek, Shapiro–Wilk, and Jarque–Bera tests) have been obtained from 20,000 simulations.
We compute the powers of the tests based on the CH, D, V, W, A2, KLmn, and JB statistics by
means of Monte Carlo simulations under 20 alternatives. These alternatives were used by Esteban
et al. [15] in their study of power comparisons of several tests for normality. The alternatives can
be divided into four groups, depending on the support and shape of their densities. From the point
of view of applied statistics, natural alternatives to normal distribution are in Groups I and II. For
the sake of completeness, we also consider Groups III and IV. This fact gives additional insight
towards understanding the behaviour of the tests.
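The power entries in a study of this kind come from rejection-rate loops; a hedged sketch in Python, using scipy's Shapiro–Wilk as the test and 2,000 runs rather than the paper's 20,000 (the sampler arguments and run count are illustrative only):

```python
import numpy as np
from scipy import stats

def mc_power(sampler, n, n_sim=2000, alpha=0.05, seed=0):
    """Estimate power as the Monte Carlo rejection rate of the
    Shapiro-Wilk test over samples drawn by `sampler`."""
    rng = np.random.default_rng(seed)
    rejections = sum(stats.shapiro(sampler(rng, n)).pvalue < alpha
                     for _ in range(n_sim))
    return rejections / n_sim

# actual size under the normal null, and power against a uniform
# alternative, both at n = 20 and nominal alpha = 0.05
size = mc_power(lambda rng, n: rng.normal(size=n), 20)
pow_u = mc_power(lambda rng, n: rng.uniform(size=n), 20)
```

The size estimate should fall near the nominal 0.05, while the rejection rate against the uniform alternative estimates the entries of a power table.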
Table 2. The actual Type I error probabilities of tests of normality (n = 20, nominal α = 0.05, σ = standard deviation).
Table 3. Power comparisons of 0.05 tests based on the CH, D, V, W, A2, KLmn, and JB statistics for sample sizes n = 10, 20, 30, and 50 under alternatives from Group I.
Table 4. Power comparisons of 0.05 tests based on the CH, D, V, W, A2, KLmn, and JB statistics for sample sizes n = 10, 20, 30, and 50 under alternatives from Group II.
Table 5. Power comparisons of 0.05 tests based on the CH, D, V, W, A2, KLmn, and JB statistics for sample sizes n = 10, 20, 30, and 50 under alternatives from Group III.
The alternatives in Group IV, supported on the interval (0, 1), are:
• Uniform;
• Beta(2,2);
• Beta(0.5,0.5);
• Beta(3,1.5);
• Beta(2,1).
Table 6. Power comparisons of 0.05 tests based on the CH, D, V, W, A2, KLmn, and JB statistics for sample sizes n = 10, 20, 30, and 50 under alternatives from Group IV.
Summary of the most powerful tests by group of alternatives:

Group (alternatives)     I     II    III        IV
Most powerful test(s)    JB    W     W, KLmn    KLmn
In Group II, the test W has the most power and the test V has the least power. For n = 10, the test JB has the most power, though the difference in power between W and JB is small.
In Group III, the tests W and KLmn have the most power and the test D has the least power. The differences in power between these two tests and the other tests are substantial.
In Group IV, the test KLmn has the most power and the test JB has the least power. The difference in power between the test KLmn and the other tests is substantial.
4. Conclusions
In this paper, we first described seven tests for normality, namely Kolmogorov–Smirnov,
Anderson–Darling, Kuiper, Jarque–Bera, Cramer von Mises, Shapiro–Wilk, and Vasicek.
The paper also compares the power of these seven tests using Monte Carlo computations for
sample sizes n = 10, 20, 30, and 50. Differences in power among the seven tests are considerable and each of the tests JB, A2, W, and KLmn can be most powerful in the group of tests
{CH, D, V, W, A2, KLmn, JB}, depending on the type of alternatives. The test KLmn is most
powerful against alternatives with the support (0, 1) (Group IV). The tests JB and A2 are most
powerful against symmetric alternatives with the support (−∞, ∞) (Group I).
The tests W and KLmn are most powerful against alternatives in Group III with the support
(0, ∞). The test W is most powerful against asymmetric alternatives in Group II with the support
(−∞, ∞).
Based on these observations, we can formulate the following recommendations for the
application of the studied tests in practice:
• Use the statistic JB if the assumed alternatives are symmetric and supported on (−∞, ∞).
• Use the statistic KLmn, based on sample entropy, if the assumed alternatives are supported on the bounded interval (0, 1).
• Use the statistic KLmn or W if the assumed alternatives are supported on (0, ∞).
• Use the statistic W if the assumed alternatives are asymmetric and supported on (−∞, ∞).
Acknowledgements
The authors express their appreciation to an anonymous referee and the Associate Editor whose comments improved this
manuscript. Partial support from Ordered and Spatial Data Center of Excellence of Ferdowsi University of Mashhad is
acknowledged.
References
[1] R.B. D’Agostino and M.A. Stephens, Goodness-of-Fit Techniques, Marcel Dekker, Inc, New York, 1986.
[2] C. Huber-Carol, N. Balakrishnan, M.S. Nikulin, and M. Mesbah, Goodness-of-Fit Tests and Model Validity,
Birkhäuser, Boston, Basel, Berlin, 2002.
[3] G. Li and A. Papadopoulos, A note on goodness of fit test using moments, Statistica 62(1) (2002), pp. 72–86.
[4] H. Thode Jr., Testing for Normality, Marcel Dekker, New York, 2002.
[5] C. Zhang and B. Cheng, Binning methodology for nonparametric goodness-of-fit test, J. Stat. Comput. Simul. 73
(2003), pp. 71–82.
[6] M. Steele and J. Chaseling, Powers of discrete goodness-of-fit test statistics for a uniform null against a selection of
alternative distributions, Commun. Stat. Simul. Comput. 35 (2006), pp. 1067–1075.
[7] L. Jager and J.A. Wellner, Goodness-of-fit tests via phi-divergences, Ann. Stat. 35(5) (2007), pp. 2018–2053.
[8] M.F. Raschke, The biased transformation and its application in goodness-of-fit tests for the beta and gamma
distribution, Commun. Stat. Simul. Comput. 38 (2009), pp. 1870–1890.
[9] J. Zhao, X. Xu, and X. Ding, Some new goodness-of-fit tests based on stochastic sample quantiles, Commun. Stat.
Simul. Comput. 38 (2009), pp. 571–589.
[10] K.V. Mardia, Tests of univariate and multivariate normality, in Handbook of Statistics 4, P.R. Krishnaiah, ed.,
Amsterdam, North-Holland, 1980.
[11] M.A. Stephens, EDF statistics for goodness of fit and some comparisons, J. Am. Stat. Assoc. 69 (1974), pp. 730–737.
[12] O. Vasicek, A test for normality based on sample entropy, J. R. Stat. Soc. B 38 (1976), pp. 54–59.
[13] A.K. Bera and P.T. Ng, Tests for normality using estimated score function, J. Stat. Comput. Simul. 52(3) (1995),
pp. 273–287.
[14] J.M. Dufour, A. Farhat, L. Gardiol, and L. Khalaf, Simulation-based finite sample normality tests in linear regressions, Econom. J. 1 (1998), pp. 154–173.
[15] M.D. Esteban, M.E. Castellanos, D. Morales, and I. Vajda, Monte Carlo comparison of four normality tests using
different entropy estimates, Commun. Stat. Simul. Comput. 30 (2001), pp. 761–785.
[16] M.N. Goria, N.N. Leonenko, V.V. Mergel, and P.L. Novi Inverardi, A new class of random vector entropy estimators
and its applications in testing statistical hypotheses, Nonparametric Stat. 17(3) (2005), pp. 277–297.
[17] B. Choi, Improvement of goodness of fit test for normal distribution based on entropy and power comparison, J. Stat.
Comput. Simul. 78(9) (2008), pp. 781–788.
[18] P.J. Farrell and K. Rogers-Stewart, Comprehensive study of tests for normality and symmetry: Extending the
Spiegelhalter test, J. Stat. Comput. Simul. 76(9) (2006), pp. 803–816.
[19] B. Yazici and S. Yolacan, A comparison of various tests of normality, J. Stat. Comput. Simul. 77(2) (2007),
pp. 175–183.
[20] S.G. Meintanis, Goodness-of-fit testing by transforming to normality: Comparison between classical and charac-
teristic function-based methods, J. Stat. Comput. Simul. 79(2) (2009), pp. 205–212.
[21] R. von Mises, Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, Deuticke,
Leipzig and Vienna, 1931.
[22] A.N. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione, Giornale dell’Istituto Italiano degli Attuari 4 (1933), pp. 83–91.
[23] T.W. Anderson and D.A. Darling, A test of goodness of fit, J. Am. Stat. Assoc. 49 (1954), pp. 765–769.
[24] N.H. Kuiper, Tests concerning random points on a circle, Proc. K. Ned. Akad. Wet. A 63 (1960), pp. 38–47.
[25] S.S. Shapiro and M.B. Wilk, An analysis of variance test for normality, Biometrika 52 (1965), pp. 591–611.
[26] C.M. Jarque and A.K. Bera, A test for normality of observations and regression residuals, Int. Stat. Rev. 55 (1987), pp. 163–172.
[27] E.S. Pearson and H.O. Hartley, Biometrika Tables for Statisticians, Cambridge University Press, London, 1972.