# Lilliefors/Van Soest’s test of normality

Hervé Abdi & Paul Molin

1 Overview
The normality assumption is at the core of a majority of standard statistical procedures, and it is important to be able to test this assumption. In addition, showing that a sample does not come from a normally distributed population is sometimes of importance per se. Among the many procedures used to test this assumption, one of the most well-known is a modification of the Kolmogorov-Smirnov test of goodness of fit, generally referred to as the Lilliefors test for normality (or Lilliefors test, for short). This test was developed independently by Lilliefors (1967) and by Van Soest (1967). The null hypothesis for this test is that the error is normally distributed (i.e., there is no difference between the observed distribution of the error and a normal distribution). The alternative hypothesis is that the error is not normally distributed. Like most statistical tests, this test of normality defines a criterion and gives its sampling distribution. When the probability associated with the criterion is smaller than a given α-level, the
In: Neil Salkind (Ed.) (2007). Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. Address correspondence to: Hervé Abdi, Program in Cognition and Neurosciences, MS: Gr.4.1, The University of Texas at Dallas, Richardson, TX 75083-0688, USA. E-mail: herve@utdallas.edu, http://www.utd.edu/~herve


alternative hypothesis is accepted (i.e., we conclude that the sample does not come from a normal distribution). According to Lilliefors (1967), this test of normality is more powerful than other procedures for a wide range of nonnormal conditions. The critical values given by Lilliefors and by Van Soest are quite similar, the relative error being of the order of 10^{-2}.

An interesting peculiarity of the Lilliefors test is the technique used to derive the sampling distribution of the criterion. In general, mathematical statisticians derive the sampling distribution of the criterion using analytical techniques. However, in this case this approach fails and, consequently, Lilliefors decided to calculate an approximation of the sampling distribution by using the Monte-Carlo technique. Essentially, the procedure consists of extracting a large number of samples from a normal population and computing the value of the criterion for each of these samples. The empirical distribution of the values of the criterion gives an approximation of the sampling distribution of the criterion under the null hypothesis. Specifically, both Lilliefors and Van Soest used, for each sample size chosen, 1,000 random samples derived from a standardized normal distribution to approximate the sampling distribution of a Kolmogorov-Smirnov criterion of goodness of fit. Dagnelie (1968) indicated, in addition, that the critical values reported by Lilliefors can be approximated by an analytical formula. Such a formula facilitates writing computer routines because it eliminates the risk of creating errors when keying in the values of the table. Recently, Molin and Abdi (1998) refined the approximation given by Dagnelie and computed new tables using a larger number of runs (i.e., K = 100,000) in their simulations.

2 Notation

The sample for the test is made of N scores, each of them denoted X_i. The sample mean is denoted M_X and is computed as

    M_X = \frac{1}{N} \sum_i X_i ,    (1)

the sample variance is denoted

    S_X^2 = \frac{\sum_i (X_i - M_X)^2}{N - 1} ,    (2)

and the standard deviation of the sample, denoted S_X, is equal to the square root of the sample variance.

The first step of the test is to transform each of the X_i scores into Z-scores as follows:

    Z_i = \frac{X_i - M_X}{S_X} .    (3)

For each Z_i-score we compute the proportion of scores smaller than or equal to its value; this is called the frequency associated with this score and is denoted S(Z_i). For each Z_i-score we also compute the probability associated with this score if it comes from a "standard" normal distribution with a mean of 0 and a standard deviation of 1. We denote this probability by N(Z_i); it is equal to

    N(Z_i) = \int_{-\infty}^{Z_i} \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} Z^2\right) \, dZ .    (4)

The criterion for the Lilliefors test is denoted L. It is calculated from the Z-scores and is equal to

    L = \max_i \left\{ |S(Z_i) - N(Z_i)|, \; |S(Z_i) - N(Z_{i-1})| \right\} .    (5)

So L is the absolute value of the biggest split between the probability associated with Z_i when Z_i is normally distributed and the frequencies actually observed. The term |S(Z_i) - N(Z_{i-1})| is needed to take into account that, because the empirical distribution is discrete, the maximum absolute difference can occur at either endpoint of the empirical distribution. The critical values are given by Table 2: the null hypothesis is rejected when the L criterion is greater than or equal to the critical value L_critical.
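The computation of Equations 1–5 can be sketched in a few lines of Python (a minimal illustration, not from the original paper; the function name `lilliefors_L` is ours, and the standard normal CDF is obtained from the error function):

```python
import math

def phi(z):
    """N(Z): standard normal CDF (Equation 4), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lilliefors_L(scores):
    """Compute the Lilliefors criterion L of Equation 5 for a sample."""
    n = len(scores)
    m = sum(scores) / n                                         # M_X (Equation 1)
    s = math.sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))  # S_X (Equation 2)
    z_scores = [(x - m) / s for x in scores]                    # Z_i (Equation 3)
    L, prev_N = 0.0, 0.0        # N(Z_0) is taken as 0 below the smallest score
    for zi in sorted(set(z_scores)):
        S = sum(v <= zi for v in z_scores) / n  # S(Z_i): proportion of scores <= Z_i
        Nz = phi(zi)                            # N(Z_i)
        L = max(L, abs(S - Nz), abs(S - prev_N))  # Equation 5
        prev_N = Nz
    return L
```

Note that the maximum is taken over the distinct Z-values, with the proportion S(Z_i) counting all scores up to and including ties, as in Table 1 below.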

3 Numerical example

As an illustration, we will look at an analysis of variance example for which we want to test the so-called "normality assumption," which states that the within-group deviations (i.e., the "residuals") are normally distributed. The data are from Abdi (1987, p. 93ff.) and correspond to memory scores obtained by 20 subjects who were assigned to one of 4 experimental groups (hence 5 subjects per group). The score of the s-th subject in the a-th group is denoted Y_{a.s}, and the mean of each group is denoted M_{a.}:

| Group | Scores | Sum | M_{a.} |
|---|---|---|---|
| G.1 | 3 3 2 4 3 | 15 | 3 |
| G.2 | 5 9 8 4 9 | 35 | 7 |
| G.3 | 2 4 5 4 1 | 16 | 3.2 |
| G.4 | 5 4 3 5 4 | 21 | 4.2 |

In the analysis of variance framework, the error corresponds to the residuals, which are equal to the deviations of the scores from the mean of their group. So, in order to test the normality assumption for the analysis of variance, the first step is to compute the residuals from the scores. We denote X_i the residual corresponding to the i-th observation (with i going from 1 to 20). The residuals are given in the following table:

| Y_{a.s} | X_i | Y_{a.s} | X_i |
|---|---|---|---|
| 3 | 0 | 2 | −1.2 |
| 3 | 0 | 4 | .8 |
| 2 | −1 | 5 | 1.8 |
| 4 | 1 | 4 | .8 |
| 3 | 0 | 1 | −2.2 |
| 5 | −2 | 5 | .8 |
| 9 | 2 | 4 | −.2 |
| 8 | 1 | 3 | −1.2 |
| 4 | −3 | 5 | .8 |
| 9 | 2 | 4 | −.2 |

The within-group mean square MS_{S(A)} is equal to 2.35; it corresponds to the best estimate of the population error variance.
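The residuals and the within-group mean square can be checked with a short computation (a sketch; the variable names `groups` and `ms_sa` are ours):

```python
# Scores for the four experimental groups (from Abdi, 1987)
groups = [[3, 3, 2, 4, 3],   # G.1, mean 3
          [5, 9, 8, 4, 9],   # G.2, mean 7
          [2, 4, 5, 4, 1],   # G.3, mean 3.2
          [5, 4, 3, 5, 4]]   # G.4, mean 4.2

# Residuals: each score minus its group mean
residuals = [x - sum(g) / len(g) for g in groups for x in g]

# Within-group mean square: sum of squared residuals over its degrees of freedom
df = len(residuals) - len(groups)            # 20 - 4 = 16
ms_sa = sum(r * r for r in residuals) / df   # = 2.35
```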

Next, we transform the X_i values into Z_i values using the following formula:

    Z_i = \frac{X_i}{\sqrt{MS_{S(A)}}}    (6)

because MS_{S(A)} is the best estimate of the population variance, and the mean of the X_i is zero. Then, for each Z_i value, the associated frequency S(Z_i) and the probability associated with Z_i under the normality condition, N(Z_i), are computed [we use a table of the normal distribution to obtain N(Z_i)]. The results are presented in Table 1. The value of the criterion is (see Table 1)

    L = \max_i \left\{ |S(Z_i) - N(Z_i)|, \; |S(Z_i) - N(Z_{i-1})| \right\} = .250 .    (7)

Taking an α level of α = .05, with N = 20, we find (from Table 2) that the critical value is equal to L_critical = .192. Because L is larger than L_critical, the null hypothesis is rejected and we conclude that the residuals in our experiment are not distributed normally.

4 Numerical approximation

The available tables for the Lilliefors test of normality typically report the critical values for a small set of alpha values. For example, the present table reports the critical values for α = [.20, .15, .10, .05, .01]. These values correspond to the alpha values used for most tests involving only one null hypothesis, as this was the standard procedure in the late sixties. The current statistical practice, however, favors multiple tests (maybe as a consequence of the availability of statistical packages). Because using multiple tests increases the overall Type I error (i.e., the familywise Type I error, or α_PF), it has become customary to recommend testing each hypothesis with a corrected α level (i.e., the Type I error per comparison, or α_PC), such as the Bonferroni or Šidák corrections. For example, using a Bonferroni approach with a familywise value of α_PF = .05,
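The example can be reproduced end to end as a sketch (the residuals and MS_{S(A)} = 2.35 are taken from the text; the helper `phi` stands in for a table of the normal distribution):

```python
import math

def phi(z):
    """N(Z): standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

residuals = [0, 0, -1, 1, 0,            # group 1
             -2, 2, 1, -3, 2,           # group 2
             -1.2, .8, 1.8, .8, -2.2,   # group 3
             .8, -.2, -1.2, .8, -.2]    # group 4
N = len(residuals)
ms_sa = 2.35                            # within-group mean square from the text

z_scores = [x / math.sqrt(ms_sa) for x in residuals]  # Equation 6
L, prev_N = 0.0, 0.0
for zi in sorted(set(z_scores)):
    S = sum(v <= zi for v in z_scores) / N    # S(Z_i)
    Nz = phi(zi)                              # N(Z_i)
    L = max(L, abs(S - Nz), abs(S - prev_N))  # Equation 7
    prev_N = Nz
# L = .250, matching Equation 7
```

The maximum gap, .250, occurs at Z_i = .52, where S(Z_i) = .75 and N(Z_{i−1}) = .500.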

Table 1: How to compute the criterion for the Lilliefors test of normality. N_i stands for the absolute frequency of a given value of X_i; F_i stands for the cumulative frequency associated with a given value of X_i (i.e., the number of scores smaller than or equal to X_i); Z_i is the Z-score corresponding to X_i; S(Z_i) is the proportion of scores smaller than or equal to Z_i; N(Z_i) is the probability associated with Z_i for the standard normal distribution; D_0 = |S(Z_i) − N(Z_i)|; D_{−1} = |S(Z_i) − N(Z_{i−1})|; max is the maximum of {D_0, D_{−1}}. The value of the criterion is L = .250.

| X_i | N_i | F_i | Z_i | S(Z_i) | N(Z_i) | D_0 | D_{−1} | max |
|---|---|---|---|---|---|---|---|---|
| −3.0 | 1 | 1 | −1.96 | .05 | .025 | .025 | .050 | .050 |
| −2.2 | 1 | 2 | −1.44 | .10 | .075 | .025 | .075 | .075 |
| −2.0 | 1 | 3 | −1.30 | .15 | .097 | .053 | .074 | .074 |
| −1.2 | 2 | 5 | −.78 | .25 | .218 | .032 | .154 | .154 |
| −1.0 | 1 | 6 | −.65 | .30 | .258 | .042 | .083 | .083 |
| −.2 | 2 | 8 | −.13 | .40 | .449 | .049 | .143 | .143 |
| .0 | 3 | 11 | .00 | .55 | .500 | .050 | .102 | .102 |
| .8 | 4 | 15 | .52 | .75 | .699 | .051 | .250 | .250 |
| 1.0 | 2 | 17 | .65 | .85 | .742 | .108 | .151 | .151 |
| 1.8 | 1 | 18 | 1.17 | .90 | .879 | .021 | .157 | .157 |
| 2.0 | 2 | 20 | 1.30 | 1.00 | .903 | .097 | .120 | .120 |

testing J = 3 hypotheses requires that each hypothesis be tested at the level of

    \alpha_{PC} = \frac{1}{J} \alpha_{PF} = \frac{1}{3} \times .05 = .0167 .    (8)

With a Šidák approach, each hypothesis will be tested at the level of

    \alpha_{PC} = 1 - (1 - \alpha_{PF})^{1/J} = 1 - (1 - .05)^{1/3} = .0170 .    (9)

As this example illustrates, both procedures are likely to require using different α levels than the ones given by the tables. In fact, it is rather unlikely that a table could be precise enough to provide the wide range of alpha values needed for multiple testing purposes. A
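The two corrections of Equations 8 and 9 are straightforward to compute (a minimal sketch; the variable names are ours):

```python
alpha_pf = 0.05   # familywise Type I error, alpha_PF
J = 3             # number of hypotheses tested

# Bonferroni correction (Equation 8)
alpha_pc_bonferroni = alpha_pf / J                 # = .0167
# Sidak correction (Equation 9)
alpha_pc_sidak = 1 - (1 - alpha_pf) ** (1 / J)     # = .0170
```

The Šidák value is always slightly larger than the Bonferroni value, since the Bonferroni correction is a first-order approximation to it.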

more practical solution is to generate the critical values for any alpha value. Such an approach can be implemented by approximating the sampling distribution "on the fly" for each specific problem and deriving the critical values for unusual values of α.

Another approach to finding critical values for unusual values of α, or, alternatively, to obtaining the probability associated with any value of the Kolmogorov-Smirnov criterion, is to find a numerical approximation for the sampling distributions. Molin and Abdi (1998) proposed such an approximation and showed that it was accurate for at least the first two significant digits. Their procedure, somewhat complex, is better implemented with a computer and comprises two steps.

The first step is to compute a quantity called A, obtained from the following formula:

    A = \frac{-(b_1 + N) + \sqrt{(b_1 + N)^2 - 4 b_2 (b_0 - L^{-2})}}{2 b_2}    (10)

with

    b_0 = 0.37872256037043, \quad b_1 = 1.30748185078790, \quad b_2 = 0.08861783849346 .    (11)

The second step implements a polynomial approximation and estimates the probability associated with a given value of L as:

    Pr(L) ≈ −0.37782822932809 + 1.67819837908004 A − 3.02959249450445 A^2 + 2.80015798142101 A^3 − 1.39874347510845 A^4 + 0.40466213484419 A^5 − 0.06353440854207 A^6 + 0.00287462087623 A^7 + 0.00069650013110 A^8 − 0.00011872227037 A^9 + 0.00000575586834 A^10 .    (12)

For example, suppose that we have obtained a value of L = .1030 from a sample of size N = 50. (Table 2 shows that Pr(L) = .20.)
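The two-step approximation of Equations 10–12 translates directly into code (a sketch; the coefficients are copied from the text and the function name `lilliefors_prob` is ours):

```python
import math

B0, B1, B2 = 0.37872256037043, 1.30748185078790, 0.08861783849346  # Equation 11

# Coefficients of the polynomial in Equation 12, constant term first
COEFFS = [-0.37782822932809, 1.67819837908004, -3.02959249450445,
          2.80015798142101, -1.39874347510845, 0.40466213484419,
          -0.06353440854207, 0.00287462087623, 0.00069650013110,
          -0.00011872227037, 0.00000575586834]

def lilliefors_prob(L, N):
    """Approximate Pr(L) for the Lilliefors criterion (Equations 10-12)."""
    A = (-(B1 + N) + math.sqrt((B1 + N) ** 2
                               - 4 * B2 * (B0 - L ** -2))) / (2 * B2)  # Eq. 10
    return sum(c * A ** k for k, c in enumerate(COEFFS))               # Eq. 12

# Example from the text: L = .1030, N = 50 gives Pr(L) of about .1984
```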

To estimate Pr(L) we first need to compute A, and then use this value in Equation 12. From Equation 10, we compute the estimate of A as:

    A = \frac{-(b_1 + N) + \sqrt{(b_1 + N)^2 - 4 b_2 (b_0 - L^{-2})}}{2 b_2} = \frac{-(b_1 + 50) + \sqrt{(b_1 + 50)^2 - 4 b_2 (b_0 - .1030^{-2})}}{2 b_2} = 1.82402308769590 .    (13)

Plugging this value of A into Equation 12 gives

    Pr(L) = .19840103775379 ≈ .20 .    (14)

As illustrated by this example, the approximated value of Pr(L) is correct for the first two decimal values.

References

[1] Abdi, H. (1987). Introduction au traitement statistique des données expérimentales. Grenoble: Presses Universitaires de Grenoble.

[2] Dagnelie, P. (1968). A propos de l'emploi du test de Kolmogorov-Smirnov comme test de normalité. Biométrie et Praximétrie, 9, 3–13.

[3] Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399–402.

[4] Molin, P., & Abdi, H. (1998). New tables and numerical approximation for the Kolmogorov-Smirnov/Lilliefors/Van Soest test of normality. Technical report, University of Bourgogne. Available from www.utd.edu/∼herve/MA_Lilliefors98.pdf.

[5] Van Soest, J. (1967). Some experimental results concerning tests of normality. Statistica Neerlandica, 21, 91–97.

Table 2: Table of the critical values for the Kolmogorov-Smirnov/Lilliefors test of normality, obtained with K = 100,000 samples for each sample size. The intersection of a given row and column shows the critical value L_critical for the sample size N labelling the row and the alpha level labelling the column.

[Critical values for N = 4, 5, ..., 50 at α = .20, .15, .10, .05, and .01; for example, for N = 20 and α = .05, L_critical = .192.]

For N > 50, the critical value can be found by using

    f_N = \frac{0.83 + N}{\sqrt{N}} - 0.01 ,

and computing L_critical = f_α / f_N, with f_α = 0.741 (α = .20), 0.775 (α = .15), 0.819 (α = .10), 0.895 (α = .05), and 1.035 (α = .01).
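The N > 50 rule from the last row of Table 2 is easy to code (a sketch; the dictionary of numerators f_α is transcribed from the table, and the function name is ours):

```python
import math

# Numerators f_alpha from the last row of Table 2, one per alpha level
F_ALPHA = {0.20: 0.741, 0.15: 0.775, 0.10: 0.819, 0.05: 0.895, 0.01: 1.035}

def lilliefors_critical(N, alpha=0.05):
    """Approximate L_critical for N > 50 as f_alpha / f_N (Table 2, last row)."""
    if N <= 50:
        raise ValueError("use the tabulated values for N <= 50")
    f_N = (0.83 + N) / math.sqrt(N) - 0.01
    return F_ALPHA[alpha] / f_N
```

For instance, at N = 100 and α = .05 this gives a critical value of roughly .089, close to the classical large-sample value 0.886/√N.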
