

AMERICAN JOURNAL OF MATHEMATICS AND MATHEMATICAL SCIENCES
3(1) • January-June 2014 • ISSN : 2278-0874 • pp. 169-175

ASSUMPTION AND TESTING OF NORMALITY FOR STATISTICAL ANALYSIS
Ajay S. Singh and Micah B. Masuku
Department of AEM, Faculty of Agriculture, University of Swaziland, Luyengo Campus
Luyengo M205, Swaziland
E-mail: singhas64@hotmail.com, mbmasuku@uniswa.sz

ABSTRACT: Applications of statistical techniques are very common in scientific as well as multidisciplinary research. Statistical tools are undoubtedly useful for describing numerical facts, examining relationships between factors, and testing the independence of attributes or variables. Some researchers use statistics well and explain their findings effectively; many others, however, apply statistical tools without understanding the technicalities and reasoning behind them. The concept of normality is central to statistical technique: the assumption of normality is required by most statistical tools, namely correlation, regression and the parametric tests, because their validity rests on it. The main purpose of this article is to highlight the basic assumptions based on the normal distribution in terms of normality testing, and to describe in a simple manner the concept of normality and how to test whether observed data are normal before further statistical analysis.
Keywords: Statistical Analysis, Normal Distribution, Testing for Normality

INTRODUCTION
Statistical methods are commonly applied in multidisciplinary research, yet some researchers use statistical tools without understanding them or their prerequisites. Many statistical methods, including correlation, regression, analysis of variance and the parametric tests, are based on the normal distribution; in other words, testing for normality is required before most such procedures. Correlation, regression and the parametric tests rest on the assumption that the population from which the samples are taken is normally distributed [1-3]. This assumption should be taken seriously, because otherwise it is difficult to draw accurate and reliable conclusions about reality [4-5].

BASICS OF NORMAL DISTRIBUTION


In probability theory, the normal distribution is a very commonly used continuous probability distribution. Normal distributions are also important in statistics and the social sciences for real-valued random variables whose distributions are not known [6-7].
The normal distribution is useful because of the central limit theorem, which states that, under mild conditions, the mean of many random variables independently drawn from the same distribution is distributed approximately normally, irrespective of the form of the original distribution. Physical quantities that are expected to be the sum of many independent processes therefore have distributions very close to the normal. Moreover, many results and methods can be derived analytically in explicit form when the relevant variables are normally distributed.
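This behaviour is easy to see in a small simulation (an illustrative sketch, not part of the original article): the means of repeated samples from a decidedly non-normal distribution, here the uniform distribution on (0, 1), come out nearly symmetric, as the central limit theorem predicts.

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    """Mean of n draws from a uniform(0, 1) distribution."""
    return statistics.fmean(random.random() for _ in range(n))

# 2000 sample means, each from a sample of size 30.
means = [sample_mean(30) for _ in range(2000)]

# Skewness of the distribution of means; a value near 0 indicates symmetry.
m = statistics.fmean(means)
s = statistics.stdev(means)
skew = sum((x - m) ** 3 for x in means) / (len(means) * s ** 3)
print(round(m, 3), round(skew, 3))  # mean near 0.5, skewness near 0
```

With more draws per mean, the histogram of the means approaches the normal curve ever more closely.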
The normal probability density function is
f(x, µ, σ) = (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)]

The parameter µ in this definition is the mean, and also the median and mode. The parameter σ is the standard deviation (the variance is σ²). A random variable with a normal distribution is said to be normally distributed and is known as a normal deviate [8].
If µ = 0 and σ = 1, the distribution is called the standard normal distribution, and a random variable with that distribution is a standard normal deviate.
The assumption of normality is just the supposition that the underlying random variable of interest is
distributed normally. Intuitively, normality may be understood as the result of the sum of a large number of
independent random events.

What is Assumed to be Normal?[9]


In applications of parametric methods to inferential statistics, the values that are assumed to be normally distributed are the means across samples. The assumption of normality that underlies parametric statistics does not assert that the observations within a given sample are normally distributed, nor that the values within the population are normal. Its core element is that the distribution of sample means is normal; in technical terms, the assumption asserts that the sampling distribution of the mean is normal.

NORMALITY TESTS
Normality tests assess the likelihood that a given data set {x1, …, xn} comes from a normal distribution.
Null Hypothesis (H0): The sample data are not significantly different from a normal population.
Alternative Hypothesis (H1): The sample data are significantly different from a normal population.
In other words, the null hypothesis H0 is that the observations are distributed normally with unspecified mean µ and variance σ², versus the alternative H1 that the distribution is arbitrary.
There are several methods for assessing whether observed data are normally distributed. They fall into two broad categories: graphical tests and statistical tests.

1. Graphical Test
The usual graphical check of normality is to compare a histogram of the sample data to the normal probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution. This is difficult to judge when the sample is small; in that case one can regress the data against the quantiles of a normal distribution with the same mean and variance as the sample. Lack of fit to the regression line then suggests a departure from normality.

1.1. Quantile-Quantile (Q-Q) Plot Test [10-11]

A Q-Q plot is a plot of the sorted values from the data set against the expected values of the corresponding quantiles of the standard normal distribution.

HOW TO USE
In this test, the correlation between the sample data and the normal quantiles measures the goodness of fit, i.e. how well the data are modeled by a normal distribution. For normal data the points plotted in the Q-Q plot should fall approximately on a straight line, indicating high positive correlation. These plots are easy to interpret and have the added benefit that outliers are easily identified.
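The correlation a Q-Q plot displays can be computed directly without drawing the plot. The sketch below uses only the Python standard library and, purely for illustration, the hypothetical Hb% values introduced later in Table 1.

```python
import statistics
from statistics import NormalDist

# Hypothetical Hb% values (Table 1), sorted as a Q-Q plot requires.
data = sorted([4.2, 4.7, 4.5, 4.8, 4.8, 5.0, 5.2, 5.0, 4.9,
               5.1, 5.3, 5.4, 5.9, 6.1, 6.2, 6.3, 7.6])
n = len(data)

# Expected standard-normal quantiles at plotting positions (i - 0.5)/n.
q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

r = pearson(data, q)
print(round(r, 3))  # close to 1 for approximately normal data
```

A correlation close to 1 corresponds to the points of the Q-Q plot lying near a straight line.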

1.2. Cumulative Frequency (P-P) Plot Test [10-11]

The P-P plot is similar to the Q-Q plot but is used much less frequently. It consists of plotting the empirical cumulative probabilities of the sorted data against the corresponding theoretical normal cumulative probabilities; for normal data the points should fall approximately on the diagonal.

2. Statistical Tests
Statistical tests for normality are classified in different ways; the following are among the most commonly used.

2.1. W/S Normality Test [1] [12]


This test is based on the q statistic (the studentized range) and requires only the standard deviation and the range of the data:
q = w/s
where q is the statistic, s is the standard deviation and w is the range of the data. The W/S test uses a critical range: if the calculated value falls within the range, the null hypothesis is accepted; if it falls outside the range, the null hypothesis is rejected.

APPLICATION (HOW TO USE W/S TEST?)


The application of the test uses purely hypothetical data (patient names and Hb% values) given in Table 1.

Table 1
ID. Name of the patients Hb%(xi )
01 Sanjay 4.2
02 Vijay 4.7
03 Rajesh 4.5
04 Masuku 4.8
05 Dlamini 4.8
06 Simelane 5.0
07 Singh 5.2
08 Sharma 5.0
09 Gama 4.9
10 Vaidya 5.1
11 John 5.3
12 Rahman 5.4
13 Simon 5.9
14 Khan 6.1
15 Taylor 6.2
16 Miller 6.3
17 Ahmed 7.6
Total (Σxi) = 91, n = 17

Standard deviation (s) = 0.8352
Range (w) of the data = 3.4
Null Hypothesis (H0): The sample data are not significantly different from a normal population.
Alternative Hypothesis (H1): The sample data are significantly different from a normal population.
q = w/s = 3.4/0.8352 = 4.0709
At the 5% level of significance with n = 17, the critical range runs from a lower limit (a) of 3.06 to an upper limit (b) of 4.31. The calculated value of q lies within this range, so the null hypothesis is accepted.
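The arithmetic above can be verified in a few lines of Python (standard library only):

```python
import statistics

# Hypothetical Hb% values from Table 1.
data = [4.2, 4.7, 4.5, 4.8, 4.8, 5.0, 5.2, 5.0, 4.9,
        5.1, 5.3, 5.4, 5.9, 6.1, 6.2, 6.3, 7.6]

w = max(data) - min(data)    # range of the data
s = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
q = w / s
print(round(q, 3))  # ≈ 4.071 (the paper's 4.0709 uses the rounded s = 0.8352)
```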

2.2. Jarque–Bera test [1] [13]


This is a chi-square goodness-of-fit test based on skewness (Sk) and excess kurtosis (Ku). The value of the Jarque-Bera statistic (JB) is compared to the critical value of chi-square (χ²) with 2 degrees of freedom.

Sk = Σ(xi − X̄)³ / (n·s³); i = 1, 2, …, n

Ku = Σ(xi − X̄)⁴ / (n·s⁴) − 3; i = 1, 2, …, n
where xi is each observation, X̄ is the sample mean, s is the standard deviation and n is the sample size.
JB = n[(Sk)²/6 + (Ku)²/24]

APPLICATION (HOW TO USE JB TEST FOR NORMALITY?)


The application of the test uses the purely hypothetical values of Table 1.
Null Hypothesis (H0): Hb% is not significantly different from normal.
Alternative Hypothesis (H1): Hb% is significantly different from normal.
The critical value of chi-square (χ²) with 2 degrees of freedom is 5.991.
n = 17, Mean (X̄) = 5.35, Σ(xi − X̄) = 0; i = 1, 2, …, 17
Σ(xi − X̄)² = 11.162; Σ(xi − X̄)³ = 10.414; Σ(xi − X̄)⁴ = 29.985; i = 1, 2, …, 17
Sk = Σ(xi − X̄)³ / (n·s³) = 1.0512
Ku = Σ(xi − X̄)⁴ / (n·s⁴) − 3 = 0.6207
JB = n[(Sk)²/6 + (Ku)²/24] = 3.4040
The critical chi-square value of 5.991 at 2 degrees of freedom is greater than the calculated JB value, so H0 is accepted and we conclude that Hb% is not significantly different from normal.
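The same calculation in Python (standard library only). Note the excess-kurtosis form, Ku − 3, which is what the worked figures above use:

```python
import statistics

# Hypothetical Hb% values from Table 1.
data = [4.2, 4.7, 4.5, 4.8, 4.8, 5.0, 5.2, 5.0, 4.9,
        5.1, 5.3, 5.4, 5.9, 6.1, 6.2, 6.3, 7.6]
n = len(data)
mean = statistics.fmean(data)
s = statistics.stdev(data)  # sample standard deviation, as in the paper

sk = sum((x - mean) ** 3 for x in data) / (n * s ** 3)      # skewness
ku = sum((x - mean) ** 4 for x in data) / (n * s ** 4) - 3  # excess kurtosis
jb = n * (sk ** 2 / 6 + ku ** 2 / 24)
print(round(sk, 4), round(ku, 4), round(jb, 3))  # ≈ 1.0512, 0.6207, 3.404
```

Since 3.404 < 5.991, the null hypothesis of normality is not rejected.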

2.3. D'Agostino's D Test for Normality [1] [14]

This is a very powerful test based on the D statistic, which is derived from the sum of squared deviations of the data (SS) and the sample size. The data are first arranged in ascending order.
D = T/√(n³·SS), where T = Σ[i − (n + 1)/2]·x(i)
Application to the Table 1 observations (how to use the D normality test):

Null Hypothesis (H0): Hb% is not significantly different from normal.
Alternative Hypothesis (H1): Hb% is significantly different from normal.
Level of significance (α) = 0.05 (5%)
Critical values of D: 0.2587 and 0.2860
n = 17, Mean (X̄) = 5.35, (n + 1)/2 = 9
SS = Σ(xi − X̄)² = 11.162; i = 1, 2, …, 17
With the observations sorted in ascending order (4.2, 4.5, 4.7, 4.8, 4.8, 4.9, 5.0, 5.0, 5.1, 5.2, 5.3, 5.4, 5.9, 6.1, 6.2, 6.3, 7.6):
T = Σ(i − 9)·x(i) = (1 − 9)·4.2 + (2 − 9)·4.5 + … + (17 − 9)·7.6 = 62.0
D = T/√(n³·SS) = 62.0/√(17³ × 11.162) = 0.2648
Since 0.2587 < D = 0.2648 < 0.2860, H0 is accepted and we conclude that Hb% is not significantly different from normal.
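A quick check of this worked example in Python (standard library only), with the data sorted as the test requires:

```python
import math

# Hypothetical Hb% values from Table 1, sorted in ascending order.
data = sorted([4.2, 4.7, 4.5, 4.8, 4.8, 5.0, 5.2, 5.0, 4.9,
               5.1, 5.3, 5.4, 5.9, 6.1, 6.2, 6.3, 7.6])
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

# T weights each order statistic by its distance from the middle rank.
t = sum((i - (n + 1) / 2) * x for i, x in enumerate(data, start=1))
d = t / math.sqrt(n ** 3 * ss)
print(round(t, 1), round(d, 4))  # 62.0, 0.2648: inside (0.2587, 0.2860)
```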
Some other tests for normality are described below.

2.4. Anderson–Darling Test


The Anderson-Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form the test assumes that no parameters of the distribution being tested have to be estimated, in which case the test and its set of critical values are distribution-free. The test is most often used, however, in contexts where a family of distributions is being tested; the parameters of that family must then be estimated, and this must be accounted for by adjusting either the test statistic or its critical values. When applied to testing whether a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality. K-sample Anderson-Darling tests are available for testing whether several collections of observations can be modelled as coming from a single population. In addition to its use as a test of fit for distributions, it can be used in parameter estimation as the basis for a form of minimum-distance estimation procedure [14-15].
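As an illustrative sketch (not from the paper), the A² statistic can be computed directly from its empirical-distribution-function definition, here with the mean and standard deviation estimated from the Table 1 sample; the critical values for this estimated-parameter case differ from the distribution-free ones and are tabulated by Stephens [14].

```python
import math

# Hypothetical Hb% values from Table 1, sorted in ascending order.
data = sorted([4.2, 4.7, 4.5, 4.8, 4.8, 5.0, 5.2, 5.0, 4.9,
               5.1, 5.3, 5.4, 5.9, 6.1, 6.2, 6.3, 7.6])
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability-integral transform of each (standardized) order statistic.
u = [phi((x - mean) / s) for x in data]

# A^2 = -n - (1/n) * sum (2i-1) [ln u_(i) + ln(1 - u_(n+1-i))]
a2 = -n - sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
              for i in range(1, n + 1)) / n
print(round(a2, 3))
```

Larger values of A² indicate a worse fit to the normal distribution.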

2.5. Shapiro-Wilk Test [16]


The Shapiro-Wilk test is a test of normality published in 1965 by Samuel Sanford Shapiro and Martin Wilk. It uses the null-hypothesis principle to check whether a sample x1, …, xn came from a normally distributed population. The test statistic is:

W = (Σ ai·x(i))² / Σ(xi − X̄)²; i = 1, 2, …, n
where
x(i) (with parentheses enclosing the subscript index i) is the ith order statistic, i.e. the ith-smallest number in the sample, X̄ is the sample mean, and the constants ai are given by
(a1, a2, …, an) = mᵀV⁻¹ / (mᵀV⁻¹V⁻¹m)^(1/2)
with m = (m1, m2, …, mn)ᵀ,
where m1, m2, …, mn are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and V is the covariance matrix of those order statistics. The null hypothesis is rejected if W falls below a predetermined critical value.

2.6. Omnibus K² Statistic [17]

D'Agostino, Belanger and D'Agostino showed that the statistics Z1 and Z2 (transformations of the sample skewness g1 and kurtosis g2) can be combined into an omnibus test able to detect deviations from normality due to either skewness or kurtosis [18]:
K² = [Z1(g1)]² + [Z2(g2)]²
If the null hypothesis of normality is true, then K² is approximately χ²-distributed with 2 degrees of freedom.
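Assuming SciPy is available, this omnibus test is implemented as scipy.stats.normaltest; note that SciPy warns for samples smaller than 20, as with the n = 17 Table 1 data used here for illustration.

```python
import warnings
from scipy import stats

# Hypothetical Hb% values from Table 1.
data = [4.2, 4.7, 4.5, 4.8, 4.8, 5.0, 5.2, 5.0, 4.9,
        5.1, 5.3, 5.4, 5.9, 6.1, 6.2, 6.3, 7.6]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # kurtosistest warns for n < 20
    k2, p = stats.normaltest(data)   # K^2 statistic and chi-square p-value
print(round(k2, 3), round(p, 3))
```

A p-value above the chosen significance level means the null hypothesis of normality is not rejected.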

SUMMARY
Research investigation is part of wider development in finance, education, public health, agriculture and other fields that are indicators of a better human life. Modern applied research on better living management is quite complex, requiring multiple sets of skills: agricultural, medical, social, technological, mathematical and statistical. Suitable statistical tools and research designs provide unbiased estimates of the indicators, sound conclusions, and their interpretations. The statistical tools are simple and applicable in various fields, and software is available for calculating the statistical tests; on the other hand, the assumptions and technicalities behind the statistical tools, and the suitability of the tests, are just as important. In this context, normality is one of the most important aspects of statistical analysis. The normality condition is essential especially for parametric tests and regression analysis. The tests discussed here are suitable for testing normality and are also useful for small sample sizes. A careful consideration of normality will hopefully result in more meaningful studies whose results and interpretations will eventually receive high priority for acceptance and publication.

References
[1] Ghasemi, A., Zahediasl, S. (2012), Normality tests for statistical analysis: a guide for non-statisticians, Int. Jr.
of Endocrinology & Metabolism, 10(2), 486-489.
[2] Altman, D.G., Bland, J. M. (1995), Statistics notes: the normal distribution, BMJ, 310(6975): 298.
[3] Driscoll, P., Lecky, F., Crosby, M. (2000), An introduction to everyday statistics-1, Jr. Accid. Emerg. Medicine,
36(3): 205-11.
[4] Royston, P. (1991), Estimating departure from normality, Stat. Medicine, 10(8): 1283-93.
[5] Oztuna, D., Elhan, A.H., Tuccar E. (2006), Investigation of four different normality test in terms of type 1
error rate and power under different distributions, Turkish Journal of Medical Sciences, 36(3): 171-6.
[6] Normal Distribution, Gale Encyclopedia of Psychology.
[7] Casella & Berger (2002), Statistical Inference, Second Edition.
[8] Cover, Thomas M.; Thomas, Joy A. (2006), Elements of Information Theory, John Wiley and Sons. p. 254.
[9] Shorack, G. R., Wellner, J. A. (1986), Empirical Processes with Applications to Statistics, Wiley. ISBN 0-471-
86725-X.
[10] Eadie, W.T., D. Drijard, F.E. James, M. Roos and B. Sadoulet (1971), Statistical Methods in Experimental
Physics, Amsterdam: North-Holland. pp. 269–271. ISBN 0-444-10117-9.
[11] Corder, G. W., Foreman, D. I. (2009), Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach.
Wiley. ISBN 978-0-470-45461-9.
[12] Shapiro, S.S. (1980), How to test normality and other distributional assumptions. In: The ASQC basic references
in quality control: statistical techniques, 3, 1–78.

[13] Stuart, A., Ord, K., Arnold, S. F. (1999), Classical Inference and the Linear Model. Kendall's Advanced Theory
of Statistics 2A (Sixth ed.). London: Arnold, 25.37-25.43. ISBN 0-340-66230-1. MR 1687411.
[14] Stephens, M. A. (1974), EDF Statistics for Goodness of Fit and Some Comparisons, Journal of the American
Statistical Association 69: 730–737.
[15] Stephens, M. A. (1986), Tests Based on EDF Statistics, In D’Agostino, R.B. and Stephens, M.A. Goodness-
of-Fit Techniques. New York: Marcel Dekker.
[16] Shapiro, S. S.; Wilk, M. B. (1965), An analysis of variance test for normality (complete samples), Biometrika
52 (3–4): 591–611.
[17] Anscombe, F. J., Glynn, William J. (1983), Distribution of the kurtosis statistic b2 for normal statistics,
Biometrika 70 (1): 227–234.
[18] D’Agostino, Ralph B., Albert Belanger, Ralph B. D’Agostino, Jr. (1990), A suggestion for using powerful and
informative tests of normality, The American Statistician 44 (4): 316–321.
