
SKEWNESS

The skewness of a probability distribution or a dataset measures its lack of symmetry around its centre. There are different ways to capture skewness; we mention two popular ones below. Any skewness measure of a symmetric distribution (or dataset) must be zero. The measure should be positive if the distribution is skewed to the right (also called right-skewed or positively skewed), i.e. the distribution (data) has a longer tail at the right end. Distributions (data) with a longer left tail should have a negative skewness measure; they are termed skewed to the left, left-skewed or negatively skewed. Below are some illustrations:

[Figure: three illustrative distributions, labelled "Skewed to the right", "Symmetric" and "Skewed to the left".]

For a population with N values $X_1, X_2, \ldots, X_N$, having mean $\mu$ and standard deviation $\sigma$, the most popular skewness measure is defined as:

$$\gamma = \frac{\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^3}{\sigma^3} \qquad (1)$$
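As a minimal sketch of how definition (1) can be evaluated on a small, fully known population, consider the following Python fragment. The function name `population_skewness` and the two toy data sets are illustrative assumptions, not part of these notes; only the standard library is used.

```python
import math

def population_skewness(values):
    """Definition (1): mean cubed deviation divided by sigma cubed."""
    n = len(values)
    mu = sum(values) / n
    # Population standard deviation: the divisor is N, not N - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    third_moment = sum((x - mu) ** 3 for x in values) / n
    return third_moment / sigma ** 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric population
right_tailed = [1, 2, 3, 4, 10]    # hypothetical population with a long right tail

print(population_skewness(symmetric))     # 0.0
print(population_skewness(right_tailed))  # positive, about 1.14
```

Mirroring the second data set around its mean flips the sign of the result, which matches the convention that left-skewed data should give a negative value.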

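[Some people call the measure in (1) the relative skewness, and reserve the term skewness for its numerator alone.] The sample skewness, based on n sample data points $x_1, x_2, \ldots, x_n$ with sample mean $\bar{x}$ and sample standard deviation $s$, is usually computed as:

$$g = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^3 \qquad (2)$$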
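The sample formula (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_skewness` and the toy data are hypothetical, and the sample standard deviation is taken with divisor n-1 (as `statistics.stdev` does).

```python
import statistics

def sample_skewness(data):
    """Formula (2): n / ((n-1)(n-2)) times the sum of cubed standardised deviations."""
    n = len(data)
    if n < 3:
        raise ValueError("formula (2) needs at least 3 data points")
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    factor = n / ((n - 1) * (n - 2))
    return factor * sum(((x - xbar) / s) ** 3 for x in data)

toy_sample = [1, 2, 3, 4, 10]  # hypothetical data with one large value
print(sample_skewness(toy_sample))  # positive: the long tail is on the right
```

The function fails with a division by zero if all data points are equal, since then s = 0 and the skewness is undefined.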

The value given by (2) is the skewness usually reported by software packages, including the Descriptive Statistics menu under Data Analysis in MS Excel. The factor in front of the sum in (2) may seem unnatural at first sight, but the reason for it is similar to the reason for the divisor (n-1) in the formula for the sample standard deviation. Indeed, the factor is chosen so that the sample skewness $g$ is a nearly unbiased estimator of the population skewness $\gamma$; the notion of unbiasedness will be explained in much more detail later on. However, at that time we will not have the opportunity to discuss skewness (and its estimation); it suffices to mention here that formula (2) is designed to make $g$ a more accurate reflection (estimator) of the population skewness parameter $\gamma$ when the latter is unknown.

Prepared by: Shubhabrata Das, IIM Bangalore

A positive or negative value of $g$ indicates that the data is skewed to the right or left. But what can we say about the magnitude of skewness? For example, the sample skewness for the Bangalore flat price data turns out to be around 1.7. Does this mean that the price distribution of ALL Bangalore flats is really skewed? Note that it is entirely possible that the population is symmetric (so that $\gamma$ is 0), yet the sample data points are not perfectly symmetric, so that $g$ is positive or negative. Assuming the sample is selected randomly from the population, $g$ is a random variable while $\gamma$ is a fixed unknown quantity. How much of the fluctuation in $g$ can be attributed to the randomness of the sample? And consequently, how far from zero does $g$ need to be for us to reasonably conclude that the population is not symmetric ($\gamma$ is different from 0)? These are precisely the type of questions that statistical inference aims to answer objectively (later on).

Since inference about skewness will remain beyond our purview, let us now provide an empirical guideline for the above determination. We clearly need a benchmark; that benchmark comes in the form of the standard error (S.E.) of our estimator $g$. The notion of standard error will be explained in much more depth later. For the time being, it suffices to note that an approximate value for the standard error of $g$ is given by $\sqrt{6/n}$. [Caution: the standard error reported under the Descriptive Statistics menu in MS Excel is the standard error of the sample mean; it has nothing to do with the standard error of the (sample) skewness discussed here.] The empirical guideline goes as follows:

If $g > 2\sqrt{6/n}$, we conclude that the population is right-skewed, as $g$ is significantly positive;
if $g < -2\sqrt{6/n}$, we conclude that the population is left-skewed, as $g$ is significantly negative;
and if $|g| < 2\sqrt{6/n}$, we say that the population is symmetric.

You will have a proper understanding of this recommendation, including the justification of the coefficient 2, when we discuss testing of hypotheses. For example, we will see why, in the last scenario, it would be more appropriate to phrase our conclusion as "there is NOT significant evidence that the distribution is asymmetric".
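The empirical guideline above (compare the sample skewness with twice its approximate standard error, the square root of 6/n) can be sketched as a small Python helper. The function name `classify_skewness` and the sample sizes used below are assumptions made for illustration; the notes do not state the size of the Bangalore flat price sample.

```python
import math

def classify_skewness(g, n):
    """Apply the 2-standard-error guideline to a sample skewness g from n points."""
    se = math.sqrt(6 / n)  # approximate standard error of the sample skewness
    if g > 2 * se:
        return "right-skewed"
    if g < -2 * se:
        return "left-skewed"
    # Phrased cautiously: failing to detect asymmetry is not proof of symmetry.
    return "no significant evidence of asymmetry"

# A skewness of 1.7 with a hypothetical sample size of 100:
print(classify_skewness(1.7, 100))  # right-skewed
# The same value from only 5 observations is within sampling fluctuation:
print(classify_skewness(1.7, 5))    # no significant evidence of asymmetry
```

The second call shows why the magnitude of the skewness cannot be judged without the sample size: with n = 5 the threshold is about 2.19, so 1.7 does not exceed it.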
We conclude our discussion on skewness by noting that there are many alternative measures of skewness. Pearson's measure of skewness is one of them, and is reasonably popular because of its simplicity. It is given by

$$\frac{\mu - M}{\sigma},$$

where $\mu$, $M$ and $\sigma$ are respectively the mean, mode and standard deviation of the distribution. A somewhat equivalent form, $3(\mu - m)/\sigma$, is also popular, where $m$ denotes the median. [The approximate empirical relationship $\mu - M \approx 3(\mu - m)$ holds for many data sets and distributions.] These skewness measures are motivated by the fact that for distributions with a heavier/longer right tail the mean is typically bigger than the median or mode, while the opposite holds for left-skewed distributions.
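Both forms of Pearson's measure are easy to compute for a data set. The sketch below is illustrative only: the function names and the toy data are assumptions, the standard deviation is taken with divisor N (`statistics.pstdev`), and `statistics.mode` returns the first mode encountered when the data has several.

```python
import statistics

def pearson_mode_skewness(data):
    """(mean - mode) / standard deviation."""
    return (statistics.fmean(data) - statistics.mode(data)) / statistics.pstdev(data)

def pearson_median_skewness(data):
    """3 * (mean - median) / standard deviation."""
    return 3 * (statistics.fmean(data) - statistics.median(data)) / statistics.pstdev(data)

right_tailed = [1, 2, 2, 3, 4, 5, 9]  # hypothetical data: mean > median > mode
print(pearson_mode_skewness(right_tailed))    # positive
print(pearson_median_skewness(right_tailed))  # positive
```

For these data the mean exceeds both the median and the mode, so both measures come out positive, in line with the motivation given above.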

KURTOSIS
Kurtosis depicts a much more intricate (and somewhat controversial) characteristic of the distribution than a measure of central tendency, dispersion or asymmetry. It is meant to capture the peakedness or flatness of a distribution; the benchmark for comparison is the celebrated Normal distribution, which we will study in great detail later on. Using the same symbols as in the definition of skewness (population values $X_1, \ldots, X_N$ with mean $\mu$ and standard deviation $\sigma$), the most popular measure of kurtosis is defined as:

$$\kappa = \frac{\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^4}{\sigma^4} - 3 \qquad (3)$$
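Definition (3) can be evaluated directly for a small, fully known population. The following Python sketch is an illustration under assumptions: the function name `population_kurtosis` and the toy population are not from these notes.

```python
def population_kurtosis(values):
    """Definition (3): mean fourth-power deviation over sigma^4, minus 3."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n      # sigma squared, divisor N
    fourth_moment = sum((x - mu) ** 4 for x in values) / n
    return fourth_moment / variance ** 2 - 3                # sigma^4 = variance squared

evenly_spread = [1, 2, 3, 4, 5]  # hypothetical population with no pronounced peak
print(population_kurtosis(evenly_spread))  # negative, about -1.3
```

An evenly spread population like this one has no peak at the centre, and its kurtosis comes out negative.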

The number 3 is subtracted in definition (3) only to ensure that the kurtosis of any Normal distribution is 0. [Some practitioners refer to (3) as relative kurtosis, and use absolute kurtosis for the same quantity without subtracting 3.] Distributions with positive kurtosis are more peaked in the middle than the Normal; they are usually referred to as Leptokurtic distributions. Negative kurtosis indicates a relatively flat distribution; such distributions are called Platykurtic distributions. Distributions which are as flat (peaked) at the centre as a Normal distribution have kurtosis 0 and are called Mesokurtic distributions. The sample kurtosis value, as reported by default in software packages like MS Excel, is defined as:
$$k = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \qquad (4)$$

The apparently messy adjustment in (4), in comparison to (3), involving the sample size n, is meant to make the sample kurtosis an (approximately) unbiased estimator of the population kurtosis.
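Formula (4) is easier to check term by term in code. The Python sketch below is illustrative: the function name `sample_kurtosis` and the toy data are assumptions, and the sample standard deviation uses divisor n-1.

```python
import statistics

def sample_kurtosis(data):
    """Formula (4): scaled sum of fourth-power standardised deviations, minus a correction."""
    n = len(data)
    if n < 4:
        raise ValueError("formula (4) needs at least 4 data points")
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(((x - xbar) / s) ** 4 for x in data) - correction

toy_sample = [1, 2, 3, 4, 5]  # hypothetical evenly spread sample
print(sample_kurtosis(toy_sample))  # about -1.2
```

For this sample the leading factor is 1.25, the sum of fourth powers is 5.44 and the correction term is 8, giving 1.25 × 5.44 - 8 = -1.2.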
Again, a relatively small positive or negative value of the sample kurtosis $k$ should not be taken as proof that the population kurtosis $\kappa$ is non-zero, let alone that the population distribution is non-Normal. We must make allowance for sampling fluctuation. To do this objectively, one needs to consider the standard error of $k$, an approximate expression for which is $\sqrt{24/n}$. Consequently, the empirical guideline for determining the distribution type is:

If $k > 2\sqrt{24/n}$, we say $k$ is significantly positive, implying that the population is Leptokurtic;
if $k < -2\sqrt{24/n}$, we say $k$ is significantly negative, implying that the population is Platykurtic;
and if $|k| < 2\sqrt{24/n}$, we say that the population is Mesokurtic.
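The kurtosis guideline above has the same shape as the one for skewness, with the square root of 24/n as the approximate standard error. A minimal Python sketch follows; the function name `classify_kurtosis` and the example values are hypothetical.

```python
import math

def classify_kurtosis(k, n):
    """Apply the 2-standard-error guideline to a sample kurtosis k from n points."""
    se = math.sqrt(24 / n)  # approximate standard error of the sample kurtosis
    if k > 2 * se:
        return "leptokurtic"
    if k < -2 * se:
        return "platykurtic"
    return "mesokurtic"

# With a hypothetical sample size of 100 the threshold is about 0.98:
print(classify_kurtosis(1.5, 100))   # leptokurtic
print(classify_kurtosis(-1.2, 100))  # platykurtic
print(classify_kurtosis(0.5, 100))   # mesokurtic
```

Note that the threshold for kurtosis is twice the threshold for skewness at the same sample size, since the square root of 24/n is twice the square root of 6/n.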

For some statistical analyses, it is important to ensure that the sample data comes from a Normal distribution. Later on, we will learn various formal ways of determining (testing) whether a population is Normally distributed on the basis of sample data. However, the above two guidelines (for skewness and kurtosis, in combination) provide a quick check. For example, if, on the basis of the sample skewness and the sample kurtosis, you cannot reasonably rule out the possibility that the distribution is symmetric and Mesokurtic, then you should be reasonably comfortable in assuming that the data comes from a Normal population.

An alternative measure of kurtosis is given by

$$\frac{0.5\,(Q_{0.75}-Q_{0.25})}{Q_{0.9}-Q_{0.1}},$$

where $Q_p$ denotes the 100p-th percentile of the distribution (data). This measure clearly lies between 0 and 0.5. For a Normal distribution, it turns out to be 0.263. A leptokurtic distribution has relatively heavier tails, which widen the denominator more than the numerator; consequently, values below 0.263 correspond to a leptokurtic distribution, while values above 0.263 indicate a platykurtic distribution (the uniform distribution, for instance, gives 0.3125).
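The Normal benchmark of 0.263 for the percentile-based measure can be checked numerically, and the same ratio can be computed for a data set. The sketch below is an illustration under assumptions: the function names are hypothetical, and the data percentiles use the "inclusive" interpolation rule of `statistics.quantiles`, which is only one of several conventions for sample percentiles.

```python
from statistics import NormalDist, quantiles

def percentile_kurtosis_from_quantiles(q10, q25, q75, q90):
    """Half the interquartile range divided by the 10th-to-90th percentile range."""
    return 0.5 * (q75 - q25) / (q90 - q10)

def percentile_kurtosis(data):
    """The same ratio, with percentiles estimated from data."""
    cuts = quantiles(data, n=100, method="inclusive")  # 99 cut points; cuts[i-1] is the i-th percentile
    return percentile_kurtosis_from_quantiles(cuts[9], cuts[24], cuts[74], cuts[89])

z = NormalDist()  # standard Normal distribution
normal_value = percentile_kurtosis_from_quantiles(
    z.inv_cdf(0.10), z.inv_cdf(0.25), z.inv_cdf(0.75), z.inv_cdf(0.90)
)
print(round(normal_value, 3))  # 0.263
```

Because the measure is a ratio of two percentile ranges, it does not change when the data is shifted or rescaled, so the value computed for the standard Normal holds for every Normal distribution.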
