
Relative Measures of Variation, Skewness and Kurtosis
The coefficient of relative variation (CRV) is the standard deviation of a variable normed by dividing by its mean. This measure is used to give a sense of how large the standard deviation is.

How to Use Relative Variation to Find the Uncertainty Associated with a Data Set

Relative variation refers to the spread of a sample or a population as a proportion of the mean. Relative variation is useful because it can be expressed as a percentage, and is independent of the units in which the sample or population data are measured.

For example, you can use a measure of relative variation to compare the uncertainty or variation associated with the temperature in two different countries, even if one country uses Fahrenheit temperatures and the other uses Celsius temperatures. Strictly speaking, this independence holds for units that differ only by a scale factor, such as dollars and cents; Fahrenheit and Celsius also differ by an offset, so the temperatures should first be put on a common absolute scale. As another example, a measure of relative variation can be useful for comparing the returns earned by two portfolio managers.

It wouldn't make any sense to compare the mean returns achieved by two different managers without explicitly considering the levels of risk that they have incurred. A measure of relative variation provides a number that considers both the risk and the return of a portfolio, so that it can be determined which portfolio is riskier relative to the return.

You can use several different types of measures of relative variation. One of the most popular is known as the coefficient of variation (CV), which indicates how "spread out" the members of a sample or population are relative to the mean. The coefficient of variation is measured as a percentage, so it's independent of the units in which the mean and standard deviation are measured. This enables the relative variation of different samples or populations to be compared directly to each other.
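The unit-independence claim can be checked numerically. The sketch below is only an illustration: the price list is hypothetical, and the helper name coefficient_of_variation is an assumption introduced here. It computes the CV of the same prices expressed in dollars and in cents.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical hourly prices, first in dollars and then in cents.
prices_dollars = [120.0, 150.0, 200.0, 230.0, 300.0]
prices_cents = [p * 100 for p in prices_dollars]

cv_dollars = coefficient_of_variation(prices_dollars)
cv_cents = coefficient_of_variation(prices_cents)
print(f"CV in dollars: {cv_dollars:.2f}%")
print(f"CV in cents:   {cv_cents:.2f}%")
```

Both lines print the same percentage, because rescaling the data multiplies the mean and the standard deviation by the same factor.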
For example, the coefficient of variation can express the risk of an investment portfolio per unit of return. This means you can compare the performance of different portfolios to see which one offers the least amount of risk per unit of return.

Here's the formula for finding the coefficient of variation for either samples or populations:

\[ CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)}, \qquad CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)} \]

Suppose a corporation requires the services of a consulting firm to improve its accounting systems. The corporation has determined that the two best choices are Superior Accounting, Inc., and Data Services Corp. The corporation has done some research about the pricing practices of these two firms. The average price charged per hour, along with the standard deviation, is shown in the table:

Comparative Prices Charged by Superior Accounting and Data Services

Price                         Superior Accounting   Data Services
Mean price ($/hour)           $200                  $175
Standard deviation ($/hour)   $80                   $75

Based on this data, the coefficients of variation for the prices charged by each firm are

\[ CV_{\text{Superior}} = \frac{80}{200} \times 100\% = 40.00\%, \qquad CV_{\text{Data}} = \frac{75}{175} \times 100\% = 42.86\% \]

These results show that although the prices charged by Superior Accounting have a larger standard deviation than Data Services, the relative variation of Data Services is greater (42.86 percent compared with 40.00 percent). This indicates that the relative uncertainty associated with Data Services' prices is greater than for Superior Accounting's prices.
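The comparison of the two firms can be reproduced in a few lines. This is a minimal sketch using the summary statistics quoted in the example; the function name cv_percent and the dictionary layout are assumptions made for illustration.

```python
def cv_percent(mean, std_dev):
    """Coefficient of variation as a percentage: (std_dev / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

# Summary statistics from the price example ($/hour): (mean, standard deviation).
firms = {
    "Superior Accounting": (200, 80),
    "Data Services": (175, 75),
}

cv_by_firm = {name: cv_percent(m, s) for name, (m, s) in firms.items()}
for name, cv in cv_by_firm.items():
    print(f"{name}: {cv:.2f}%")
```

The guard against a zero mean is a design choice added here: the ratio is meaningless when the mean is zero, and it is unstable when the mean is close to zero.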
As another example, suppose a portfolio manager is responsible for an insurance company's equity portfolio and bond portfolio. He wants to know which portfolio is riskier in absolute and relative terms. He takes a sample of returns from the past ten years and computes the mean and standard deviation. This table shows the results:

Comparative Performance of Bond and Equity Portfolios

Returns                         Bond Portfolio   Equity Portfolio
Mean return                     8%               20%
Standard deviation of returns   16%              30%

These results show that the equity portfolio offers a higher average (mean) return than the bond portfolio and that the equity portfolio is riskier in absolute terms than the bond portfolio.

Because the two portfolios offer different returns and different levels of risk, it's impossible to compare them directly without using a measure of relative risk, which shows how risky a portfolio is relative to its return. So you need to find the coefficient of variation for the two portfolios, using the CV formula:

\[ CV_{\text{bond}} = \frac{16}{8} \times 100\% = 200\%, \qquad CV_{\text{equity}} = \frac{30}{20} \times 100\% = 150\% \]

The bond portfolio offers a level of risk that's 200 percent of the average return, while the equity portfolio offers a level of risk that's 150 percent of the average return. So while the equity portfolio is riskier in absolute terms (due to the higher standard deviation), the bond portfolio is riskier in relative terms (due to the higher coefficient of variation).
A chemical engineer is studying a newly developed compound to be used in removing toxic waste from water.

Impurities removed   Number of testing stations (f)   Midpoint   d    fd    fd^2
50-54                4                                52         -3   -12   36
55-59                15                               57         -2   -30   60
60-64                21                               62         -1   -21   21
65-69                19                               67         0    0     0
70-74                15                               72         1    15    15
75-79                11                               77         2    22    44
80-84                8                                82         3    24    72
85-89                5                                87         4    20    80
90-94                2                                92         5    10    50
Total                100                                              28    378

With assumed mean M = 67 (the midpoint of the 65-69 class), class width C = 5 and N = 100:

\[ \text{mean} = M + \frac{\sum fd}{N}\,C = 67 + \frac{28}{100}(5) = 67 + 0.28(5) = 67 + 1.4 = 68.4 \]

\[ SD = \frac{C}{N}\sqrt{N \sum fd^2 - \left(\sum fd\right)^2} = \frac{5}{100}\sqrt{(100)(378) - (28)^2} = 0.05\sqrt{37{,}800 - 784} = 0.05\sqrt{37{,}016} = 0.05(192.40) = 9.62 \]

The coefficient of variation is

\[ V = \frac{SD}{\text{mean}} \times 100\% = \frac{9.62}{68.4} \times 100\% = 14.06\% \]
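The coded-deviation method used for this grouped data can be sketched in code. The midpoints and frequencies are those of the example; the variable names are assumptions chosen for illustration, and the code is only one way to lay out the arithmetic.

```python
import math

# Class midpoints and frequencies from the impurities example.
midpoints = [52, 57, 62, 67, 72, 77, 82, 87, 92]
freqs = [4, 15, 21, 19, 15, 11, 8, 5, 2]

assumed_mean = 67   # M: midpoint of the class chosen as origin
class_width = 5     # C

n = sum(freqs)
# Coded deviations d = (midpoint - M) / C; exact integers for these classes.
d = [(x - assumed_mean) // class_width for x in midpoints]
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))

mean = assumed_mean + sum_fd / n * class_width
sd = class_width / n * math.sqrt(n * sum_fd2 - sum_fd ** 2)
cv = sd / mean * 100

print(f"N = {n}, sum fd = {sum_fd}, sum fd^2 = {sum_fd2}")
print(f"mean = {mean:.2f}, SD = {sd:.2f}, CV = {cv:.2f}%")
```

Working with the small integers d instead of the raw midpoints keeps the hand arithmetic short; the result is the same as applying the grouped-data formulas to the midpoints directly.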
The mean temperature of a certain substance is 80 degrees Celsius with a standard deviation of 5 degrees Celsius. The mean calorie per gram is 0.638 and the standard deviation is 0.0125. Compare the variation of the two.

\[ CV_{\text{temperature}} = \frac{5}{80} \times 100\% = 6.25\% \]

\[ CV_{\text{calorie per gram}} = \frac{0.0125}{0.638} \times 100\% = 1.96\% \]

Since the coefficient of variation of the temperature is larger, the temperature is more variable than the calorie per gram.

The average sales per month of MAI computer company is 59 million pesos with an SD of 8 million pesos; the average sales volume of the company is 580 units per month, with an SD of 65 units. Compare the variation of sales and sales volume of the company.

\[ CV_{\text{sales}} = \frac{8}{59} \times 100\% = 13.56\% \]

\[ CV_{\text{volume}} = \frac{65}{580} \times 100\% = 11.21\% \]

Since the coefficient of variation of sales per month is larger, the sales per month are more variable than the units sold per month.
Skewness
What Is Skewness?
Skewness refers to a distortion or asymmetry that deviates from
the symmetrical bell curve, or normal distribution, in a set of
data. If the curve is shifted to the left or to the right, it is said to
be skewed. Skewness can be quantified as a representation of
the extent to which a given distribution varies from a normal
distribution. A normal distribution has a skew of zero, while a 
lognormal distribution, for example, would exhibit some degree
of right-skew.
As you might already know, India has more than 50% of its population below the age of 25 and more than 65% below the age of 35. If you plot the distribution of the age of the population of India, you will find that there is a hump on the left side of the distribution and the right side is comparatively flat. In other words, we can say that there's a skew towards the end, right?

Skewness is the measure of the asymmetry of an ideally symmetric probability distribution and is given by the third standardized moment. If that sounds way too complex, in simple words, skewness is the measure of how much the probability distribution of a random variable deviates from the normal distribution.

Well, the normal distribution is the probability distribution without any skewness. You can look at the image below which shows a symmetrical distribution that's basically a normal distribution, and you can see that it is symmetrical on both sides of the dashed line. Apart from this, there are two types of skewness: positive and negative.

Why is Skewness Important?

Now, we know that skewness is the measure of asymmetry and its types are distinguished by the side on which the tail of the probability distribution lies. But why is knowing the skewness of the data important?

First, linear models work on the assumption that the distribution of the independent variable and the target variable are similar. Therefore, knowing about the skewness of data helps us in creating better linear models.

Secondly, let's take a look at the below distribution. It is the distribution of horsepower of cars:
You can clearly see that the above distribution is positively
skewed. Now, let’s say you want to use this as a feature
for the model which will predict the mpg (miles per gallon)
of a car.
Since our data is positively skewed here, it means that it
has a higher number of data points having low values, i.e.,
cars with less horsepower. So when we train our model on
this data, it will perform better at predicting the mpg of
cars with lower horsepower as compared to those with
higher horsepower.
Also, skewness tells us about the direction of outliers. You
can see that our distribution is positively skewed and
most of the outliers are present on the right side of the
distribution.
Note: The skewness does not tell us about the number of
outliers. It only tells us the direction.
Now we know why skewness is important, let’s
understand the distributions which I showed you earlier.
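The "third standardized moment" mentioned earlier can be computed directly. The sketch below is an illustration under stated assumptions: the horsepower values are hypothetical, the function name skewness is introduced here, and it uses the population standard deviation and divides by n, which is one common convention among several.

```python
import statistics

def skewness(values):
    """Third standardized moment: mean of ((x - mean) / sd) ** 3, with the population sd."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)

# Hypothetical horsepower readings: many low values, a few very high ones.
horsepower = [70, 75, 80, 85, 90, 95, 100, 110, 150, 220]
# A perfectly symmetric sample for comparison.
symmetric = [1, 2, 3, 4, 5]

hp_skew = skewness(horsepower)
sym_skew = skewness(symmetric)
print(f"horsepower skewness: {hp_skew:.2f}")
print(f"symmetric skewness:  {sym_skew:.2f}")
```

The right-tailed sample gives a positive value and the symmetric sample gives a value of essentially zero, which matches the sign convention described in the text.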
What is Symmetric/Normal Distribution?

Yes, we're back again with the normal distribution. It is used as a reference for determining the skewness of a distribution. As I mentioned earlier, the ideal normal distribution is the probability distribution with almost no skewness. It is nearly perfectly symmetrical. Due to this, the value of skewness for a normal distribution is zero.

But why is it nearly perfectly symmetrical and not absolutely symmetrical? That's because, in reality, no real-world data has a perfectly normal distribution. Therefore, even the value of skewness is not exactly zero; it is nearly zero. The value of zero is nevertheless used as a reference for determining the skewness of a distribution.

You can see in the above image that the same line represents the mean, median, and mode. It is because the mean, median, and mode of a perfectly normal distribution are equal.

So far, we've understood the skewness of the normal distribution using a probability or frequency distribution. Now, let's understand it in terms of a boxplot because that's the most common way of looking at a distribution in the data science space.

The above image is a boxplot of a symmetric distribution. You'll notice here that the distance between Q1 and Q2 and between Q2 and Q3 is equal, i.e.:

\[ Q_3 - Q_2 = Q_2 - Q_1 \]

But that's not enough for concluding if a distribution is skewed or not. We also take a look at the length of the whiskers; if they are equal, then we can say that the distribution is symmetric, i.e. it is not skewed.

Now that we've discussed the skewness in the normal distribution, it's time to learn about the two types of skewness which we discussed earlier. Let's start with positive skewness.
Understanding Positively Skewed Distribution

A positively skewed distribution is the distribution with the tail on its right side. The value of skewness for a positively skewed distribution is greater than zero. As you might have already understood by looking at the figure, the value of the mean is the greatest one, followed by the median and then by the mode.

So why is this happening? Well, the answer to that is that the skewness of the distribution is on the right; it causes the mean to be greater than the median and eventually move to the right. Also, the mode occurs at the highest frequency of the distribution, which is on the left side of the median. Therefore, mode < median < mean.

In the above boxplot, you can see that Q2 is present nearer to Q1. This represents a positively skewed distribution. In terms of quartiles, it can be given by:

\[ Q_3 - Q_2 > Q_2 - Q_1 \]

In this case, it was very easy to tell if the data is skewed or not. But what if we have something like this:

Here, Q2-Q1 and Q3-Q2 are equal and yet the distribution is positively skewed. The keen-eyed among you will have noticed the length of the right whisker is greater than the left whisker. From this, we can conclude that the data is positively skewed.

So, the first step is always to check the equality of Q2-Q1 and Q3-Q2. If that is found equal, then we look for the length of whiskers.
Understanding Negatively Skewed Distribution

As you might have already guessed, a negatively skewed distribution is the distribution with the tail on its left side. The value of skewness for a negatively skewed distribution is less than zero. You can also see in the above figure that the mean < median < mode.

In the boxplot, the relationship between quartiles for a negative skewness is given by:

\[ Q_3 - Q_2 < Q_2 - Q_1 \]

Similar to what we did earlier, if Q3-Q2 and Q2-Q1 are equal, then we look for the length of whiskers. And if the length of the left whisker is greater than that of the right whisker, then we can say that the data is negatively skewed.
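The two-step boxplot check described above (compare the quartile gaps first, then the whisker lengths) can be sketched as follows. Several details are assumptions not stated in the text: the function name boxplot_skew, the "inclusive" quartile method of Python's statistics module, and the common convention that whiskers end at the most extreme points within 1.5 times the interquartile range of the box.

```python
import math
import statistics

def boxplot_skew(values):
    """Classify skew from quartile gaps first, then from whisker lengths."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed whisker rule: most extreme points within 1.5 * IQR of the box.
    low = min(x for x in data if x >= q1 - 1.5 * iqr)
    high = max(x for x in data if x <= q3 + 1.5 * iqr)

    upper_gap, lower_gap = q3 - q2, q2 - q1
    if not math.isclose(upper_gap, lower_gap):
        return "positive" if upper_gap > lower_gap else "negative"

    right_whisker, left_whisker = high - q3, q1 - low
    if not math.isclose(right_whisker, left_whisker):
        return "positive" if right_whisker > left_whisker else "negative"
    return "symmetric"

# Hypothetical samples illustrating each case.
print(boxplot_skew([1, 2, 3, 4, 5, 6, 7, 8, 9]))         # equal gaps, equal whiskers
print(boxplot_skew([1, 2, 3, 4, 5, 6, 7, 8, 12]))        # equal gaps, longer right whisker
print(boxplot_skew([1, 2, 3, 4, 5, 7, 10, 14, 20]))      # Q2 nearer to Q1
print(boxplot_skew([1, 7, 11, 14, 16, 17, 18, 19, 20]))  # Q2 nearer to Q3
```

This is a visual heuristic turned into code, so on real data with ties or very small samples the labels should be read as a rough indication, not as a test.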
                        
Understanding Skewness

Besides positive and negative skew, distributions can also be said to have zero or undefined skew. In the curve of a distribution, the data on the right side of the curve may taper differently from the data on the left side. These taperings are known as "tails." Negative skew refers to a longer or fatter tail on the left side of the distribution, while positive skew refers to a longer or fatter tail on the right.

The mean of positively skewed data will be greater than the median. In a distribution that is negatively skewed, the exact opposite is the case: the mean of negatively skewed data will be less than the median. If the data graphs symmetrically, the distribution has zero skewness, regardless of how long or fat the tails are.

Measuring Skewness

There are several ways to measure skewness. Pearson's first and second coefficients of skewness are two common ones. Pearson's first coefficient of skewness, or Pearson mode skewness, subtracts the mode from the mean and divides the difference by the standard deviation. Pearson's second coefficient of skewness, or Pearson median skewness, subtracts the median from the mean, multiplies the difference by three, and divides the product by the standard deviation.
What Does Skewness Tell You?

Investors note skewness when judging a return distribution because it, like kurtosis, considers the extremes of the data set rather than focusing solely on the average. Short- and medium-term investors in particular need to look at extremes because they are less likely to hold a position long enough to be confident that the average will work itself out.

Investors commonly use standard deviation to predict future returns, but the standard deviation assumes a normal distribution. As few return distributions come close to normal, skewness is a better measure on which to base performance predictions. This is due to skewness risk.

Skewness risk is the increased risk of turning up a data point of high skewness in a skewed distribution. Many financial models that attempt to predict the future performance of an asset assume a normal distribution, in which measures of central tendency are equal. If the data are skewed, this kind of model will always underestimate skewness risk in its predictions. The more skewed the data, the less accurate this financial model will be.

Pearson's first and second coefficients of skewness are:

\[ Sk_1 = \frac{\bar{X} - Mo}{s}, \qquad Sk_2 = \frac{3(\bar{X} - Md)}{s} \]

where \bar{X} is the mean, Mo is the mode, Md is the median, and s is the standard deviation.

Pearson's first coefficient of skewness is useful if the data exhibit a strong mode. If the data have a weak mode or multiple modes, Pearson's second coefficient may be preferable, as it does not rely on mode as a measure of central tendency.
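Pearson's two coefficients of skewness, as described in words in this section, can be sketched with Python's statistics module. The sample data are hypothetical, the function names are introduced here, and the use of the sample standard deviation is an assumption, since the text does not say which form to use.

```python
import statistics

def pearson_first(values):
    """Pearson mode skewness: (mean - mode) / standard deviation."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)

def pearson_second(values):
    """Pearson median skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

# Hypothetical sample with a clear mode at 2 and a long right tail.
sample = [1, 2, 2, 2, 3, 4, 5, 9]

sk1 = pearson_first(sample)
sk2 = pearson_second(sample)
print(f"Pearson first coefficient:  {sk1:.2f}")
print(f"Pearson second coefficient: {sk2:.2f}")
```

Both coefficients come out positive for this right-tailed sample. With data that have no single clear mode, only the second coefficient is meaningful, in line with the guidance on weak or multiple modes.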
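The skewness of a distribution is calculated as:

\[ \text{Skewness} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^3}{(N - 1)\,\sigma^3} \]

where
•X_i = ith random variable
•X̄ = mean of the distribution
•N = number of variables in the distribution
•σ = standard deviation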
Calculation of Skewness (Step by Step)

•Step 1: Firstly, form a data distribution of random variables, and these variables are denoted by X_i.

•Step 2: Next, figure out the number of variables available in the data distribution, and it is denoted by N.

•Step 3: Next, calculate the mean of the data distribution by dividing the sum of all the random variables of the data distribution by the number of variables in the distribution. The mean of the distribution is denoted by X̄.

•Step 4: Next, determine the standard deviation of the distribution by using the deviations of each variable from the mean, i.e., X_i - X̄, and the number of variables in the distribution. The standard deviation is calculated as shown below.

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}} \]

•Step 5: Finally, the calculation of skewness is done on the basis of the deviations of each variable from the mean, the number of variables, and the standard deviation of the distribution, as shown below.

\[ \text{Skewness} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^3}{(N - 1)\,\sigma^3} \]
Let us take the example of a summer camp in which 20 students were assigned certain jobs that they performed to earn money to raise funds for a school picnic. However, different students earned different amounts of money. Based on the information given below, determine the skewness in the income distribution among the students during the summer camp.

The following is the data for the calculation of skewness.

Income ($)   Number of students
0-50         2
50-100       3
100-150      5
150-200      6
200-250      4

Number of variables, n = 2 + 3 + 5 + 6 + 4 = 20

Let us calculate the midpoint of each of the intervals

•($0 + $50) / 2 = $25
•($50 + $100) / 2 = $75
•($100 + $150) / 2 = $125
•($150 + $200) / 2 = $175
•($200 + $250) / 2 = $225

Now, the mean of the distribution can be calculated as,

Mean = ($25 * 2 + $75 * 3 + $125 * 5 + $175 * 6 + $225 * 4) / 20
Mean = $142.50
The squares of the deviations of each variable can be calculated as below,

•($25 - $142.5)^2 = 13806.25
•($75 - $142.5)^2 = 4556.25
•($125 - $142.5)^2 = 306.25
•($175 - $142.5)^2 = 1056.25
•($225 - $142.5)^2 = 6806.25

Now, the standard deviation can be calculated by using the below formula as,

σ = [(13806.25 * 2 + 4556.25 * 3 + 306.25 * 5 + 1056.25 * 6 + 6806.25 * 4) / 20]^(1/2)
σ = 61.80

The cubes of the deviations of each variable can be calculated as below,

•($25 - $142.5)^3 = -1622234.4
•($75 - $142.5)^3 = -307546.9
•($125 - $142.5)^3 = -5359.4
•($175 - $142.5)^3 = 34328.1
•($225 - $142.5)^3 = 561515.6

Therefore, the calculation of the skewness of the distribution will be as follows,

Skewness = (-1622234.4 * 2 + -307546.9 * 3 + -5359.4 * 5 + 34328.1 * 6 + 561515.6 * 4) / [(20 - 1) * (61.80)^3]
Skewness = -0.39

Therefore, the skewness of the distribution is -0.39, which indicates that the data distribution is approximately symmetrical.
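The summer-camp calculation can be reproduced step by step in code. The sketch below follows the conventions of the worked example (divide by n for the standard deviation and by (n - 1) times the cube of the standard deviation for the skewness); the variable names are assumptions chosen for illustration.

```python
import math

# Income intervals ($) and the number of students in each, as used in the worked example.
intervals = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250)]
freqs = [2, 3, 5, 6, 4]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
# The example divides by n for the standard deviation ...
sigma = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n)
# ... and by (n - 1) * sigma**3 for the skewness.
skew = sum(f * (x - mean) ** 3 for f, x in zip(freqs, midpoints)) / ((n - 1) * sigma ** 3)

print(f"mean = {mean:.2f}, sigma = {sigma:.2f}, skewness = {skew:.2f}")
```

Other references divide by n throughout or apply a bias correction, so a statistics library may report a slightly different value for the same data.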
Assignment 1
Activity 1
a. b.
Kurtosis: Kurtosis measures whether your dataset is heavy-tailed or
light-tailed compared to a normal distribution. Data sets with high
kurtosis have heavy tails and more outliers and data sets with low
kurtosis tend to have light tails and fewer outliers. Note that a
histogram is an effective way to show both the skewness and
kurtosis of a data set because you can easily spot if something is
wrong with your data. A probability plot is also a great tool because
a normal distribution would just follow the straight line.
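The slide gives no formula for kurtosis, so the following sketch assumes a common definition: the fourth standardized moment minus 3 (excess kurtosis), which is about zero for a normal distribution. The function name and both samples are hypothetical.

```python
import statistics

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores about 0."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    m4 = sum(((x - mu) / sigma) ** 4 for x in values) / len(values)
    return m4 - 3

# Hypothetical samples: evenly spread values versus values with two extreme outliers.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
heavy = [5, 5, 5, 5, 5, 5, 5, 5, -20, 30]

flat_k = excess_kurtosis(flat)
heavy_k = excess_kurtosis(heavy)
print(f"evenly spread sample: {flat_k:.2f}")
print(f"sample with outliers: {heavy_k:.2f}")
```

The evenly spread sample gives a negative value (light tails) and the sample with outliers gives a positive value (heavy tails), matching the description above.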
https://www.dummies.com/education/math/business-statistics/how-to-use-relative-variation-to-find-the-uncertainty-associated-with-a-data-set/

https://www.investopedia.com/terms/s/skewness.asp

https://www.analyticsvidhya.com/blog/2020/07/what-is-skewness-statistics/
