MEASURES OF SHAPE
PRACTICE EXERCISES
⚫ Review Questions 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.9, 8.11, 8.12 and 8.13.
4.2 MEASURES OF SHAPE
⚫ Measures of central tendency tell you about the central value of your data, while measures of variation
tell you how far your data is spread from the central value. What about the shape of your data?
Measures of Central Tendency: measure the location of the central value of the data
Measures of Variation: measure the spread of the data from the central value
Measures of Shape: measure the direction and frequency of outliers away from the central value
⚫ A histogram provides insight into the shape of a data distribution, but the two measures that precisely
quantify the shape of a distribution are skewness and kurtosis.
I SKEWNESS
1. SKEWNESS
⚫ The relative concentration of data may indicate whether the distribution is asymmetrical. A distribution is
called asymmetric when one tail is longer than the other.
⚫ Skewness measures the asymmetry of a data distribution around its mean. Thus, it quantifies how much of
the data is skewed to one side of the mean.
⚫ Skewness looks for direction in a data distribution: a distribution may be skewed towards one
direction.
❑ A direction regarding where the extreme values lie: right side, left side or both sides.
❑ A direction regarding where most values are situated: low end, high end or middle of the distribution.
⚫ We can use a simple but less precise method to describe the skewness of a distribution by comparing the
mean and median. However, we can use MS Excel to compute a skewness coefficient that measures
skewness precisely and accurately describes the direction of the extreme values in a data distribution.
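The simple mean-versus-median comparison can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function name `direction_from_mean_median` is hypothetical, and the sample values are the right-skewed dataset used later in these slides.

```python
from statistics import mean, median

def direction_from_mean_median(values):
    """Describe the skew direction from the sign of (mean - median).

    Hypothetical helper: this is the simple, less precise method,
    not the Excel skewness coefficient.
    """
    diff = mean(values) - median(values)
    if diff < 0:
        return "left-skewed (negative)"
    if diff > 0:
        return "right-skewed (positive)"
    return "symmetric (zero skew)"

# Mean is 17 and median is 7, so the difference is positive.
print(direction_from_mean_median([2, 4, 4, 7, 10, 12, 80]))
```

The sign of the difference only gives a direction; it says nothing about how strong the skew is, which is why a coefficient is preferred.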
Dr. Raphael Djabatey
⚫ Your data is skewed towards one direction by an extreme value. A test of skewness tells you which
direction your data is skewed towards. Data can be skewed towards the right or the left relative to
the mean.
[Figure: three curves labelled negative skewed, zero skewed, and positive skewed]
Negative skewness indicates a distribution with an asymmetric tail extending toward more negative values.
Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values.
• Data is skewed to the left when the extreme value lies to the left; as a
result, most values are situated at the high end of the distribution.
• Data is skewed to the right when the extreme value lies to the right; as a
result, most values are situated at the low end of the distribution.
• Data is not skewed (i.e. it is symmetric) when the extreme values lie on both
sides; as a result, most values are situated in the middle of the distribution.
                        Left-Skewed              Symmetric                Right-Skewed
Relationship Between    Mean Less Than Median    Mean Equal To Median     Mean More Than Median
Mean and Median
Excel Coefficient       Less Than -0.5           Between -0.5 and +0.5    More Than +0.5
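The coefficient thresholds in the table can be expressed as a small Python function. This is a sketch under stated assumptions: the name `classify_skewness` is hypothetical, and treating a coefficient of exactly -0.5 or +0.5 as approximately symmetric is an assumption, since the slides do not say which side the boundary belongs to.

```python
def classify_skewness(coefficient):
    """Map a skewness coefficient to the three categories in the table.

    Assumption: the boundary values -0.5 and +0.5 are counted as
    approximately symmetric.
    """
    if coefficient < -0.5:
        return "left-skewed"
    if coefficient > 0.5:
        return "right-skewed"
    return "approximately symmetric"

for value in (-1.2, 0.0, 0.3, 2.4):
    print(value, classify_skewness(value))
```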
1. Left-Skewed Distribution
Dataset: 4, 80, 82, 88, 88, 90, 100
⚫ This is also known as a negative-skew distribution. It arises when the extreme values lie to the left of the
distribution and the bulk of the data is situated at the high end of the distribution.
⚫ The distribution is characterized by high values, hence most values are above the mean. The distribution is
left-skewed when the mean is less than the median. The above dataset has a mean of 76 and a median of 88.
⚫ In mathematical terms, the mean minus the median yields a negative result (76 - 88 = -12). Since the result is
negative, the distribution is described as negatively skewed.
⚫ EXCEL provides a robust method: a skewness coefficient of less than -0.5 indicates a left-skewed distribution.
⚫ On a chart, we observe a long tail to the left of the distribution, stretching towards the negative direction.
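To see where a coefficient below -0.5 comes from, the sketch below computes a sample skewness for the dataset above. It uses the adjusted formula that Excel's SKEW function is documented to use, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations; the function name `excel_style_skew` is hypothetical, and the code is an illustration rather than a replacement for the spreadsheet function.

```python
from statistics import mean, stdev

def excel_style_skew(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3).

    s is the sample standard deviation (n - 1 denominator).
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 values are required")
    m = mean(values)
    s = stdev(values)
    cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

left_skewed = [4, 80, 82, 88, 88, 90, 100]
print(round(excel_style_skew(left_skewed), 2))
```

For this dataset the result is roughly -2.4, well below the -0.5 threshold, because the single low value of 4 contributes a very large negative cubed deviation.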
2. Right-Skewed Distribution
Dataset: 2, 4, 4, 7, 10, 12, 80
⚫ This is also known as a positive-skew distribution. It arises when the extreme values lie to the right of the
distribution and the bulk of the data is situated at the low end of the distribution.
⚫ The distribution is characterized by low values, hence most values are below the mean. The distribution is
right-skewed when the mean is more than the median. The above dataset has a mean of 17 and a median of 7.
⚫ In mathematical terms, the mean minus the median yields a positive result (17 - 7 = 10). Since the result is
positive, the distribution is described as positively skewed.
⚫ EXCEL provides a robust method: a skewness coefficient of more than 0.5 indicates a right-skewed distribution.
⚫ On a chart, we observe a long tail to the right of the distribution, stretching towards the positive direction.
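The reason the mean ends up above the median can be checked by removing the extreme value and comparing. The short Python sketch below does this for the dataset above; the variable names are illustrative only.

```python
from statistics import mean, median

data = [2, 4, 4, 7, 10, 12, 80]
without_extreme = [x for x in data if x != 80]  # drop the single extreme value

# How far each measure moves when the extreme value 80 is included.
mean_shift = mean(data) - mean(without_extreme)        # 17 - 6.5
median_shift = median(data) - median(without_extreme)  # 7 - 5.5

print(mean_shift, median_shift)  # prints 10.5 1.5
```

The extreme value pulls the mean up by 10.5 but the median by only 1.5, which is why the mean sits to the right of the median in a right-skewed distribution.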
3. Normal Distribution
Dataset: 1, 3, 4, 5, 6, 8
⚫ This is also known as a zero-skew or symmetric distribution. It arises when extreme values lie both to the
left and to the right, and the bulk of the data is situated around the center or middle of the distribution.
⚫ The data is characterized by average values, hence most values are close to the mean. The distribution is
symmetric when the mean is equal to the median. The above dataset has a mean of 4.5 and a median of 4.5.
⚫ In mathematical terms, the mean minus the median yields zero (4.5 - 4.5 = 0). Since the result is
zero, a symmetrical distribution is also described as zero skewed.
⚫ EXCEL provides a robust method: a skewness coefficient between -0.5 and +0.5 indicates
approximate symmetry, while a skewness coefficient equal to zero (0) indicates perfect symmetry.
⚫ On a chart, we observe tails to the left and right of the distribution, stretching towards the negative and
positive directions.
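A skewness coefficient of exactly zero means the cubed deviations on the two sides of the mean cancel out. The Python sketch below checks this for the dataset 1, 3, 4, 5, 6, 8; it is an illustration with made-up variable names, not a procedure from the slides.

```python
from statistics import mean

data = [1, 3, 4, 5, 6, 8]
m = mean(data)  # 4.5

# Deviations from the mean: every negative one has a positive mirror image.
deviations = [x - m for x in data]
mirrored = sorted(-d for d in deviations)

# The numerator of any skewness coefficient is the sum of cubed deviations.
cubed_sum = sum(d ** 3 for d in deviations)

print(deviations)
print(sorted(deviations) == mirrored, cubed_sum)
```

Because the deviations are mirror images of each other, their cubes sum to zero, so the coefficient is zero whichever scaling factor is applied.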
[Figure: symmetric distribution, where Mean = Median = Mode]
[Figure: MS Excel skewness coefficient scale. A coefficient of 0.0 marks perfect symmetry; coefficients close to 0.0 on either side mark approximate symmetry.]
[Figure: three histograms]
Shopping Time: left-skewed; class intervals from 0 - 15 to 90 - 105
911 Emergency Waiting Time: right-skewed; class intervals from 0 - 5 to 30 - 35
Students' Grades: normal distribution; class intervals from 20 - 30 to 80 - 90
II KURTOSIS
2. KURTOSIS
⚫ Kurtosis measures the tailedness and peakedness of a distribution relative to a normal distribution. It
describes the degree to which values are clustered in the tails or the peak of a distribution.
⚫ Kurtosis determines whether the tails of a data distribution match those of the normal distribution.
It compares the tails of a distribution to the normal distribution to determine how heavy or light the tails are.
⚫ Kurtosis quantifies the tailedness of a distribution by measuring the frequency of extreme observations in the
dataset. It looks at whether a distribution has more or fewer extreme values than a normal distribution.
⚫ The modern definition holds that kurtosis is influenced more by extreme values (tails) than by values in the
center (peak) of the distribution. Thus, the recent definition measures how much of the variation in the data is due
to extreme values. There are three forms of kurtosis:
❑ When the distribution has more variation due to extreme values than normal, it is called a leptokurtic distribution.
❑ When the distribution has less variation due to extreme values than normal, it is called a platykurtic distribution.
❑ When the distribution has the same variation due to extreme values as normal, it is called a mesokurtic distribution.
[Figure: three curves labelled Platykurtic Distribution, Mesokurtic Distribution, and Leptokurtic Distribution]
[Figure: a leptokurtic curve and a platykurtic curve compared]
⚫ The larger the kurtosis coefficient, the more extreme values there are, the fatter the tails are, the more
peaked the distribution is around the mean, and the more risk there is in the distribution compared to a
normal distribution.
⚫ The smaller the kurtosis coefficient, the fewer extreme values there are, the thinner the tails are, the less
peaked the distribution is around the mean, and the less risk there is in the distribution compared to a
normal distribution.
Smaller kurtosis coefficient             Larger kurtosis coefficient
light tail, few extreme values           heavy tail, more extreme values
less peaked at center                    highly peaked at center
small changes are more frequent          small changes are less frequent
less risk due to fewer extreme values    more risk due to many extreme values
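The link between extreme values and a larger kurtosis coefficient can be illustrated with a short Python sketch. It uses the plain moment-based excess kurtosis (fourth moment divided by the squared second moment, minus 3), which is a simpler definition than Excel's sample-adjusted coefficient; the function name `moment_excess_kurtosis` is hypothetical, and the data is the right-skewed dataset from the skewness slides, with and without its extreme value.

```python
from statistics import mean

def moment_excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (not Excel's formula)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

with_extreme = [2, 4, 4, 7, 10, 12, 80]
without_extreme = [2, 4, 4, 7, 10, 12]

print(moment_excess_kurtosis(with_extreme))
print(moment_excess_kurtosis(without_extreme))
```

With the extreme value 80 included the coefficient is positive; without it the coefficient is negative. Raising deviations to the fourth power makes one far-out observation dominate the result, which is the sense in which kurtosis reflects tails more than the center.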
1. Platykurtic Distribution
⚫ This is also known as negative kurtosis, and it describes a distribution with fewer extreme values and fewer
peak values (i.e. around the mean) than a normal distribution.
⚫ A distribution with negative kurtosis is less-tailed (or light-tailed) because it has fewer extreme values in the tails than a normal
distribution. It is also less-peaked (or low-peaked) because fewer values are clustered around the mean.
⚫ Because it is light-tailed and low-peaked, there is less data in the center, more data in the shoulders and less
data in the tails. Such a distribution indicates less variation or less risk than normal.
⚫ Negative kurtosis indicates less variation because small changes are
more common while large changes are less likely.
2. Leptokurtic Distribution
⚫ This is also known as positive kurtosis, and it describes a distribution with more extreme values and more peak
values (i.e. around the mean) than a normal distribution.
⚫ A distribution with positive kurtosis is more-tailed (or heavy-tailed) because it has more extreme values in the tails than a normal
distribution. It is also more-peaked (or high-peaked) because more values are clustered around the mean.
⚫ Because it is heavy-tailed and high-peaked, there is more data in the center, less data in the shoulders and more
data in the tails. Such a distribution indicates more variation and more risk than normal.
3. Mesokurtic Distribution
⚫ This is also known as zero kurtosis, and it describes a distribution which has the same characteristics as a
normal distribution.
⚫ A distribution with zero kurtosis has the same proportion of values in the tails (extreme values) as a normal distribution, as well as
the same degree of clustering around the mean (peak values) as a normal distribution.
⚫ In EXCEL, an excess kurtosis between -0.5 and +0.5 indicates a distribution with approximately zero
kurtosis. An excess kurtosis coefficient equal to 0.0 represents perfect (zero) kurtosis.
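The excess kurtosis coefficient and its interpretation can be sketched in Python. The formula below is the sample-adjusted one that Excel's KURT function is documented to use; the names `excel_style_kurt` and `classify_kurtosis` are hypothetical. The slides state only the -0.5 to +0.5 band for approximately zero kurtosis, so labelling values above +0.5 as leptokurtic and below -0.5 as platykurtic is an assumption made by analogy with the skewness thresholds.

```python
from statistics import mean, stdev

def excel_style_kurt(values):
    """Sample excess kurtosis with the small-sample adjustment."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 values are required")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    z4 = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

def classify_kurtosis(coefficient):
    """Assumed cut-offs: beyond +/-0.5 counts as lepto- or platykurtic."""
    if coefficient > 0.5:
        return "leptokurtic"
    if coefficient < -0.5:
        return "platykurtic"
    return "approximately mesokurtic"

for data in ([1, 3, 4, 5, 6, 8], [2, 4, 4, 7, 10, 12, 80]):
    k = excel_style_kurt(data)
    print(round(k, 2), classify_kurtosis(k))
```

Both datasets come from the skewness slides: the symmetric one lands very close to zero, while the one containing the extreme value 80 gives a large positive coefficient.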
[Figure: excess kurtosis coefficient scale, with 0.0 marking perfect (zero) kurtosis]
2. EXCEL COEFFICIENTS