
Skewness, kurtosis, and boxplot

DR. LEONORA T. DELA CRUZ


Skewness
Data can be "skewed", meaning it tends to have a long tail on one side or the other.
The mean, median, and mode can be compared to judge whether a distribution is positively or negatively skewed. As rules of thumb:
If the mean is greater than the mode, the distribution is positively skewed.
If the mean is less than the mode, the distribution is negatively skewed.
If the mean is greater than the median, the distribution is positively skewed.
If the mean is less than the median, the distribution is negatively skewed.
The mode and median comparisons can disagree for the same data, so treat them as quick checks rather than proof.
How to calculate Skewness?
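For reference, Pearson's two coefficients of skewness are usually written as below, with s the sample standard deviation. The labels Sk1 (mode-based) and Sk2 (median-based) follow the common textbook convention and are assumed here:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

$$Sk_1 = \frac{\bar{x} - Mo}{s} \qquad Sk_2 = \frac{3(\bar{x} - Mdn)}{s}$$

Sk2 avoids the mode, which is convenient when the data have more than one mode.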
How to interpret Skewness:
If skewness is less than −1 or greater than +1, the distribution is highly skewed.
If skewness is between −1 and −½ or between +½ and +1, the distribution is
moderately skewed.
If skewness is between −½ and +½, the distribution is approximately symmetric.
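The thresholds above can be written as a small Python function. The slide does not say which category the boundary values (exactly ±½ or ±1) belong to; this sketch assumes they go to the milder category, and the function name is illustrative.

```python
def interpret_skewness(sk: float) -> str:
    """Classify a skewness value using the thresholds above.

    Assumption: boundary values (exactly +/-0.5 and +/-1) are assigned to the
    milder category, since the thresholds do not say which side they fall on.
    """
    size = abs(sk)
    if size > 1:
        strength = "highly skewed"
    elif size > 0.5:
        strength = "moderately skewed"
    else:
        return "approximately symmetric"
    # The sign of the coefficient gives the direction of the long tail
    direction = "positive" if sk > 0 else "negative"
    return f"{strength} ({direction})"


if __name__ == "__main__":
    for value in (1.17, -0.7, 0.2):
        print(value, "->", interpret_skewness(value))
```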

Caution: This is an interpretation of the data you actually have. When you have data for the whole population, that’s fine. But when you have a sample, the sample skewness doesn’t necessarily apply to the whole population. The question then becomes whether the sample skewness lets you conclude anything about the population skewness, which is a matter of statistical inference rather than description.
Skewness
Exercise 18: Calculate the skewness of the data in (a) Exercise 6 and (b) Exercise 7 using Pearson’s coefficient of skewness. Describe the data.
Exercise 6 (Raw Data): The ages of 15 randomly selected customers at a local Best Buy are listed below:
23, 21, 29, 24, 31, 21, 27, 23, 24, 32, 33, 19, 24, 21, 31
Determine the mean, median, and mode of the data.
Array: 19, 21, 21, 21, 23, 23, 24, 24, 24, 27, 29, 31, 31, 32, 33

Mean: x̄ = 383/15 = 25.53
Median (8th score): Mdn = 24
Mode (the value that occurs most): Mo = 21 and 24 (bimodal)
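Age (x)   x − x̄    (x − x̄)²
19        −6.53    42.6409
21        −4.53    20.5209
21        −4.53    20.5209
21        −4.53    20.5209
23        −2.53     6.4009
23        −2.53     6.4009
24        −1.53     2.3409
24        −1.53     2.3409
24        −1.53     2.3409
27         1.47     2.1609
29         3.47    12.0409
31         5.47    29.9209
31         5.47    29.9209
32         6.47    41.8609
33         7.47    55.8009
                   Σ(x − x̄)² = 295.7335

x̄ = 25.53    Mdn = 24
s = √(295.73/14) = √21.12 = 4.60
Because the data have two modes, the median-based coefficient is used:
Sk2 = 3(x̄ − Mdn)/s = 3(25.53 − 24)/4.60 = 3(1.53)/4.60 = 4.59/4.60 ≈ 1.00
Sk2 ≈ 1.00 lies at the boundary between moderately and highly skewed, so the ages are strongly positively skewed (a long tail toward older customers).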
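A minimal sketch using Python's standard `statistics` module to check Exercise 6 from the raw ages. It keeps the mean unrounded, so results can differ in the second decimal place from a hand calculation that rounds intermediate steps.

```python
import statistics

# Exercise 6: ages of 15 randomly selected customers
ages = [23, 21, 29, 24, 31, 21, 27, 23, 24, 32, 33, 19, 24, 21, 31]

mean = statistics.mean(ages)        # 383 / 15
median = statistics.median(ages)    # 8th value in the sorted list
modes = statistics.multimode(ages)  # all values tied for most frequent
s = statistics.stdev(ages)          # sample standard deviation (divides by n - 1)

# Median-based Pearson coefficient, since the data have two modes
sk2 = 3 * (mean - median) / s

print(f"mean={mean:.2f} median={median} modes={modes} s={s:.2f} Sk2={sk2:.2f}")
# prints: mean=25.53 median=24 modes=[21, 24] s=4.60 Sk2=1.00
```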
Exercise 7 (Frequency Table): Determine the mean, median, and mode of a simple frequency distribution of the retirement age data.

Age (x)   f    fx
54        3    162
55        1     55
56        1     56
57        2    114
58        2    116
60        2    120
n = 11         Σfx = 623

Mean: x̄ = Σfx/n = 623/11 = 56.64
Median (6th score): Mdn = 57
Mode (the value that occurs most): Mo = 54
x̄ = 56.64    Mo = 54

Age (x)   f    fx    x − x̄    (x − x̄)²    (x − x̄)²f
54        3    162   −2.64     6.9696     20.9088
55        1     55   −1.64     2.6896      2.6896
56        1     56   −0.64     0.4096      0.4096
57        2    114    0.36     0.1296      0.2592
58        2    116    1.36     1.8496      3.6992
60        2    120    3.36    11.2896     22.5792
n = 11         Σfx = 623               Σ(x − x̄)²f = 50.5456

s = √(50.5456/10) = √5.05456 = 2.25
Sk1 = (x̄ − Mo)/s = (56.64 − 54)/2.25 = 2.64/2.25 = 1.17
Sk1 = 1.17 is greater than +1.0, so the retirement ages are highly (positively) skewed.
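A sketch that checks Exercise 7 with Python's `statistics` module by expanding the frequency table back into the 11 individual ages. It computes both Pearson coefficients so they can be compared.

```python
import statistics

# Exercise 7: retirement ages as a simple frequency distribution {age: frequency}
freq = {54: 3, 55: 1, 56: 1, 57: 2, 58: 2, 60: 2}

# Expand the table into individual observations (n = 11)
ages = [age for age, f in freq.items() for _ in range(f)]

mean = statistics.mean(ages)      # sum(fx) / n = 623 / 11
median = statistics.median(ages)  # 6th value in the sorted list
mode = statistics.mode(ages)      # single most frequent value
s = statistics.stdev(ages)        # sample standard deviation (divides by n - 1)

sk1 = (mean - mode) / s           # Pearson's mode-based coefficient
sk2 = 3 * (mean - median) / s     # Pearson's median-based coefficient

print(f"mean={mean:.2f} median={median} mode={mode} s={s:.2f}")
print(f"Sk1={sk1:.2f} Sk2={sk2:.2f}")
# prints: mean=56.64 median=57 mode=54 s=2.25
#         Sk1=1.17 Sk2=-0.49
```

The two coefficients disagree here: the mean (56.64) is above the mode (54) but below the median (57), so Sk1 is positive while Sk2 is slightly negative. It is worth stating which coefficient a reported skewness comes from.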
Kurtosis
Kurtosis describes the tails of a distribution and, often, the height and sharpness of its central peak, relative to a standard normal (bell) curve.
Leptokurtic - Tall and skinny compared to the normal bell curve, with an excess of extreme values that makes the tails thicker than those of the normal curve.

Platykurtic - Short and fat compared to the normal bell curve, with fewer extreme values, so the tails are thinner than those of the normal curve.

Mesokurtic - Shaped like the normal bell curve, which is the standard example.
How to calculate Kurtosis
Kurtosis measures the "fatness" of the tails of a distribution. A distribution with positive excess kurtosis has fatter tails than a normal distribution.
• A normal distribution has a kurtosis of +3.00, so a result of +3.00 indicates no excess kurtosis (the distribution is mesokurtic). For simplicity, some statisticians subtract 3 so that a normal distribution scores zero (kurtosis minus 3 equals zero); any reading other than zero is then referred to as excess kurtosis.
• Negative excess kurtosis indicates a platykurtic distribution; positive excess kurtosis indicates a leptokurtic distribution.
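The worked solution to Exercise 19 uses the sample excess kurtosis formula, written here in general form with n the sample size and s the sample standard deviation. This bias-adjusted form is the one Excel's KURT function uses; the subtracted term plays the role of the 3 for a normal curve (it approaches 3 as n grows), so the result is read as excess kurtosis:

$$K = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \cdot \frac{\sum (x - \bar{x})^4}{s^4} - \frac{3(n-1)^2}{(n-2)(n-3)}$$

The formula needs at least four observations, since (n − 3) appears in a denominator.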
How to interpret Kurtosis:
• A normal distribution has excess kurtosis of exactly 0. Any distribution with excess kurtosis ≈ 0 is called mesokurtic.
• A distribution with excess kurtosis < 0 is called platykurtic. Compared to a normal distribution, its tails are shorter and thinner, and often its central peak is lower and broader.
• A distribution with excess kurtosis > 0 is called leptokurtic. Compared to a normal distribution, its tails are longer and fatter, and often its central peak is higher and sharper.
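A small sketch of these rules as a Python function. The slide only says excess kurtosis "≈ 0" for a mesokurtic distribution, so the tolerance `tol` (default 0.5) is a hypothetical cut-off chosen for illustration, not a standard value.

```python
def describe_kurtosis(excess: float, tol: float = 0.5) -> str:
    """Label a distribution from its excess kurtosis (kurtosis minus 3).

    `tol` is a hypothetical cut-off for "approximately 0"; the slide gives no number.
    """
    if abs(excess) <= tol:
        return "mesokurtic"
    # Beyond the tolerance, the sign decides the label
    return "leptokurtic" if excess > 0 else "platykurtic"


if __name__ == "__main__":
    print(describe_kurtosis(-1.07))  # the worked Exercise 19 result
```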
Kurtosis:
Exercise 19: Calculate the kurtosis of the data in Exercise 9. Describe the distribution based on the results.
Exercises:
Exercise 9: The following data are marks obtained by 20 students in a statistics test. Determine Q1, Q2, and Q3.
53 74 82 42 39 20 81 68 58 28
67 54 93 70 30 55 36 38 29 61

Array  20 28 29 30 36 38 39 42 53 54 55 58 61 67 68 70 74 81 82 93
Order   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

Q1 = value of the (20 + 1)/4 th item = value of the 5.25th item
Q1 = 36 + 0.25(38 − 36) = 36 + 0.25(2) = 36 + 0.5 = 36.5

Q2 = value of the (20 + 1)/2 th item = value of the 10.5th item
Q2 = 54 + 0.5(55 − 54) = 54 + 0.5(1) = 54 + 0.5 = 54.5

Q3 = value of the 3(20 + 1)/4 th item = value of the 15.75th item
Q3 = 68 + 0.75(70 − 68) = 68 + 0.75(2) = 68 + 1.5 = 69.5
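The (n + 1) positional method above can be sketched in Python as follows. `quartile` is a hypothetical helper written for illustration; the cross-check uses `statistics.quantiles`, whose default 'exclusive' method follows the same (n + 1) rule for positions inside the data.

```python
import statistics


def quartile(data, k):
    """Return Qk (k = 1, 2, 3) by the (n + 1) positional method.

    Position = k(n + 1)/4 in the sorted data; a fractional position is filled
    in by linear interpolation between the two neighbouring values.
    """
    xs = sorted(data)
    n = len(xs)
    pos = k * (n + 1) / 4          # 1-based position, e.g. 5.25 for Q1 when n = 20
    pos = min(max(pos, 1), n)      # clamp so the position stays inside the data
    lower = int(pos)               # whole part: the item just below the position
    frac = pos - lower             # fractional part: how far toward the next item
    if lower == n:
        return float(xs[-1])
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


# Exercise 9 marks
marks = [53, 74, 82, 42, 39, 20, 81, 68, 58, 28,
         67, 54, 93, 70, 30, 55, 36, 38, 29, 61]
print([quartile(marks, k) for k in (1, 2, 3)])  # [36.5, 54.5, 69.5]

# Cross-check with the standard library's (n + 1) 'exclusive' method
print(statistics.quantiles(marks, n=4, method="exclusive"))
```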
Exercise 19, worked solution: the marks in the table below run from 43 to 80 (Σx = 1238, so x̄ = 1238/20 = 61.9). They are not the Exercise 9 marks, but the same steps apply to any data set.
Mark (x)   x − x̄    (x − x̄)²    (x − x̄)⁴
43        −18.9     357.21     127598.9841
48        −13.9     193.21      37330.1041
50        −11.9     141.61      20053.3921
50        −11.9     141.61      20053.3921
52         −9.9      98.01       9605.9601
52         −9.9      98.01       9605.9601
56         −5.9      34.81       1211.7361
58         −3.9      15.21        231.3441
59         −2.9       8.41         70.7281
60         −1.9       3.61         13.0321
62          0.1       0.01          0.0001
65          3.1       9.61         92.3521
66          4.1      16.81        282.5761
68          6.1      37.21       1384.5841
70          8.1      65.61       4304.6721
71          9.1      82.81       6857.4961
74         12.1     146.41      21435.8881
76         14.1     198.81      39525.4161
78         16.1     259.21      67189.8241
80         18.1     327.61     107328.3121
          Σ(x − x̄)² = 2235.80    Σ(x − x̄)⁴ = 474175.754

s = √(2235.80/(20 − 1)) = √117.67 = 10.85
s⁴ = 10.85⁴ = 13858.59

$$K = \frac{20(20+1)}{(20-1)(20-2)(20-3)} \cdot \frac{474175.754}{13858.59} - \frac{3(20-1)^2}{(20-2)(20-3)}$$

$$K = \frac{420}{5814}(34.22) - \frac{3(361)}{306} = (0.0722)(34.22) - \frac{1083}{306} = 2.47 - 3.54 = -1.07$$

K = −1.07 < 0, so the distribution is platykurtic.
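A Python sketch of the same kurtosis formula, applied to the 20 marks in the table. `sample_excess_kurtosis` is a hypothetical helper; it uses the unrounded standard deviation, so intermediate values differ slightly from the hand calculation (for example s⁴ ≈ 13847.10 rather than 13858.59), but the result still rounds to −1.07.

```python
import statistics


def sample_excess_kurtosis(data):
    """Sample excess kurtosis:

    K = n(n+1) / ((n-1)(n-2)(n-3)) * sum((x - mean)^4) / s^4
        - 3(n-1)^2 / ((n-2)(n-3))
    where s is the sample standard deviation (n - 1 in its denominator).
    """
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = statistics.fmean(data)
    s2 = statistics.variance(data)           # s squared (sample variance)
    m4 = sum((x - mean) ** 4 for x in data)  # sum of fourth-power deviations
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / s2 ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - correction


# The 20 marks used in the worked table
marks = [43, 48, 50, 50, 52, 52, 56, 58, 59, 60,
         62, 65, 66, 68, 70, 71, 74, 76, 78, 80]
print(round(sample_excess_kurtosis(marks), 2))  # -1.07
```

Passing the Exercise 9 marks instead runs the same steps on that data set.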
Boxplot
A boxplot can give you information regarding the shape,
variability, and center (or median) of a statistical data set. It is
particularly useful for displaying skewed data. Statistical data
also can be displayed with other charts and graphs.

A boxplot, sometimes called a box-and-whisker plot, is a type of graph used to display patterns of quantitative data.
Boxplot
A boxplot splits the data set into quartiles. The body of the boxplot consists of a "box" (hence the name), which goes from the first quartile (Q1) to the third quartile (Q3).

Within the box, a vertical line is drawn at Q2, the median of the data set. Two horizontal lines, called whiskers, extend from either end of the box. The lower whisker goes from Q1 to the smallest non-outlier in the data set, and the upper whisker goes from Q3 to the largest non-outlier.
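The whiskers stop at the smallest and largest non-outliers, but the slide does not define an outlier. This Python sketch assumes the common 1.5 × IQR convention (Tukey's rule) and uses the (n + 1) quartile method from the exercises; `box_whiskers` is a hypothetical helper.

```python
import statistics


def box_whiskers(data, k=1.5):
    """Quartiles, whisker ends, and outliers for a boxplot.

    Assumption: a value counts as an outlier when it lies more than k * IQR
    below Q1 or above Q3 (Tukey's rule, k = 1.5).
    Quartiles use the (n + 1) 'exclusive' method, as in the hand calculations.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "lower_whisker": min(inside),  # smallest non-outlier
        "upper_whisker": max(inside),  # largest non-outlier
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


# Exercise 9 marks
marks = [53, 74, 82, 42, 39, 20, 81, 68, 58, 28,
         67, 54, 93, 70, 30, 55, 36, 38, 29, 61]
print(box_whiskers(marks))
# Hypothetical extra mark of 150, added only to show an outlier being excluded
print(box_whiskers(marks + [150])["outliers"])
```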
Boxplot
[Figure: example boxplots]
The dot inside the box represents the mean.
Boxplot
Exercise 20: Consider the boxplot below. True or False:
F   1. The distribution is normal.
F   2. The distribution is skewed right.
T   3. The distribution is skewed left.
T   4. The interquartile range is about 8.
F   5. The median is about 10.
T   6. The mean is about 14.
T   7. Seventy-five percent of the data are greater than 10.
T   8. Twenty-five percent of the data are greater than 18.
F   9. Fifty percent of the data are less than 14.
F  10. Fifty percent of the data have values between 10 and 15.
Boxplot
Exercise 21: A salesperson recorded the number of sales he made each month. Over the past 12 months, he sold the following numbers of computers. Make a boxplot of the sales.

51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13.

Array   6  7 13 17 20 25 39 41 43 49 51 62
Order   1  2  3  4  5  6  7  8  9 10 11 12

Q1 = value of the (12 + 1)/4 th item = value of the 3.25th item
Q1 = 13 + 0.25(17 − 13) = 13 + 0.25(4) = 13 + 1 = 14

Q2 = value of the 2(12 + 1)/4 th item = value of the 6.5th item
Q2 = 25 + 0.5(39 − 25) = 25 + 0.5(14) = 25 + 7 = 32

Q3 = value of the 3(12 + 1)/4 th item = value of the 9.75th item
Q3 = 43 + 0.75(49 − 43) = 43 + 0.75(6) = 43 + 4.5 = 47.5

Mean: x̄ = 373/12 = 31.08
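A minimal Python sketch that turns the Exercise 21 sales into a five-number summary and a rough one-line text boxplot, as a stand-in for a drawn chart. The helper names and the text layout are illustrative assumptions. The whiskers run straight to the minimum and maximum: with Q1 = 14 and Q3 = 47.5, no sale lies more than 1.5 × IQR beyond the quartiles (a common outlier convention, assumed here).

```python
import statistics


def five_number_summary(data):
    """(min, Q1, median, Q3, max), quartiles by the (n + 1) 'exclusive' method."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)


def text_boxplot(data, width=57):
    """A rough one-line text boxplot: | whisker ends, [ ] box edges, M median, o mean.

    Whiskers run to the minimum and maximum (no outlier rule), and the data
    must not all be equal. Symbols drawn later overwrite earlier ones.
    """
    lo, q1, q2, q3, hi = five_number_summary(data)
    mean = statistics.fmean(data)

    def col(v):
        # Map a data value onto a character column 0 .. width - 1
        return round((v - lo) * (width - 1) / (hi - lo))

    row = ["-"] * width                    # whisker line from min to max
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                       # inside of the box
    row[0], row[-1] = "|", "|"             # whisker ends at min and max
    row[col(q1)], row[col(q3)] = "[", "]"  # box edges at Q1 and Q3
    row[col(q2)] = "M"                     # median
    row[col(mean)] = "o"                   # mean
    return "".join(row)


# Exercise 21: monthly computer sales over 12 months
sales = [51, 17, 25, 39, 7, 49, 62, 41, 20, 6, 43, 13]
print(five_number_summary(sales))  # (6, 14.0, 32.0, 47.5, 62)
print(min(sales), text_boxplot(sales), max(sales))
```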
Quiz
Consider the following scores from an assessment. (a) Determine the skewness and kurtosis, (b) make a boxplot, and (c) decide whether the data follow a normal distribution. Defend your answer.

20 28 29 30 36 37 39 42 53 55 55 58 61 67 68 70 74 81 82 93
