Descriptive Statistics: Examination June 2018

The IQ scores of 20 learners randomly selected from Open University are given
below:

69 70 93 53 92 85 75 70 68 88

76 70 77 85 82 82 80 96 99 83

a) Draw a stem and leaf diagram to display these data.

b) Compute the median, mean, standard deviation, variance, coefficient of variation and interquartile range.

c) Using graph paper, draw a box and whisker diagram to represent the above data. Define clearly all parameters. Comment on the shape of the distribution.

d) Using the findings obtained at (b), state, with reasons, which measure of central tendency you consider more appropriate for summarizing these data.

SOLUTION 1:

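Procedure to construct a Stem and Leaf Diagram

A stem and leaf diagram represents data by separating each data value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit). For these two-digit IQ scores, the stem is the tens digit and the leaf is the units digit, so 53 has stem 5 and leaf 3.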

Steps

First, count the number of observations: there are 20 (n = 20).

Step 1: From the above raw data, find the smallest number and the highest number in
the data set.

The smallest number is 53

The highest number is 99


Step 2: Draw a vertical line and write the tens digits from 5 to 9 down the left side of the line. The tens digits form the stems.

Stem
5
6
7
8
9

Note: We start from 5 because our smallest number is 53 and end at 9 as our highest
number is 99.

Step 3: For each data value, in the order given, write its units digit to the right of the line, beside its tens digit. The units digits form the leaves.

Stem | Leaf          | Count
5    | 3             | 1
6    | 9 8           | 2
7    | 0 5 0 6 0 7   | 6
8    | 5 8 5 2 2 0 3 | 7
9    | 3 2 6 9       | 4
Total observations: 20

Note: the Count column is a check that no data value has been missed; the counts must add up to n = 20.

Step 4: Arrange the leaves on each stem in ascending order.

Stem | Leaf          | Count
5    | 3             | 1
6    | 8 9           | 2
7    | 0 0 0 5 6 7   | 6
8    | 0 2 2 3 5 5 8 | 7
9    | 2 3 6 9       | 4
Total observations: 20

(a) Stem and Leaf

Stem | Leaf
5    | 3
6    | 8 9
7    | 0 0 0 5 6 7
8    | 0 2 2 3 5 5 8
9    | 2 3 6 9

Key: 5 | 3 = 53.
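The four steps above can be sketched in Python as a quick check of a hand-drawn diagram. This is a minimal illustration, not part of the original solution; the function name `stem_and_leaf` is hypothetical, and it assumes non-negative two-digit whole numbers so that the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return (stem, sorted leaves) rows; assumes non-negative two-digit integers."""
    # Step 3: split each value into its stem (tens digit) and leaf (units digit)
    groups = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        groups[stem].append(leaf)
    # Step 2: list every stem from the smallest to the largest, even empty ones
    rows = []
    for stem in range(min(groups), max(groups) + 1):
        # Step 4: arrange the leaves in ascending order
        rows.append((stem, sorted(groups.get(stem, []))))
    return rows


iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

rows = stem_and_leaf(iq)
for stem, leaves in rows:
    print(f"{stem} | {' '.join(map(str, leaves))}")
# The Count check: the leaves must add up to n = 20
print("Total observations:", sum(len(leaves) for _, leaves in rows))
```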

For further explanation of stem and leaf diagrams, please follow the link:
https://youtu.be/MUCvUgGfzdo
(b) Descriptive Statistics

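$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}}\text{ term} = \left(\frac{20+1}{2}\right)^{\text{th}}\text{ term} = \left(\frac{21}{2}\right)^{\text{th}}\text{ term} = 10.5^{\text{th}}\text{ term}$$

$$\text{Median} = 10.5^{\text{th}}\text{ term} = \frac{10^{\text{th}} + 11^{\text{th}}}{2} = \frac{80 + 82}{2} = \frac{162}{2} = 81$$

From the ordered stem and leaf diagram, the 10th term is 80 and the 11th term is 82.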
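The ((n + 1)/2)th-term rule can be checked with a short Python sketch. The helper `median` is hypothetical and written out by hand to mirror the rule; the standard library's `statistics.median` is printed alongside as a cross-check.

```python
import statistics


def median(data):
    """Median via the ((n + 1) / 2)th-term rule on the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term is a single middle value
        return ordered[mid]
    # Even n: average the (n/2)th and (n/2 + 1)th terms (1-based positions)
    return (ordered[mid - 1] + ordered[mid]) / 2


iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

print(median(iq), statistics.median(iq))  # 81.0 81.0
```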

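The quartiles are found as the medians of the lower and upper halves of the ordered data (the same definition used for the box and whisker plot below). With n = 20, the lower half is the 1st to 10th terms and the upper half is the 11th to 20th terms.

$$\text{Lower Quartile} = \text{median of the 1st to 10th terms} = \frac{5^{\text{th}} + 6^{\text{th}}}{2} = \frac{70 + 70}{2} = \frac{140}{2} = 70$$

From the stem and leaf diagram, the 5th term is 70 and the 6th term is 70.

$$\text{Upper Quartile} = \text{median of the 11th to 20th terms} = \frac{15^{\text{th}} + 16^{\text{th}}}{2} = \frac{85 + 88}{2} = \frac{173}{2} = 86.5$$

From the stem and leaf diagram, the 15th term is 85 and the 16th term is 88.

Note: some textbooks instead locate the quartiles with the position formulas (n + 1)/4 and 3(n + 1)/4 and interpolate between neighbouring terms. For n = 20 these give the 5.25th term, 70 + 0.25 × (70 − 70) = 70 (the same lower quartile), and the 15.75th term, 85 + 0.75 × (88 − 85) = 87.25. Conventions differ between textbooks; this solution uses the upper quartile 86.5 throughout.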

Interquartile Range = Upper Quartile – Lower Quartile = 86.5 – 70 = 16.5
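A minimal Python sketch of the median-of-halves rule used above, with the standard library's `statistics.quantiles` shown for comparison. `quartiles_by_halves` is a hypothetical helper that assumes at least two values; for an odd number of values it leaves the median out of both halves, which is only one of several conventions. By default `statistics.quantiles` uses the (n + 1)p position rule with interpolation, so it reports 87.25 rather than 86.5 for the upper quartile.

```python
import statistics


def quartiles_by_halves(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    # For odd n the middle value belongs to neither half (one convention among several)
    lower, upper = ordered[:half], ordered[-half:]
    return statistics.median(lower), statistics.median(upper)


iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

q1, q3 = quartiles_by_halves(iq)
print(q1, q3, q3 - q1)                # 70.0 86.5 16.5
print(statistics.quantiles(iq, n=4))  # [70.0, 81.0, 87.25]
```

The difference only affects the upper quartile here, which is why it is worth stating which convention a calculation follows.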

For further explanation of the median, lower quartile, upper quartile and interquartile range, please follow the link: https://youtu.be/8ooKmIIb0Yo
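$$\text{Mean} = \frac{\sum x}{n} = \frac{69 + 70 + 93 + 53 + 92 + 85 + 75 + 70 + 68 + 88 + 76 + 70 + 77 + 85 + 82 + 82 + 80 + 96 + 99 + 83}{20} = \frac{1593}{20} = 79.65$$

where $\sum x$ = total of all the x values and $n$ = number of observations = 20, that is,

$$\text{Mean} = \frac{\text{Total } x}{\text{number of observations}}$$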

Mean = Average

The mean, median and mode are all measures of central tendency (also called measures of central location); in general they are not equal to one another.
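The mean calculation above is simple arithmetic; this short Python sketch just reproduces the total, the count and their quotient for the 20 IQ scores.

```python
iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

total = sum(iq)   # Σx, the total of all the x values
n = len(iq)       # number of observations
mean = total / n  # Mean = Total x / number of observations
print(total, n, mean)  # 1593 20 79.65
```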


$$\text{Standard Deviation} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$

Workings

x     | x²
69    | 4761
70    | 4900
93    | 8649
53    | 2809
92    | 8464
85    | 7225
75    | 5625
70    | 4900
68    | 4624
88    | 7744
76    | 5776
70    | 4900
77    | 5929
85    | 7225
82    | 6724
82    | 6724
80    | 6400
96    | 9216
99    | 9801
83    | 6889
Total: Σx = 1,593 | Σx² = 129,285

$$\text{Standard Deviation} = \sqrt{\frac{129{,}285}{20} - \left(\frac{1593}{20}\right)^2} = \sqrt{6{,}464.25 - (79.65)^2} = \sqrt{6{,}464.25 - 6{,}344.1225} = \sqrt{120.1275} = 10.96$$
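The shortcut formula can be checked in Python. This sketch computes Σx and Σx² exactly as in the Workings table, applies the divide-by-n formula used in the solution, and compares the result with the standard library's population standard deviation `statistics.pstdev`.

```python
import math
import statistics

iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

n = len(iq)
sum_x = sum(iq)                  # Σx
sum_x2 = sum(x * x for x in iq)  # Σx²
# Shortcut (divide-by-n) formula used in the solution
sd = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)
print(sum_x2, round(sd, 2))               # 129285 10.96
print(round(statistics.pstdev(iq), 2))    # 10.96
```

The shortcut formula is convenient by hand, but in floating point it subtracts two large, nearly equal numbers, which can lose precision for large data values; a library routine such as `statistics.pstdev` is the safer choice in code.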

$$\text{Variance} = (\text{Standard Deviation})^2 = (10.96027)^2 \approx 120.1275$$
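The solution divides by n, which treats the 20 scores as the whole population. Python's standard library offers both versions: `statistics.pvariance` divides by n (matching the solution) and `statistics.variance` divides by n − 1 (the usual sample formula). This sketch prints both; which one a given course expects is an assumption to check against its notes.

```python
import statistics

iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

pop_var = statistics.pvariance(iq)    # divides by n, as in the solution
sample_var = statistics.variance(iq)  # divides by n - 1
print(pop_var, sample_var)            # 120.1275 126.45
```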

$$\text{Coefficient of Variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100\% = \frac{10.96}{79.65} \times 100\% = 13.76\%$$
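A one-line Python check of the coefficient of variation, using the population standard deviation to match the solution. The sketch does not round the standard deviation first, but the result still rounds to the same 13.76%.

```python
import statistics

iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

# Coefficient of variation = standard deviation / mean x 100%
cv = statistics.pstdev(iq) / statistics.mean(iq) * 100
print(f"{cv:.2f}%")  # 13.76%
```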
To draw a Box and Whisker plot

A box and whisker plot gives a quick visual summary of a data set's spread, symmetry and skewness.

We need the following five parameters (all found above):

Minimum 53.00
Maximum 99.00
Lower Quartile 70.00
Upper Quartile 86.50
Median 81.00

[Figures: Example 1 and Example 2, box and whisker plot images not reproduced]

The following are the five measures shown on a box and whisker plot:
1. The Lower Extreme: This is the lowest data point in the set.
2. The Lower Quartile: This is the median of the lower half of the data.
3. The Median: This is the middle value of the data set, that is, the value that
splits the set in half, with the same number of terms below it as above it.
4. The Upper Quartile: This is the median of the upper half of the data.
5. The Upper Extreme: This is the greatest value in the data set.
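These five measures are all that is needed to draw the plot. As a rough check before drawing on graph paper, the hypothetical helper `box_plot_row` below sketches the plot as a single line of text. The axis range 50 to 100 and the scale of one character per IQ point are assumptions chosen for these data, and the sketch assumes there are no outliers, so the whiskers run all the way to the extremes.

```python
def box_plot_row(lo, q1, med, q3, hi, axis_min=50, axis_max=100, width=51):
    """Text sketch of a box and whisker plot on a fixed axis (hypothetical helper)."""
    def col(v):
        # Map a data value to a character column on the axis
        return round((v - axis_min) * (width - 1) / (axis_max - axis_min))

    row = [" "] * width
    for c in range(col(lo), col(hi) + 1):
        row[c] = "-"                        # whiskers, lower to upper extreme
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                        # the box, from Q1 to Q3
    row[col(lo)] = row[col(hi)] = "|"       # lower and upper extremes
    row[col(q1)], row[col(q3)] = "[", "]"   # box edges at the quartiles
    row[col(med)] = "M"                     # median inside the box
    return "".join(row)


# Five-number summary for the IQ data: min, Q1, median, Q3, max
line = box_plot_row(53, 70, 81, 86.5, 99)
print(line)
```

In the printed line the left part of the box and the left whisker are visibly longer than the right ones, which is the feature used when commenting on the shape.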

Types of Box and Whisker Plot [figure not reproduced]

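Note: a normal distribution is symmetrical, so its box and whisker plot has the median near the middle of the box and whiskers of roughly equal length.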

(c) Box and Whisker Plot (to be drawn on graph paper) [figure not reproduced]

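Comment: The distribution is negatively skewed (skewed to the left). The median (81) lies closer to the upper quartile (86.5 − 81 = 5.5) than to the lower quartile (81 − 70 = 11), and the lower whisker (70 − 53 = 17) is longer than the upper whisker (99 − 86.5 = 12.5). The mean (79.65) is also slightly below the median (81), which is consistent with negative skew.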
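The comparison of box halves and whiskers can be written as a small Python rule of thumb. `describe_shape` is a hypothetical helper, not a standard test for skewness: it only reports a skew when both the box and the whiskers lean the same way.

```python
def describe_shape(lo, q1, med, q3, hi):
    """Rough shape check from a five-number summary (rule of thumb only)."""
    lower_box, upper_box = med - q1, q3 - med
    lower_whisker, upper_whisker = q1 - lo, hi - q3
    if lower_box > upper_box and lower_whisker > upper_whisker:
        return "negatively skewed"
    if upper_box > lower_box and upper_whisker > lower_whisker:
        return "positively skewed"
    return "roughly symmetrical or mixed"


# IQ data: box halves 11 and 5.5, whiskers 17 and 12.5
print(describe_shape(53, 70, 81, 86.5, 99))  # negatively skewed
```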
(d) The mean is the most appropriate average because there are no outliers in the data.


Condition for selecting the mean or the median as the best measure of central location:

• If there are outliers, choose the median as the best measure of central location.
• If there are no outliers, choose the mean as the best measure of central location.

We need to justify the statement below.

“The mean is the best average as there is no outlier in the data.”

Lower Fence = Lower Quartile – (1.5 x Interquartile Range)

= 70 – (1.5 x 16.50) = 45.25 (we know that the smallest value = 53)

Upper Fence = Upper Quartile + (1.5 x Interquartile Range)

= 86.50 + (1.5 x 16.50) = 111.25 (we know that the highest value = 99)

Since the lower fence (45.25) is less than the smallest value (53), there is no outlier at the lower end of the data set: 45.25 < 53.

Since the upper fence (111.25) is greater than the highest value (99), there is no outlier at the upper end of the data set: 111.25 > 99.

As there are no outliers, we select the mean as the best measure of central location.
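The fence calculation and the selection rule can be combined in a short Python sketch. `tukey_fences` and `best_average` are hypothetical helper names; the quartiles are passed in (here the values 70 and 86.5 found above) rather than recomputed, and the 1.5 × IQR multiplier is the one used in the text.

```python
import statistics


def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def best_average(data, q1, q3):
    """Rule from the text: the median if any outlier exists, otherwise the mean."""
    lower, upper = tukey_fences(q1, q3)
    outliers = [x for x in data if x < lower or x > upper]
    if outliers:
        return "median", statistics.median(data), outliers
    return "mean", statistics.mean(data), outliers


iq = [69, 70, 93, 53, 92, 85, 75, 70, 68, 88,
      76, 70, 77, 85, 82, 82, 80, 96, 99, 83]

print(tukey_fences(70, 86.5))      # (45.25, 111.25)
print(best_average(iq, 70, 86.5))  # ('mean', 79.65, [])
```

Adding an extreme score such as 10 to the list would put it below the lower fence, and the rule would then switch to the median.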

Note: Outliers, which are data values that lie far away from the other data values, can strongly affect the results of an analysis. In particular, an outlier pulls the mean towards it while the median is hardly affected, which is why the median is preferred when outliers are present.
Practical Exercises

Exercise 1

ABC Company Ltd buys electrical components in batches. From time to time a
batch is randomly selected and, for quality control purposes, all the components
are inspected. The data below give the number of defective components found in 25
batches recently bought.

63 27 46 47 22

64 30 19 69 36

65 60 40 66 55

33 47 42 49 23

22 46 62 30 20

1. Draw a stem and leaf diagram to display these data.


2. Compute the median, mean, standard deviation, coefficient of variation and
inter-quartile range.
3. Draw a box and whisker plot. Define clearly all parameters. Briefly discuss
whether the data appear to be symmetrical or skewed.
4. State, with reasons, which measure of central location you consider more
appropriate for summarizing these data. Interpret the results for this
measure and for the corresponding measure of variability (or spread).
Exercise 2

The data below gives the number of defective electrical components found in 40
batches produced by Expert Company Ltd.

12 16 81 49 60 17 19 48

34 20 25 50 32 72 57 44

76 62 93 43 47 93 86 71

54 66 48 51 27 22 16 53

48 61 33 19 78 49 98 19

1. Draw a stem and leaf diagram to display these data.


2. Compute the median, mean, standard deviation, coefficient of variation and
inter-quartile range.
3. Draw a box and whisker plot. Define clearly all parameters. Briefly discuss
whether the data appear to be symmetrical or skewed.
4. State, with reasons, which measure of central location you consider more
appropriate for summarizing these data. Interpret the results for this
measure and for the corresponding measure of variability (or spread).

Any queries on the above detailed solutions and the exercises will be discussed on the group WhatsApp and by text message.

Next Chapter: Histogram for Unequal Class Interval + Cumulative Frequency
