
DESCRIPTION OF DATA

Data can be described using measures of central tendency and measures of variation.
Measures of central tendency
Mode and Modal Class: The mode is the observation with the highest frequency, and the modal class is the class interval with the highest frequency. In our example A, the modal class is 13.25–13.29 (class boundaries 13.245–13.295), which has the highest frequency of 7. Graphically, the mode is located using the tallest bar of the histogram.

For grouped data, the mode is given by:

$$\text{Mode} = L + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times i$$

where $L$ is the lower class boundary of the modal class, $f_m$ is the frequency of the modal class, $f_1$ is the frequency of the class preceding the modal class, $f_2$ is the frequency of the class succeeding the modal class, and $i$ is the size of the modal class, given by the difference between its upper and lower class boundaries.

In our example A:

$$\text{Mode} = 13.245 + \frac{7 - 2}{(7 - 2) + (7 - 6)} \times 0.05 = 13.29$$
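The grouped-mode formula can be checked numerically. The following is a minimal Python sketch. The function name `grouped_mode` is hypothetical and does not come from these notes. The sketch plugs in the example A values L = 13.245, f_m = 7, f_1 = 2, f_2 = 6 and i = 0.05. It assumes the modal class is more frequent than its neighbours, so that the denominator is not zero.

```python
def grouped_mode(lower_boundary, f_m, f_1, f_2, width):
    """Mode of grouped data: L + (f_m - f_1) / ((f_m - f_1) + (f_m - f_2)) * i."""
    d1 = f_m - f_1  # excess of the modal class over the preceding class
    d2 = f_m - f_2  # excess of the modal class over the succeeding class
    if d1 + d2 == 0:
        raise ValueError("modal class must be more frequent than its neighbours")
    return lower_boundary + d1 / (d1 + d2) * width

# Example A: modal class 13.245-13.295 with neighbouring frequencies 2 and 6
mode_a = grouped_mode(13.245, 7, 2, 6, 0.05)
print(round(mode_a, 2))  # 13.29
```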

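Median:
The median divides the data into two equal halves when the data is arranged in ascending order. For ungrouped data it is the middle value, or the mean of the two middle values when the number of observations is even. E.g. 2, 5, 6, 7: median = (5 + 6)/2 = 5.5; 2, 5, 6, 7, 8: median = 6.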
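The rule for ungrouped data can be written as a short Python sketch. The helper name `median` is illustrative. Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values

print(median([2, 5, 6, 7]))     # 5.5
print(median([2, 5, 6, 7, 8]))  # 6
```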
For grouped data, the median is given by:

$$M = L_M + \frac{\frac{n}{2} - c}{f_M} \times i_M$$

where $L_M$ is the lower class boundary of the class containing the median, $n$ is the total frequency, $c$ is the cumulative frequency of the classes preceding the class containing the median, $f_M$ is the frequency of the class containing the median, and $i_M$ is the size of the class containing the median, given by the difference between its upper and lower class boundaries.
Using example A, the median occupies the 21/2 = 10.5th position:

Length (m)        Frequency    Cumulative frequency
13.195–13.245     2            2
13.245–13.295     7            9
13.295–13.345     6            15
13.345–13.395     2            17
13.395–13.445     2            19
13.445–13.495     1            20
13.495–13.545     1            21

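The cumulative frequency first reaches 10.5 in the class 13.295–13.345, so $L_M = 13.295$, $c = 9$, $f_M = 6$ and $i_M = 0.05$:

$$M = 13.295 + \frac{10.5 - 9}{6} \times 0.05 = 13.31$$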
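The following Python sketch applies the grouped-median formula by walking down the cumulative frequencies until n/2 is reached. The function and variable names are hypothetical. The class boundaries and frequencies are those of example A in the table above.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L_M + (n/2 - c) / f_M * i_M."""
    n = sum(freqs)
    half = n / 2                    # 10.5 for example A (n = 21)
    cumulative = 0                  # c: total frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= half:  # first class whose cumulative frequency reaches n/2
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency table")

boundaries_a = [(13.195, 13.245), (13.245, 13.295), (13.295, 13.345), (13.345, 13.395),
                (13.395, 13.445), (13.445, 13.495), (13.495, 13.545)]
freqs_a = [2, 7, 6, 2, 2, 1, 1]
median_a = grouped_median(boundaries_a, freqs_a)
print(round(median_a, 2))  # 13.31
```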
Mean

For ungrouped data, the mean is given by:

$$\bar{x} = \frac{\sum x}{n}$$

For grouped data (example A), the mean is given by:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

Length (m)      Frequency (f)   x (class center)   fx       fx²
13.20–13.24     2               13.22              26.44    349.5368
13.25–13.29     7               13.27              92.89    1232.6503
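The table above lists only the first two classes. This Python sketch completes the calculation under one assumption: the remaining class centers (13.32, 13.37, 13.42, 13.47, 13.52) and their frequencies come from the example A cumulative-frequency table given earlier. The function name `grouped_mean` is hypothetical.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class center."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(midpoints, freqs))
    return total_fx / total_f

# Example A class centers (m) and frequencies (assumed from the earlier frequency table)
midpoints = [13.22, 13.27, 13.32, 13.37, 13.42, 13.47, 13.52]
freqs = [2, 7, 6, 2, 2, 1, 1]
mean_a = grouped_mean(midpoints, freqs)
print(round(mean_a, 3))  # 13.325
```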

Measures of Dispersion:
Range
In statistics and mathematics, the range is the difference between the maximum and minimum values of a data set. It is the simplest measure of dispersion and gives a quick indication of how varied the data set is.

$$\text{Range} = x_{max} - x_{min}$$


In the case of grouped data, the range is the difference between the upper class boundary of the highest class and the lower class boundary of the lowest class. Alternatively, it can be calculated as the difference between the midpoints of the highest class and the lowest class.
Coefficient of Range
This is a relative measure of dispersion and is based on the value of the range. It is also called
range coefficient of dispersion. It is defined as:

$$\text{Coefficient of Range} = \frac{x_{max} - x_{min}}{x_{max} + x_{min}}$$

Let us take two sets of observations. Set A contains the marks of five students in mathematics, out of 25 marks, and Set B contains the marks of the same students in English, out of 100 marks.

Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The values of the ranges and coefficients of range are calculated as:

                        Range            Coefficient of Range
Set A (Mathematics)     20 – 10 = 10     (20 – 10)/(20 + 10) = 0.33
Set B (English)         50 – 30 = 20     (50 – 30)/(50 + 30) = 0.25
In set A the range is 10 and in set B the range is 20. At first glance there seems to be greater dispersion in set B, but this is not true. The two ranges are measured on different bases: the marks in mathematics are out of 25 and the marks in English are out of 100. Thus 20 and 10 cannot be compared directly. When we convert these two values into coefficients of range, we see that the coefficient of range for set A (0.33) is greater than that of set B (0.25). Thus there is greater relative dispersion, or variation, in set A. The marks of the students in English are more stable than their marks in mathematics.
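The following Python sketch computes the range and the coefficient of range for both sets. It also covers the houses example that follows, where the variable is the number of persons per house (1 to 10). The function names are illustrative. `value_range` avoids shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range = x_max - x_min."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of range = (x_max - x_min) / (x_max + x_min), a unit-free ratio."""
    x_max, x_min = max(values), min(values)
    return (x_max - x_min) / (x_max + x_min)

set_a = [10, 15, 18, 20, 20]   # mathematics marks, out of 25
set_b = [30, 35, 40, 45, 50]   # English marks, out of 100
persons = list(range(1, 11))   # persons per house: 1 to 10 (house counts are frequencies)

for name, values in [("Set A", set_a), ("Set B", set_b), ("Houses", persons)]:
    print(name, value_range(values), round(coefficient_of_range(values), 3))
# Set A 10 0.333
# Set B 20 0.25
# Houses 9 0.818
```

The coefficient divides by the sum of the extremes, so it does not depend on the marking scale. That is why it can compare the two subjects where the raw ranges cannot.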

 Example:

The following distribution gives the number of houses and the number of persons per house.

Number of Persons 1 2 3 4 5 6 7 8 9 10
Number of Houses 26 113 120 95 60 42 21 14 5 4
Calculate the range and coefficient of range.

Solution: Range = 10 – 1 = 9
Coefficient of range = (10 – 1)/(10 + 1) = 9/11 = 0.818
Example:

Find the range of the weight of the students of a university.

Weight (Kg)           60–62   63–65   66–68   69–71   72–74
Number of Students    5       18      42      27      8
Calculate the range and coefficient of range.

 Solution:

Weight (Kg)   Class Boundaries   Mid Value   No. of Students
60–62         59.5–62.5          61          5
63–65         62.5–65.5          64          18
66–68         65.5–68.5          67          42
69–71         68.5–71.5          70          27
72–74         71.5–74.5          73          8
Method 1:
$x_{max}$ = the upper class boundary of the highest class = 74.5
$x_{min}$ = the lower class boundary of the lowest class = 59.5

Range = 74.5 – 59.5 = 15 kg

Coefficient of Range = (74.5 – 59.5)/(74.5 + 59.5) = 15/134 = 0.1119

Method 2:
$x_{max}$ = the mid value of the highest class = 73
$x_{min}$ = the mid value of the lowest class = 61

Range = 73 – 61 = 12 kg

Coefficient of Range = (73 – 61)/(73 + 61) = 12/134 = 0.0896
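The following Python sketch reproduces both methods from the class boundaries. It uses an illustrative helper named `coefficient_of_range`. Note that 12/134 = 0.08955..., which rounds to 0.0896 at four decimal places.

```python
def coefficient_of_range(x_max, x_min):
    """(x_max - x_min) / (x_max + x_min)."""
    return (x_max - x_min) / (x_max + x_min)

# Class boundaries of the weight classes 60-62, ..., 72-74 (kg)
boundaries = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]

# Method 1: upper boundary of the highest class, lower boundary of the lowest class
x_max_1, x_min_1 = boundaries[-1][1], boundaries[0][0]
range_1 = x_max_1 - x_min_1
coef_1 = coefficient_of_range(x_max_1, x_min_1)

# Method 2: mid values of the highest and lowest classes
mids = [(lower + upper) / 2 for lower, upper in boundaries]
x_max_2, x_min_2 = mids[-1], mids[0]
range_2 = x_max_2 - x_min_2
coef_2 = coefficient_of_range(x_max_2, x_min_2)

print(range_1, round(coef_1, 4))  # 15.0 0.1119
print(range_2, round(coef_2, 4))  # 12.0 0.0896
```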

Limitations of Range

The range is a very crude measure of the spread of data because it is extremely sensitive to outliers. A single data value can greatly affect its value, which limits its usefulness to statisticians.

For example, consider the data set 1, 2, 3, 4, 6, 7, 7, 8. The maximum value is 8, the minimum is 1 and the range is 7. Now consider the same data set with the value 100 added. The range becomes 100 – 1 = 99, so adding a single extra data point has greatly changed the range.

The range also tells us nothing about the internal features of a data set. For example, the data set 1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10 has a range of 10 – 1 = 9. The data set 1, 1, 1, 2, 9, 9, 9, 10 also has a range of 9. However, unlike the first set, its data is clustered around the minimum and the maximum. Other statistics, such as the first and third quartiles, are needed to detect this kind of internal structure.
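The following Python sketch illustrates both limitations. It uses the standard library's `statistics.quantiles`, whose default `'exclusive'` method places the quartiles at the (n+1)/4 and 3(n+1)/4 positions with linear interpolation. The helper name `value_range` is illustrative.

```python
import statistics

def value_range(values):
    """Range = x_max - x_min."""
    return max(values) - min(values)

data = [1, 2, 3, 4, 6, 7, 7, 8]
print(value_range(data))          # 7
print(value_range(data + [100]))  # 99: one outlier changes the range drastically

spread_out = [1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10]
clustered = [1, 1, 1, 2, 9, 9, 9, 10]
for name, values in [("spread_out", spread_out), ("clustered", clustered)]:
    q1, _, q3 = statistics.quantiles(values, n=4)  # returns [Q1, median, Q3]
    print(name, value_range(values), q1, q3)
# spread_out 9 2.25 7.75
# clustered 9 1.0 9.0
```

Both sets have range 9. The quartiles of the second set sit at the extremes, 1 and 9, which reveals the clustering that the range hides.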

Applications of Range

The range is a good way to get a very basic understanding of how spread out the numbers in a data set are, because it is easy to calculate: it requires only a single subtraction. Beyond this basic use, the range also has a few other applications in statistics.

Quartile Deviation

This is obtained after the data is arranged in ascending order from the lowest to the highest.

Quartile deviation is based on the lower quartile $Q_1$ and the upper quartile $Q_3$. The difference $Q_3 - Q_1$ is called the interquartile range. The quartile deviation is given by:

$$\text{Q.D} = \frac{Q_3 - Q_1}{2}$$

The quartile deviation is a slightly better measure of absolute dispersion than the range, but it ignores the observations in the tails. Suppose we take different samples from a population and calculate their quartile deviations. Their values are quite likely to differ considerably. This is called sampling fluctuation. Because of it, the quartile deviation is not a popular measure of dispersion: the quartile deviation calculated from sample data does not help us draw any conclusion (inference) about the quartile deviation in the population.

 Coefficient of Quartile Deviation

A relative measure of dispersion based on the quartile deviation is called the coefficient of
quartile deviation. It is defined as:

$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
It is a pure number free of any units of measurement. It can be used for comparing the dispersion
of two or more sets of data.

 Example (Ungrouped data)


The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040, 1080, 1200, 1440,
1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, and 1885. Find the
quartile deviation and coefficient of quartile deviation.

 Solution:

After arranging the observations in ascending order, we get:

1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750,
1755, 1785, 1880, 1885, 1960.

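$$Q_1 = \text{Value of the } \left(\frac{n+1}{4}\right)\text{th item} = \text{Value of the } \left(\frac{20+1}{4}\right)\text{th item} = \text{Value of the 5.25th item}$$

$$Q_1 = \text{5th item} + 0.25(\text{6th item} - \text{5th item}) = 1240 + 0.25(1320 - 1240) = 1240 + 20 = 1260$$

$$Q_3 = \text{Value of the } \left(\frac{3(n+1)}{4}\right)\text{th item} = \text{Value of the } \left(\frac{3(20+1)}{4}\right)\text{th item} = \text{Value of the 15.75th item}$$

$$Q_3 = \text{15th item} + 0.75(\text{16th item} - \text{15th item}) = 1750 + 0.75(1755 - 1750) = 1750 + 3.75 = 1753.75$$

Q.D = (1753.75 – 1260)/2 = 493.75/2 = 246.875
Coefficient of Quartile Deviation = (1753.75 – 1260)/(1753.75 + 1260) = 493.75/3013.75 = 0.164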
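The (n+1)/4 rule with linear interpolation can be sketched in Python and checked against the worked answer. The function name `quartile` is hypothetical. The cross-check uses `statistics.quantiles`, whose default `'exclusive'` method uses the same (n+1) positions.

```python
import statistics

def quartile(values, k):
    """k-th quartile (k = 1 or 3): value of the k(n+1)/4-th item, linearly interpolated."""
    ordered = sorted(values)          # arrange in ascending order first
    n = len(ordered)
    position = k * (n + 1) / 4        # 1-based position, e.g. 5.25 for Q1 when n = 20
    whole = int(position)             # integer part: the item just below the position
    frac = position - whole           # fractional part used for interpolation
    if whole < 1:                     # position before the first item
        return ordered[0]
    if whole >= n:                    # position at or beyond the last item
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + frac * (above - below)

wheat = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
         1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]
q1 = quartile(wheat, 1)
q3 = quartile(wheat, 3)
qd = (q3 - q1) / 2
coef_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coef_qd, 3))  # 1260.0 1753.75 246.875 0.164

# Cross-check with the standard library
q1_lib, _, q3_lib = statistics.quantiles(wheat, n=4)
print(q1_lib, q3_lib)  # 1260.0 1753.75
```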
 

Example (Grouped data)
Calculate the quartile deviation and coefficient of quartile deviation from the data given below:

Maximum Load (short-tons)   Number of Cables
9.3–9.7                     2
9.8–10.2                    5
10.3–10.7                   12
10.8–11.2                   17
11.3–11.7                   14
11.8–12.2                   6
12.3–12.7                   3
12.8–13.2                   1
 Solution:

The necessary calculations are given below:

Maximum Load    Number of Cables   Class          Cumulative
(short-tons)    (f)                Boundaries     Frequencies
9.3–9.7         2                  9.25–9.75      2
9.8–10.2        5                  9.75–10.25     2+5=7
10.3–10.7       12                 10.25–10.75    7+12=19
10.8–11.2       17                 10.75–11.25    19+17=36
11.3–11.7       14                 11.25–11.75    36+14=50
11.8–12.2       6                  11.75–12.25    50+6=56
12.3–12.7       3                  12.25–12.75    56+3=59
12.8–13.2       1                  12.75–13.25    59+1=60

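$$Q_1 = \text{Value of the } \left(\frac{n}{4}\right)\text{th item} = \text{Value of the 15th item}$$

$Q_1$ lies in the class 10.25–10.75.

$$Q_1 = L_{Q_1} + \frac{\frac{n}{4} - c}{f_{Q_1}} \times i_{Q_1} = 10.25 + \frac{15 - 7}{12} \times 0.5 = 10.58$$

Here the symbols have the same meanings as in the median formula, applied to the class containing $Q_1$.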

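$$Q_3 = \text{Value of the } \left(\frac{3n}{4}\right)\text{th item} = \text{Value of the 45th item}$$

$Q_3$ lies in the class 11.25–11.75.

$$Q_3 = L_{Q_3} + \frac{\frac{3n}{4} - c}{f_{Q_3}} \times i_{Q_3} = 11.25 + \frac{45 - 36}{14} \times 0.5 = 11.57$$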

Q.D = (11.57 – 10.58)/2 = 0.99/2 = 0.495
Coefficient of Quartile Deviation = (11.57 – 10.58)/(11.57 + 10.58) = 0.99/22.15 = 0.045
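The grouped quartile formula can be sketched in Python the same way as the grouped median. It locates the class where the cumulative frequency reaches kn/4 and then interpolates within that class. The function name `grouped_quartile` is hypothetical.

```python
def grouped_quartile(boundaries, freqs, k):
    """Q_k = L + (k*n/4 - c) / f * i for grouped data (k = 1 or 3)."""
    n = sum(freqs)
    target = k * n / 4            # 15th item for Q1, 45th item for Q3 when n = 60
    cumulative = 0                # c: total frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency table")

# Class boundaries (short-tons) and numbers of cables
boundaries = [(9.25, 9.75), (9.75, 10.25), (10.25, 10.75), (10.75, 11.25),
              (11.25, 11.75), (11.75, 12.25), (12.25, 12.75), (12.75, 13.25)]
freqs = [2, 5, 12, 17, 14, 6, 3, 1]

q1 = grouped_quartile(boundaries, freqs, 1)
q3 = grouped_quartile(boundaries, freqs, 3)
qd = (q3 - q1) / 2
coef_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 3), round(coef_qd, 3))  # 10.58 11.57 0.494 0.045
```

The code uses no intermediate rounding, so the quartile deviation comes out at about 0.494. The value 0.495 results from using the rounded quartiles 11.57 and 10.58.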
Graphical method

The ogive (cumulative frequency curve) can be used to find the median, the quartiles and the quartile deviation.

Variance:

The term variance refers to a statistical measurement of the spread between numbers in a data set. More specifically, variance is the average of the squared deviations of the values from the mean, so it measures how far the numbers in the set lie from the mean.

For ungrouped data it is given by:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

For grouped data (example A) it is given by:

$$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$
Standard Deviation: This is given by $\sigma = \sqrt{\text{variance}}$.
The standard deviation is the average amount of variability in your data set, expressed in the same units as the data. It tells you, on average, how far each value lies from the mean.

A high standard deviation means that values are generally far from the mean, while a low
standard deviation indicates that values are clustered close to the mean.
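The variance formulas divide by n (or Σf), so they give the population variance. The following Python sketch checks that the definitional and shortcut forms agree. It then applies them to example A. Two assumptions apply. First, the function names are hypothetical. Second, the example A class centers beyond the first two rows of the mean table are taken from the earlier frequency table. `statistics.pvariance` is the standard-library population variance, used here only as a cross-check.

```python
import math
import statistics

def variance_ungrouped(values):
    """Population variance: sum((x - mean)^2) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def variance_grouped(midpoints, freqs):
    """Grouped variance from the definition: sum(f * (x - mean)^2) / sum(f)."""
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total_f
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / total_f

def variance_grouped_shortcut(midpoints, freqs):
    """Same quantity via sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2."""
    total_f = sum(freqs)
    mean_fx = sum(f * x for x, f in zip(midpoints, freqs)) / total_f
    mean_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs)) / total_f
    return mean_fx2 - mean_fx ** 2

small = [2, 5, 6, 7]
print(variance_ungrouped(small), statistics.pvariance(small))  # 3.5 3.5

# Example A class centers (m) and frequencies
midpoints_a = [13.22, 13.27, 13.32, 13.37, 13.42, 13.47, 13.52]
freqs_a = [2, 7, 6, 2, 2, 1, 1]
var_a = variance_grouped(midpoints_a, freqs_a)
sd_a = math.sqrt(var_a)  # standard deviation = square root of the variance
print(round(var_a, 5), round(sd_a, 4))  # 0.00593 0.077
```

The shortcut form needs only the Σfx and Σfx² columns of the table. It subtracts two nearly equal numbers, though, so for hand calculation keep enough decimal places.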

ASSIGNMENT
1. The marks obtained by 26 students in a statistics test are given below.

Marks             34  35  36  37  38  39  40  41  43
No. of students   3   5   2   3   3   2   3   4   1

Calculate:
(i) The median
(ii) The mean
(iii) The standard deviation
2. The following data show the speed of 30 cars on a highway (in km/h):
110 118 85 92 93 105 112 100 99 106 87 81 86 84 122 117 102 109 120 121 104 81
83 100 95 93 110 94 97 105
Use the frequency distribution obtained earlier (lecture notes 2) to:
(a) Calculate the median, mean, standard deviation, coefficient of quartile deviation and the
range.
(b) Graphically determine the mode, median and quartile deviation.
