Data can be described using measures of central tendency and measures of variation.
Measures of central tendency
Mode and Modal Class: These are, respectively, the observation and the class interval with the highest
frequency. In our example A, the modal class is 13.25 – 13.29 (class boundaries 13.245 – 13.295), which
has the highest frequency of 7. Graphically, the mode is located within the tallest bar of the histogram.
For grouped data it is estimated by:
$$\text{Mode} = L + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times i$$
where $L$ is the lower class boundary of the modal class, $f_m$ is the frequency of the modal class, $f_1$
is the frequency of the class preceding the modal class, $f_2$ is the frequency of the class succeeding the
modal class and $i$ is the size of the modal class, given by the difference between its upper and lower class
boundaries.
In our example A,
$$\text{Mode} = 13.245 + \frac{7-2}{(7-2)+(7-6)} \times 0.05 = 13.29$$
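The worked example for the mode can be checked numerically. The following is a minimal sketch; the function name `grouped_mode` and its parameter names are illustrative choices, not part of the notes.

```python
def grouped_mode(lower_boundary, f_modal, f_prev, f_next, class_width):
    """Estimate the mode of grouped data by interpolating inside the modal class."""
    d1 = f_modal - f_prev  # excess of the modal class over the preceding class
    d2 = f_modal - f_next  # excess of the modal class over the succeeding class
    return lower_boundary + d1 / (d1 + d2) * class_width

# Example A: modal class 13.245 - 13.295, fm = 7, f1 = 2, f2 = 6, i = 0.05
mode_a = grouped_mode(13.245, 7, 2, 6, 0.05)
print(round(mode_a, 2))  # 13.29
```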
Median:
This divides the data into two equal halves when the ungrouped data is arranged in ascending
order. E.g. 2, 5, 6, 7 median = 5.5; 2, 5, 6, 7, 8 median = 6
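The two small examples can be reproduced with the `median` function from Python's standard `statistics` module, which averages the two middle values when the number of observations is even. This is only a quick check of the arithmetic.

```python
from statistics import median

even_median = median([2, 5, 6, 7])     # average of the two middle values 5 and 6
odd_median = median([2, 5, 6, 7, 8])   # the single middle value

print(even_median)  # 5.5
print(odd_median)   # 6
```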
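For grouped data the median is given by:
$$M = L_M + \frac{\frac{n}{2} - c}{f_M} \times i_M$$
where $L_M$ is the lower class boundary of the median class, $n$ is the total frequency, $c$ is the
cumulative frequency of the classes before the median class, $f_M$ is the frequency of the median class
and $i_M$ is the size of the median class.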
Class boundaries    Frequency (f)   Cumulative frequency
13.295 – 13.345     6               15
13.345 – 13.395     2               17
13.395 – 13.445     2               19
13.445 – 13.495     1               20
13.495 – 13.545     1               21
$$M = 13.295 + \frac{10.5 - 9}{6} \times 0.05 = 13.31$$
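The grouped median calculation can be sketched in code as follows. The function name `grouped_median` is hypothetical. The first two classes (13.195 – 13.245 with frequency 2 and 13.245 – 13.295 with frequency 7) are not shown in the table above; they are assumed here from the modal class and the frequency table used elsewhere in these notes.

```python
def grouped_median(boundaries, freqs):
    """Interpolated median of grouped data.

    boundaries: list of (lower, upper) class boundaries in ascending order
    freqs: list of class frequencies, one per class
    """
    half = sum(freqs) / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

# Example A (first two classes assumed, see the note above)
boundaries_a = [(13.195, 13.245), (13.245, 13.295), (13.295, 13.345),
                (13.345, 13.395), (13.395, 13.445), (13.445, 13.495),
                (13.495, 13.545)]
freqs_a = [2, 7, 6, 2, 2, 1, 1]

median_a = grouped_median(boundaries_a, freqs_a)
print(round(median_a, 2))  # 13.31
```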
Mean
For ungrouped data the mean is given by:
$$\bar{x} = \frac{\sum x}{n}$$
For grouped data the mean is given by:
$$\bar{x} = \frac{\sum fx}{\sum f}$$
Length (m)       Frequency (f)   x (class center)   fx      fx²
13.20 – 13.24    2               13.22              26.44   349.5368
13.25 – 13.29    7               13.27              92.89   1232.6503
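The fx and fx² columns and the grouped mean can be computed as in the sketch below. Only the first two rows of the table appear above; the remaining frequencies and class centers are assumptions reconstructed from the class boundaries and frequencies listed in the median example.

```python
# Frequencies and class centers for example A.
# Rows 3 to 7 are assumed (midpoints of the class boundaries in the median example).
freqs = [2, 7, 6, 2, 2, 1, 1]
centers = [13.22, 13.27, 13.32, 13.37, 13.42, 13.47, 13.52]

fx = [f * x for f, x in zip(freqs, centers)]       # the fx column
fx2 = [f * x * x for f, x in zip(freqs, centers)]  # the fx^2 column

grouped_mean = sum(fx) / sum(freqs)  # sum(fx) / sum(f)

print(round(fx[0], 2), round(fx2[0], 4))  # 26.44 349.5368
print(round(fx[1], 2), round(fx2[1], 4))  # 92.89 1232.6503
print(round(grouped_mean, 2))             # 13.32
```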
Measures of Dispersion:
Range
The range is the difference between the maximum and minimum values of a data set. It is the simplest
measure of dispersion and gives a first indication of how varied the data set is:
Range = $x_{max} - x_{min}$
$$\text{Coefficient of Range} = \frac{x_{max} - x_{min}}{x_{max} + x_{min}}$$
Let us take two sets of observations. Set A contains the marks of five students in mathematics
out of 25 marks and set B contains the marks of the same students in English out of 100 marks.
Set A (Mathematics): Range = 20 – 10 = 10; Coefficient of Range = $\frac{20-10}{20+10} = 0.33$
Set B (English): Range = 50 – 30 = 20; Coefficient of Range = $\frac{50-30}{50+30} = 0.25$
In set A the range is 10 and in set B the range is 20. At first sight there appears to be greater
dispersion in set B, but this is not true. The range of 20 in set B is for larger observations and the
range of 10 in set A is for smaller observations, so 20 and 10 cannot be compared directly: their base
is not the same. The marks in mathematics are out of 25 and the marks in English are out of 100.
When we convert the two ranges into coefficients of range, we see that the coefficient of range for
set A (0.33) is greater than that of set B (0.25). Thus there is greater dispersion or variation in set A;
the marks of the students in English are more stable than their marks in mathematics.
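The comparison of the two sets can be reproduced with a short sketch. Only the maximum and minimum marks of each set are given in the notes, so the function below (the name `coefficient_of_range` is an illustrative choice) takes those two values directly.

```python
def coefficient_of_range(x_max, x_min):
    """Relative dispersion: (x_max - x_min) / (x_max + x_min)."""
    return (x_max - x_min) / (x_max + x_min)

coef_a = coefficient_of_range(20, 10)  # set A, mathematics
coef_b = coefficient_of_range(50, 30)  # set B, English

print(round(coef_a, 2))  # 0.33
print(round(coef_b, 2))  # 0.25
print(coef_a > coef_b)   # True: set A is relatively more dispersed
```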
Example:
The following distribution gives the number of houses and the number of persons per house.
Number of Persons 1 2 3 4 5 6 7 8 9 10
Number of Houses 26 113 120 95 60 42 21 14 5 4
Calculate the range and coefficient of range.
Solution: Range = 10 – 1 = 9
Coefficient of range = 9/11 = 0.818
Example:
Solution:
Range = 15 Kilograms
Method 2:
$x_{max}$ = the mid value of the highest class = 73
$x_{min}$ = the mid value of the lowest class = 61
Range = 73 – 61 = 12 kilograms
Coefficient of Range = 12/134 = 0.0895
Limitations of Range
The range is a very crude measure of the spread of data because it is extremely sensitive to
outliers: a single data value can greatly affect it.
For example, consider the data set 1, 2, 3, 4, 6, 7, 7, 8. The maximum value is 8, the minimum
is 1 and the range is 7. Now add the single value 100 to the same data set. The range becomes
100 – 1 = 99, so one extra data point has changed the range from 7 to 99.
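The effect of the outlier is easy to demonstrate in code. This is a minimal sketch; `value_range` is a hypothetical helper name.

```python
def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

data = [1, 2, 3, 4, 6, 7, 7, 8]
range_without_outlier = value_range(data)
range_with_outlier = value_range(data + [100])  # same data plus one extreme value

print(range_without_outlier)  # 7
print(range_with_outlier)     # 99
```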
The range also tells us nothing about the internal features of a data set. For example, the data
set 1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10 has a range of 10 – 1 = 9. The data set 1, 1, 1, 2, 9, 9, 9, 10
also has a range of 9, but unlike the first set its values are clustered around the minimum and
maximum. Other statistics, such as the first and third quartiles, are needed to detect this
internal structure.
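To illustrate, the quartiles of the two data sets can be compared with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its default `exclusive` method places the quartiles using positions based on n + 1. The variable names are illustrative.

```python
from statistics import quantiles

set_1 = [1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 10]
set_2 = [1, 1, 1, 2, 9, 9, 9, 10]

range_1 = max(set_1) - min(set_1)
range_2 = max(set_2) - min(set_2)

q1_1, _, q3_1 = quantiles(set_1, n=4)  # lower quartile, median, upper quartile
q1_2, _, q3_2 = quantiles(set_2, n=4)

print(range_1, range_2)            # 9 9
print(q3_1 - q1_1, q3_2 - q1_2)    # 5.5 8.0
```

The ranges are identical, but the second set has a much wider interquartile range because its values sit near the two extremes.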
Applications of Range
The range gives a very basic picture of how spread out the numbers in a data set are. It is easy
to calculate because it requires only a single subtraction, and it has a few other applications in
statistics.
Quartile Deviation
This is obtained after the data is arranged in ascending order from the lowest to the highest.
Quartile deviation is based on the lower quartile $Q_1$ and the upper quartile $Q_3$. The difference
$Q_3 - Q_1$ is called the interquartile range. Quartile deviation is given by:
$$Q.D = \frac{Q_3 - Q_1}{2}$$
The quartile deviation is a slightly better measure of absolute dispersion than the range, but it
ignores the observations in the tails. If we take different samples from a population and
calculate their quartile deviations, the values are likely to differ considerably. This is
called sampling fluctuation. For this reason the quartile deviation is not a popular measure of
dispersion: the quartile deviation calculated from sample data does not help us to draw any
conclusion (inference) about the quartile deviation in the population.
A relative measure of dispersion based on the quartile deviation is called the coefficient of
quartile deviation. It is defined as:
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
It is a pure number free of any units of measurement. It can be used for comparing the dispersion
of two or more sets of data.
Solution:
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750,
1755, 1785, 1880, 1885, 1960.
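$$Q_1 = \text{Value of } \left(\frac{n+1}{4}\right)^{th} \text{ item} = \text{Value of } \left(\frac{20+1}{4}\right)^{th} \text{ item}$$
= Value of 5.25th item
= 5th item + 0.25(6th item – 5th item) = 1240 + 0.25(1320 – 1240)
= 1240 + 20 = 1260
$$Q_3 = \text{Value of } \frac{3(n+1)}{4}^{th} \text{ item} = \text{Value of } \frac{3(20+1)}{4}^{th} \text{ item}$$
= Value of 15.75th item
= 15th item + 0.75(16th item – 15th item) = 1750 + 0.75(1755 – 1750)
= 1750 + 3.75 = 1753.75
$$Q.D = \frac{1753.75 - 1260}{2} = 246.875$$
$$\text{Coefficient of Quartile Deviation} = \frac{1753.75 - 1260}{1753.75 + 1260} = 0.164$$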
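The solution above can be verified with a short sketch of the (n + 1)/4 position rule with linear interpolation between neighbouring items. The function name `quartile` is hypothetical.

```python
def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4, interpolating linearly."""
    n = len(sorted_values)
    position = k * (n + 1) / 4  # 1-based position, possibly fractional
    whole = int(position)       # item just below the position
    fraction = position - whole
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    return below + fraction * (above - below)

values = [1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470,
          1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880, 1885, 1960]

q1 = quartile(values, 1)
q3 = quartile(values, 3)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(q1, q3)                    # 1260.0 1753.75
print(quartile_deviation)        # 246.875
print(round(coefficient_qd, 3))  # 0.164
```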
Example:(Grouped data)
Calculate the quartile deviation and coefficient of quartile deviation from the data given below:
Maximum Load (short-tons)   Number of Cables
9.3–9.7                     2
9.8–10.2                    5
10.3–10.7                   12
10.8–11.2                   17
11.3–11.7                   14
11.8–12.2                   6
12.3–12.7                   3
12.8–13.2                   1
Solution:
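$$Q_1 = \text{Value of } \left(\frac{n}{4}\right)^{th} \text{ item} = \text{Value of } 15^{th} \text{ item}$$
$$Q_1 = L_{Q_1} + \frac{\frac{n}{4} - c}{f_{Q_1}} \times i_{Q_1} = 10.25 + \frac{15-7}{12} \times 0.5 = 10.58$$
$$Q_3 = \text{Value of } \left(\frac{3n}{4}\right)^{th} \text{ item} = \text{Value of } 45^{th} \text{ item}$$
$$Q_3 = L_{Q_3} + \frac{\frac{3n}{4} - c}{f_{Q_3}} \times i_{Q_3} = 11.25 + \frac{45-36}{14} \times 0.5 = 11.57$$
$$Q.D = \frac{11.57 - 10.58}{2} = 0.495$$
$$\text{Coefficient of Quartile Deviation} = \frac{11.57 - 10.58}{11.57 + 10.58} = 0.045$$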
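The grouped quartiles can be checked with the sketch below. The function name `grouped_quantile` is an illustrative choice, and the class boundaries are taken as 0.05 below and above the stated class limits, as in the solution.

```python
def grouped_quantile(boundaries, freqs, p):
    """Interpolated quantile of grouped data for a proportion p (0.25 gives Q1)."""
    target = p * sum(freqs)  # position of the required item, e.g. n/4
    cumulative = 0           # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

# Cable data: class boundaries 9.25-9.75, 9.75-10.25, ..., 12.75-13.25
cable_boundaries = [(9.25 + 0.5 * k, 9.75 + 0.5 * k) for k in range(8)]
cable_freqs = [2, 5, 12, 17, 14, 6, 3, 1]

q1_cables = grouped_quantile(cable_boundaries, cable_freqs, 0.25)
q3_cables = grouped_quantile(cable_boundaries, cable_freqs, 0.75)
qd_cables = (q3_cables - q1_cables) / 2
coef_cables = (q3_cables - q1_cables) / (q3_cables + q1_cables)

print(round(q1_cables, 2), round(q3_cables, 2))  # 10.58 11.57
print(round(coef_cables, 3))                     # 0.045
```

Without intermediate rounding the quartile deviation comes out at about 0.494; the value 0.495 in the solution results from rounding the two quartiles to two decimal places first.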
Graphical method
The ogive (cumulative frequency curve) can be used to find the median, quartiles and the quartile deviation.
Variance:
The term variance refers to a statistical measurement of the spread between numbers in a data
set. More specifically, variance measures how far each number in the set is from the mean and
thus from every other number in the set.
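For ungrouped data:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
For grouped data:
$$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$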
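The two forms of the ungrouped variance formula give the same result, which can be confirmed numerically. The sketch below uses the small data set from the discussion of the range purely as an illustration.

```python
data = [1, 2, 3, 4, 6, 7, 7, 8]
n = len(data)
mean = sum(data) / n  # 38 / 8 = 4.75

# Definition: mean of the squared deviations from the mean
variance_definition = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean
variance_shortcut = sum(x * x for x in data) / n - mean ** 2

print(variance_definition)  # 5.9375
print(variance_shortcut)    # 5.9375
```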
Standard Deviation: This is given by $\sigma = \sqrt{\text{variance}}$
The standard deviation is the average amount of variability in your dataset. It tells you, on
average, how far each value lies from the mean.
A high standard deviation means that values are generally far from the mean, while a low
standard deviation indicates that values are clustered close to the mean.
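As an illustration of the grouped formulas, the sketch below computes the variance and standard deviation of the cable-load table from the quartile deviation example. The notes do not work this calculation themselves; the class centers are assumed to be the midpoints of the stated class limits.

```python
from math import sqrt

# Cable-load table: class centers (assumed midpoints) and frequencies
centers = [9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0]
freqs = [2, 5, 12, 17, 14, 6, 3, 1]

total_f = sum(freqs)                                      # sum of f
sum_fx = sum(f * x for f, x in zip(freqs, centers))       # sum of fx
sum_fx2 = sum(f * x * x for f, x in zip(freqs, centers))  # sum of fx^2

mean = sum_fx / total_f
variance = sum_fx2 / total_f - mean ** 2  # shortcut form of the grouped variance
std_dev = sqrt(variance)

print(round(mean, 2))     # 11.09
print(round(std_dev, 3))  # 0.733
```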
ASSIGNMENT
1. The marks obtained by 26 students in a statistics test are given below.
Marks             34  35  36  37  38  39  40  41  43
No. of students   3   5   2   3   3   2   3   4   1
Calculate:
(i) The median
(ii) The mean
(iii) The standard deviation
2. The following data show the speed of 30 cars on a highway (in km/h):
110 118 85 92 93 105 112 100 99 106 87 81 86 84 122 117 102 109 120 121 104 81
83 100 95 93 110 94 97 105
Use the frequency distribution obtained earlier (lecture notes 2) to:
(a) Calculate the median, mean, standard deviation, coefficient of quartile deviation and the
range.
(b) Graphically determine the mode, median and quartile deviation.