
Classification of Data

• Grouping of related facts into classes.


Classification condenses the mass of data so
that similarities and dissimilarities can be
readily seen, and it makes comparison easier.
Types of Classification of data
• Geographical classification
• Chronological classification
• Qualitative classification
• Quantitative classification
Geographical classification
• Data classified on the basis of geographical
differences or locational differences.
• Literacy rates in states
Sl. No. State Literacy Rate
1 Kerala 89.81
2 Goa 75.51
3 Tamilnadu 62.66
4 Karnataka 56.04
5 Andhra Pradesh 44.09
Chronological Classification
• Data are observed and classified over a period
of time
• Population of India
Census year Population (in millions)
1961 439
1971 548
1981 683
1991 847
2001 1027
Qualitative Classification
• Data are classified on the basis of attributes
• Simple classification: Only one attribute is
considered for classification
• Dichotomous classification: only two classes
are formed
• Manifold classification: Two or more attributes
are considered for classification.
Quantitative Classification
• Refers to classification of data according to
variables. Ex: height, weight, income,
expenditure, demand, sales etc.
Marks (%) No. of Students
0 – 20 4
20 – 40 6
40 – 60 28
60 – 80 25
80 – 100 7
Total 70
Here marks refers to the variable and no. of
students refers to the frequency. This is an
example of a frequency distribution where data
is classified on the basis of some variable that
can be measured. The term variable refers to
the characteristic that varies in amount or
magnitude in a frequency distribution. A
variable may be either continuous or discrete.
• A continuous variable can take fractional
values; such values are obtained by
measurement. The marks table above is an
example of a continuous frequency
distribution.
• A discrete variable can take only whole-number
values (1, 2, 3, etc.) and cannot take
fractional values; such values are obtained
by counting.
An example:
No. of goals scored in a hockey match No. of matches
0 10
1 50
2 35
3 5
Total 100
Classification according to class intervals
i. Class limits: the class limits are the lowest and the highest values
that can be included in the class. For example, take the class 20 – 40.
The lowest value of this class is 20 and the highest 40. The two
boundaries of a class are known as the lower limit and the upper limit
of the class.
ii. Class interval: the span of a class, i.e. the difference between the
upper limit and the lower limit, is known as the class interval. In the class
20 – 40, the class interval is 20, i.e. 40 – 20 = 20.
iii. Class frequency: the number of observations corresponding to a
particular class is known as the frequency of that class or the class
frequency.
iv. Class mid-point: the value lying half-way between the lower
and upper class limits of a class interval. The mid-point of a class is
obtained by taking the average of the upper and lower limits of the class.
For the class 20 – 40, the mid-point is (20 + 40)/2 = 30.
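The definitions of class interval and class mid-point can be checked with a few lines of Python. This is only a minimal sketch: the function names `class_interval` and `class_midpoint` are illustrative choices, and the class limits are those of the marks table shown earlier.

```python
# Class limits (lower limit, upper limit) taken from the marks table.
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]

def class_interval(lower, upper):
    """Width of a class: upper limit minus lower limit."""
    return upper - lower

def class_midpoint(lower, upper):
    """Value half-way between the lower and upper class limits."""
    return (lower + upper) / 2

for lower, upper in classes:
    print(lower, upper, class_interval(lower, upper), class_midpoint(lower, upper))
```

For the class 20 – 40 this gives a class interval of 20 and a mid-point of 30.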
There are two methods of classifying data according to class
intervals: i) the exclusive method and
ii) the inclusive method.
i) Exclusive method: When the class intervals are
so fixed that the upper limit of one class is the
lower limit of the next class, it is known as the
‘exclusive’ method of classification. Under this
method, a value equal to the upper limit of a class
is counted in the next class.
Marks (%) No. of Students
0 – 20 4
20 – 40 6
40 – 60 28
60 – 80 25
80 – 100 7
ii) Inclusive Method: Under the inclusive
method of classification, the upper limit of
one class is included in that class itself.
Income (Rs ) No. of employees
1000 – 1099 20
1100 – 1199 39
1200 – 1299 42
1300 – 1399 19
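The difference between the two methods shows up when a value sits exactly on a class limit. The sketch below assumes the usual convention that, under the exclusive method, a value equal to the upper limit belongs to the next class, while under the inclusive method both limits belong to the class. The helper names are hypothetical.

```python
def find_class_exclusive(value, classes):
    """Return the class (lower, upper) with lower <= value < upper, or None."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None

def find_class_inclusive(value, classes):
    """Return the class (lower, upper) with lower <= value <= upper, or None."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None

marks_classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
income_classes = [(1000, 1099), (1100, 1199), (1200, 1299), (1300, 1399)]

print(find_class_exclusive(40, marks_classes))     # falls in 40 – 60, not 20 – 40
print(find_class_inclusive(1099, income_classes))  # falls in 1000 – 1099
```

Under this strict reading, a mark of exactly 100 falls in no class. In practice the last class is often treated as closed at its upper limit, but that is a choice the table itself does not settle.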
Conversion: inclusive to exclusive class intervals.
Correction factor = (lower limit of the 2nd class – upper
limit of the 1st class)/2. This correction factor is
subtracted from all the lower limits and added to all the
upper limits. Hence, for the above example, correction
factor = (1100 – 1099)/2 = 0.5.
Income (Rs ) No. of employees
999.5 – 1099.5 20
1099.5 – 1199.5 39
1199.5 – 1299.5 42
1299.5 – 1399.5 19
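The conversion can be sketched in Python as follows. The function name `inclusive_to_exclusive` is an illustrative choice, and the sketch assumes that the gap between consecutive classes is the same throughout, as it is in the income table.

```python
def inclusive_to_exclusive(classes):
    """Convert inclusive class limits to exclusive (continuous) ones."""
    # Correction factor: half the gap between the upper limit of the
    # first class and the lower limit of the second class.
    correction = (classes[1][0] - classes[0][1]) / 2
    # Subtract it from every lower limit and add it to every upper limit.
    return [(lower - correction, upper + correction) for lower, upper in classes]

income_classes = [(1000, 1099), (1100, 1199), (1200, 1299), (1300, 1399)]
for lower, upper in inclusive_to_exclusive(income_classes):
    print(lower, upper)
```

With a correction factor of 0.5, the first class becomes 999.5 – 1099.5 and each upper limit now coincides with the next lower limit.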
Principles of classification
1. The number of classes should preferably be between 5 and
15. However, there is no rigidity about it. The classes can be
more than 15 depending upon the total number of items in
the series and the details required, but they should not be fewer
than five because in that case the classification may not reveal
the essential characteristics.
2. As far as possible one should avoid class intervals such as 3,
7, 11, 26, etc. and should have class intervals of either five or
multiples of 5. Ex: 10, 20, 25, 50, 100, 1000, etc. However, if
the data necessitates a class interval of less than 5, it can be
any value between 1 and 4.
3. The lower limit of the first class should be either zero or 5
or a multiple of 5. For example, if the lowest value in the data
is 63, then the first class should be 60-70, instead of 63-73.
4. The intervals should be equal for all the classes. If intervals are not
of uniform width, it is difficult to make meaningful comparisons
between classes. Also, unequal class intervals present problems when
graphing and computing certain averages and other statistical
measures. However, frequency distributions with unequal class
intervals are desirable when there are large gaps in the data.
5. If possible, open-end classes of the type below 1000, 1000-2000,
2000-3000, 3000-4000, above 4000 should be avoided. Open-end
distributions present problems of graphing and further analysis.
6. In any frequency distribution, the values of the variable under
study are presented on the left-hand side, whereas the number of items
falling in each class (the frequency) is presented on the right-hand side.
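Principles 3 and 4 can be illustrated together: round the lowest value down to a multiple of 5 and build equal-width classes from there. The sketch below is one possible reading, not a prescribed procedure. The function `make_classes`, its `base` parameter, and the highest value of 98 are hypothetical; only the lowest value of 63 and the first class 60-70 come from the text.

```python
def make_classes(lowest, highest, width, base=5):
    """Build equal-width exclusive classes covering lowest..highest.

    The first lower limit is the lowest value rounded down to a
    multiple of `base` (an assumed parameter, 5 by default).
    """
    lower = (lowest // base) * base
    classes = []
    # Keep adding classes until the highest value lies strictly below
    # an upper limit, as the exclusive method requires.
    while lower <= highest:
        classes.append((lower, lower + width))
        lower += width
    return classes

print(make_classes(63, 98, 10))
```

With a lowest value of 63 and a class interval of 10, the first class is 60-70 rather than 63-73.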
Cumulative Frequencies
Helps us to know how many observations are
below an upper limit of the class and above a
lower limit of the class.
Hence we have “less than” and “more than”
cumulative frequency.
Marks (%) No. of Students
0 – 20 4
20 – 40 6
40 – 60 28
60 – 80 25
80 – 100 7
Less than cumulative frequency
• Takes into account the upper limits of the class
intervals.
• Hence we have < 20, < 40, < 60, < 80 and < 100.
Marks (%)   No. of Students   Less than or equal to   c.f.
0 – 20      4                 20                      4
20 – 40     6                 40                      10 (i.e. 6+4)
40 – 60     28                60                      38 (i.e. 10+28)
60 – 80     25                80                      63
80 – 100    7                 100                     70
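The less than cumulative frequencies are running totals of the class frequencies, taken from the first class downwards. A minimal Python sketch using the standard-library `itertools.accumulate` (the variable names are illustrative):

```python
from itertools import accumulate

upper_limits = [20, 40, 60, 80, 100]
frequencies = [4, 6, 28, 25, 7]

# Running total of the frequencies, first class to last class.
less_than_cf = list(accumulate(frequencies))

for limit, cf in zip(upper_limits, less_than_cf):
    print(f"less than {limit}: {cf}")
```

The last value equals the total number of students, 70, which is a quick check on the arithmetic.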
More than cumulative frequency
• Takes into account the lower limits of the class
intervals.
• Hence we have ≥ 0, ≥ 20, ≥ 40, ≥ 60, ≥ 80 and the
corresponding cumulative frequencies.
Marks (%)   No. of Students   More than   c.f.
0 – 20      4                 0           70
20 – 40     6                 20          66 (i.e. 70-4)
40 – 60     28                40          60 (i.e. 66-6)
60 – 80     25                60          32
80 – 100    7                 80          7
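The more than cumulative frequencies can be obtained the same way by accumulating from the last class backwards, which is equivalent to the successive subtractions (70-4, 66-6, ...) in the table. A minimal Python sketch with illustrative variable names:

```python
from itertools import accumulate

lower_limits = [0, 20, 40, 60, 80]
frequencies = [4, 6, 28, 25, 7]

# Accumulate from the last class backwards, then restore the original order.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for limit, cf in zip(lower_limits, more_than_cf):
    print(f"{limit} or more: {cf}")
```

The first value equals the total number of students, 70, since every observation is at or above the lowest lower limit.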
l.c.f. and m.c.f.
Marks (%)   No. of Students   Less than or equal to (%)   c.f. (students)   More than (%)   c.f. (students)
0 – 20      4                 20                          4                 0               70
20 – 40     6                 40                          10                20              66
40 – 60     28                60                          38                40              60
60 – 80     25                80                          63                60              32
80 – 100    7                 100                         70                80              7
