
Classification and Tabulation of Data
• A statistical enquiry has four phases, viz.,
– Collection of data;
– Classification and tabulation of data;
– Analysis of data;
– Interpretation of data.
Classification and tabulation of data
• Collected data are usually in unorganized form and need to be
organized and presented in a meaningful and comprehensible form
in order to facilitate further statistical analysis.
• The process of grouping data into different classes or sub-classes
according to some characteristics is known as classification.
• Tabulation is the systematic arrangement and presentation of
classified data.

• Connor defined classification as “the process of arranging things in
groups or classes according to their resemblances and affinities and
gives expression to the unity of attributes that may subsist amongst
a diversity of individuals”.
Rules of classification
1. There should not be any ambiguity in the definition of
classes.
2. Eliminate all doubts while including a particular item in a
class.
3. All the classes should preferably have equal width or length.
Only in some special cases, we use classes of unequal width.
4. The class-limits (integral or fractional) should be selected in
such a way that no value of the item in the raw data
coincides with the value of the limit.
5. The number of classes should preferably be between 10 and
20, i.e., neither too large nor too small.
6. The classes should be exhaustive, i.e., each value of the raw
data should be included in them.
7. The classes should be mutually exclusive and non-overlapping,
i.e., each item of the raw data should fit only in one class.
8. The classification must be suitable for the object of inquiry.
9. The classification should be flexible and items included in
each class must be homogeneous.
10. Width of class-interval is determined by first fixing the number
of class-intervals and then dividing the Range by that number.
Types of classification
1. Temporal or chronological classification
2. Geographical or Spatial classification
3. Qualitative classification
4. Quantitative classification
Temporal or chronological classification

• In chronological classification, the collected data are arranged
according to the order of time, expressed in years, months,
weeks, etc.
• The data are generally classified in ascending order of time.
– e.g., index numbers arranged over a period of time, population of a
country for several decades, exports and imports of India for different
five year plans, etc.
Geographical or Spatial classification
• In this type of classification the data are classified according to
geographical region or place.
• The classes are usually listed in alphabetical order for easy
reference.
• Items may also be listed by size to emphasize their importance
(rank).
– e.g., production of paddy in different states in India, production of
wheat in different countries etc.
Qualitative classification
• Data is classified according to attributes or non-measurable
characteristics;
– e.g., sex, literacy, religion, marital status, employment, nationality,
occupation, etc.
1. Dichotomous or simple classification
2. Manifold Classification
Dichotomous or simple classification
• When classification is done with respect to one attribute,
which is dichotomous in nature, two classes are formed, one
possessing the attribute and the other not possessing the
attribute. This type of classification is called simple or
dichotomous classification.

– e.g., classification by sex: male and female
Manifold Classification
• Classification in which two or more attributes are considered
simultaneously is called manifold classification.
– e.g., when a population is classified simultaneously with respect to two
attributes, sex and employment, four classes are formed, namely: male employed,
male unemployed, female employed and female unemployed.
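The way classes multiply in manifold classification can be sketched in a few lines of Python. This is only an illustration: the variable names are assumptions, and the attribute values are the ones from the sex and employment example.

```python
from itertools import product

# Two attributes, each dichotomous (values taken from the example above).
sex = ["Male", "Female"]
employment = ["employed", "unemployed"]

# Every combination of one value per attribute forms one class.
classes = [f"{s} {e}" for s, e in product(sex, employment)]
print(classes)
```

With k dichotomous attributes the same construction yields 2^k classes.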
Quantitative classification
• Quantitative classification refers to the classification of data
according to some characteristic that can be measured
numerically,
– e.g., height, weight, etc.
• Here we classify the data by assigning arbitrary limits known as
class-limits.
• The quantitative phenomenon under study is called a variable.
– Population of the whole country may be classified according to
different variables like age, income, wage, price
• Hence this classification is often called ‘classification by
variables’.
Quantitative classification

Weight (kg)    No. of animals
200-250        31
250-300        59
300-350        54
350-400        37
400-450        19
Variable
• A variable is any measurable characteristic or quantity which
can assume a range of numerical values within certain limits,
e.g., income, height, age, weight, wage, price, etc.
• A variable can be classified as either discrete or continuous.
– Discrete variable: A variable which can take only exact
(whole-number) values and no fractional values is called a
‘discrete’ variable.
• e.g., number of workmen in a factory, members of a family, students
in a class, number of births in a certain year
– Continuous variable: A variable which can take up any
numerical value (integral/fractional) within a certain range is
called a ‘continuous’ variable.
• Height, weight, rainfall, time, temperature, etc.
STATISTICAL SERIES
(DISCRETE/CONTINUOUS)
• Connor defined statistical series as: “If two variable quantities
can be arranged side by side so that measurable difference in
the one corresponds with measurable difference in the other,
the result is said to form a statistical series”.
• Statistical series may be either discrete or continuous.
FREQUENCY DISTRIBUTION
• The number of times a value of a variable, whether continuous
(e.g., height, weight) or discrete (e.g., number of students in a
class), occurs in a given series of observations is termed the
“frequency” of that value.
• The way of tabulating a pool of data of a variable and their
respective frequencies side by side is called a ‘frequency
distribution’ of those data.
• Croxton and Cowden defined frequency distribution as “a
statistical table which shows the sets of all distinct values of the
variable arranged in order of magnitude, either individually or in
groups, with their corresponding frequencies side by side”.
FREQUENCY DISTRIBUTION
• Let us consider the marks obtained by 100 students of a class in a
subject.

• If these raw data are arranged in either ascending or descending
order of magnitude, we get a better way of presentation, usually
called an “array”.
FREQUENCY DISTRIBUTION
• Array of Marks
FREQUENCY DISTRIBUTION
• Data in the form of a simple (or ungrouped) frequency distribution are presented
using tally marks.
• A tally mark is an upward slanted stroke (/) which is put against a value each
time it occurs in the raw data. The fifth occurrence of the value is represented
by a cross tally mark (\) drawn across the first four tally marks.
• The tally marks are then counted, and the total of the tally marks against each
value is its frequency.
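The tally procedure can be sketched in Python as follows. The marks are hypothetical values chosen for illustration (the class data from the slides are not reproduced here), and the helper name `tally` is an assumption. A full group of five is rendered as four strokes followed by one cross stroke.

```python
from collections import Counter

# Hypothetical raw marks (illustrative only, not the class data from the slides).
marks = [58, 60, 58, 61, 60, 58, 59, 60, 58, 60, 60, 58]

def tally(count):
    """Render a count as tally marks: four strokes plus a cross stroke per group of five."""
    fives, rest = divmod(count, 5)
    groups = ["////\\"] * fives + (["/" * rest] if rest else [])
    return " ".join(groups)

frequency = Counter(marks)          # value -> number of occurrences
for value in sorted(frequency):     # ascending order, as in an array
    print(f"{value:>5}  {tally(frequency[value]):<12}  {frequency[value]}")
```

Sorting the distinct values first gives the same ordering as the array, so the printed table is a simple (ungrouped) frequency distribution.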
FREQUENCY DISTRIBUTION
Frequency distribution helps us
1. To analyse the data.
2. To estimate the frequencies of the population on the basis of
the sample.
3. To facilitate the computation of various statistical measures.
Grouped Frequency Distribution
• The data in the last table can be further condensed by putting them into
smaller groups, or classes, called “class-intervals”.
• Class Interval: A large number of observations varying in a wide
range are usually classified in several groups according to the size of
their values. Each of these groups is defined by an interval called
class interval.
• The number of items which fall in a class-interval is called its “class
frequency”.
• The tabulation of raw data by dividing the whole range of
observations into a number of classes and indicating the
corresponding class-frequencies against the class-intervals is called
a “grouped frequency distribution”.

• The steps in preparing a grouped frequency distribution are:
1. Determining the class intervals.
2. Recording the data using tally marks.
3. Finding the frequency of each class by counting the tally marks.

• In the marks data, the lowest value is 56 and the highest value is 73. Thus, for
approximately 10 classes, the difference of values between two
consecutive classes will be
(73 - 56) / 10 = 1.7 ≈ 2,
and the nine class-intervals will be 56–57, 58–59, ..., 72–73.
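A minimal Python sketch of these steps is given below. It assumes the width is obtained by rounding range / number of classes up to the next whole number, which reproduces the width of 2 and the nine classes 56–57, 58–59, ..., 72–73. The marks are hypothetical values between 56 and 73, not the actual class data.

```python
import math
from collections import Counter

# Hypothetical marks between 56 and 73 (illustrative only).
marks = [56, 73, 61, 64, 64, 66, 59, 62, 65, 65,
         68, 70, 63, 64, 66, 67, 60, 58, 71, 65]

target_classes = 10                                 # approximate number of classes
low, high = min(marks), max(marks)
width = math.ceil((high - low) / target_classes)    # (73 - 56) / 10 = 1.7, rounded up to 2

# Inclusive integer classes 56-57, 58-59, ..., 72-73.
lower_limits = list(range(low, high + 1, width))
counts = Counter((m - low) // width for m in marks)  # index of the class each mark falls in

table = [(lo, lo + width - 1, counts.get(i, 0)) for i, lo in enumerate(lower_limits)]
for lo, hi, f in table:
    print(f"{lo}-{hi}  {f}")
```

Rounding the width up rather than down guarantees that the highest observation still falls inside the last class.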
Grouped Frequency Distribution

• Marks Obtained by 100 Students of a Class in Economics


Several Important Terms
• Class-limits: The maximum and minimum values of a class-interval are
called upper class limit and lower class-limit respectively.
• In the last table the lower class-limits of the nine classes are 56, 58, 60, 62,
64, 66, 68, 70, 72 and the upper class-limits are 57, 59, 61, 63, 65, 67, 69, 71, 73.
• Class-mark, or, Mid-value: The class-mark, or mid-value, of a class-interval
lies exactly at the middle of the class-interval and is given by:
Class-mark = (Lower class-limit + Upper class-limit) / 2
e.g., the class-mark of the class-interval 56–57 is (56 + 57) / 2 = 56.5.
Several Important Terms
• Class boundaries: Class boundaries are the true limits of a class-interval.
They are used with a grouped frequency distribution in which there is a gap
between the upper class-limit of one class and the lower class-limit of the
next class. They can be determined by using the formula:
Lower class-boundary = Lower class-limit - d/2
Upper class-boundary = Upper class-limit + d/2
where d = common difference between the upper class-limit of a class-interval
and the lower class-limit of the next higher class-interval.
• The class-boundaries of the class-intervals of Table 1.5 will be 55.5 – 57.5;
57.5 – 59.5; 59.5 – 61.5; etc., since d = 58 – 57 = 60 – 59 = ...= 1.
• The class-boundaries convert a grouped frequency distribution (inclusive
type) into a continuous frequency distribution.
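As a small illustration, the class-marks and class-boundaries of the inclusive classes 56–57, 58–59, ..., 72–73 can be computed as below. The sketch assumes the usual definitions: mid-value = (lower limit + upper limit) / 2, and boundaries obtained by moving each limit outward by d/2. The variable names are illustrative.

```python
# Inclusive class limits from the marks example: 56-57, 58-59, ..., 72-73.
limits = [(lo, lo + 1) for lo in range(56, 74, 2)]

# d = gap between one class's upper limit and the next class's lower limit.
d = limits[1][0] - limits[0][1]            # 58 - 57 = 1

class_marks = [(lo + hi) / 2 for lo, hi in limits]
boundaries = [(lo - d / 2, hi + d / 2) for lo, hi in limits]

for (lo, hi), mark, (lb, ub) in zip(limits, class_marks, boundaries):
    print(f"{lo}-{hi}  mark={mark}  boundaries={lb}-{ub}")
```

Each upper boundary coincides with the lower boundary of the next class, which is what makes the distribution continuous.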
Several Important Terms
• Types of class-interval
Several Important Terms
• Exclusive Class Interval
– In the exclusive type the class-limits are continuous, i.e., the upper-limit of one class-
interval is the lower limit of the next class-interval and class limits of a class-interval
coincide with the class boundaries of that class-interval.
– It is suitable for continuous variable data and facilitates mathematical computations.
– Class-intervals (A) like 10–15, 15–20; 50–100, 100–150; 30–, 40–; are of the upper-limit-
exclusive type, i.e., items exactly equal to 15, 100 and 40 are put in the intervals 15–20,
100–150 and 40–, respectively, and not in the intervals 10–15, 50–100 and 30–.
– Similarly, in a lower-limit-exclusive class-interval such as “above 10 but not more than
15”, 15 is included and 10 is excluded.
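The upper-limit-exclusive rule can be sketched with Python's `bisect` module. The function name `class_of` and the choice of edges are assumptions for illustration; the edges correspond to the classes 10–15 and 15–20.

```python
from bisect import bisect_right

# Edges of the upper-limit-exclusive classes 10-15 and 15-20.
edges = [10, 15, 20]

def class_of(x):
    """Return the (lower, upper) class containing x, with the upper limit excluded."""
    if not edges[0] <= x < edges[-1]:
        raise ValueError(f"{x} lies outside the classes")
    i = bisect_right(edges, x) - 1   # position of the last edge that is <= x
    return edges[i], edges[i + 1]

print(class_of(15))    # an item exactly equal to 15 goes to 15-20, not 10-15
print(class_of(14.9))
```

Using `bisect_right` places a value equal to an edge in the class that starts at that edge, which is exactly the upper-limit-exclusive convention.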
Several Important Terms
• Inclusive Class-interval
– Class-intervals (B) like 60–69, 70–79, 80–89, etc., are inclusive type. Here both the upper
and lower class-limits are included in the class-intervals, e.g., 60 and 69 both are included
in the class-interval 60–69.
– This is suitable for discrete variable data. There is no ambiguity as to which class an
item belongs, but the idea of continuity is lost.
– To make it continuous, it can be written as 59.5–69.5, 69.5–79.5, 79.5–89.5, etc.
• Open-end Class-interval
– In an ‘open-end’ class-interval (C) either the lower limit of the first class-interval, or the
upper limit of the last class-interval, or both are missing.
– It is difficult to determine the mid-values of the first and last class-intervals without an
assumption. If the other closed class-intervals have equal width, then we can assume that
the open-end class-intervals also have the same common width of the closed class-
intervals.
• Unequal Class Interval
– Unequal class-intervals (D) are preferred only when there is a great fluctuation in the
data.
Several Important Terms
• Width or Length (or size) of a Class-interval:
Width of a class-interval
= upper class-boundary - lower class-boundary
= difference between two successive upper class-limits (or two successive
lower class-limits), when the class-intervals have equal widths
= difference between two successive upper class-boundaries (or two
successive lower class-boundaries)
= difference between two successive class-marks, or mid-values
Relative Frequency Distribution
• A relative frequency distribution is a distribution in which relative
frequencies are recorded against each class interval. The relative frequency
of a class is obtained by dividing the frequency of that class by the total
frequency, i.e., it is the proportion of the total frequency that falls in
that class interval.
Relative Frequency Distribution Table
If the frequencies in a frequency distribution table are replaced by relative frequencies,
the table is called a relative frequency distribution table. For a data set consisting of
n values, if f is the frequency of a particular value (or class), then the ratio f/n is
called its relative frequency.
Relative Frequency Distribution
Class interval   Frequency (f)   Relative frequency (f/n)
20-25            10              10 / 70 = 0.143
25-30            12              12 / 70 = 0.171
30-35             8               8 / 70 = 0.114
35-40            20              20 / 70 = 0.286
40-45            11              11 / 70 = 0.157
45-50             4               4 / 70 = 0.057
50-55             5               5 / 70 = 0.071
Total            n = 70
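The relative frequencies in the table can be reproduced with a short Python sketch. The class intervals and frequencies are those of the table; the variable names are illustrative.

```python
# Class intervals and frequencies from the table above.
classes = ["20-25", "25-30", "30-35", "35-40", "40-45", "45-50", "50-55"]
freqs = [10, 12, 8, 20, 11, 4, 5]

n = sum(freqs)                        # total frequency, 70
relative = [f / n for f in freqs]     # f / n for each class

for c, f, r in zip(classes, freqs, relative):
    print(f"{c}  {f:>2}  {r:.3f}")
```

The unrounded relative frequencies always add up to 1, although figures rounded to three decimals may differ slightly in the last digit.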
Two-way, or, Bivariate Frequency
Distribution
• Often we find data composed of measurements made on two or
more variables for each item, e.g., weights and heights of a group of
boys, ages of wives and husbands for a group of couples, etc.
• Such data require classification with respect to two characteristics,
or criteria, simultaneously.
• The frequency distribution obtained by this cross classification is
called the bivariate frequency distribution.
Steps of Construction of a Bivariate Table
1. Find the class-interval of each of the variables.
2. Write one of the variables on the left-hand corner of the table
and the other on top of the table.
3. The first item X = 12 (Table 1.6) falls in the class-interval 10–20
and its Y value 140 falls in the class-interval 100–200. Insert a
tally mark in the 1st cell, where the column corresponding to X
= 12 intersects the row corresponding to Y = 140.
4. Similarly, insert tally marks for each pair of values (X, Y) for all
the 25 sets. The total frequency for each cell is written within
brackets ( ) immediately after the tally marks.
5. Next, count the frequencies of each row and put the final figure
in the extreme right column. Then count the frequencies of
each column and put the final figure in the bottom row.
Two-way, or, Bivariate Frequency Distribution
• Example (a) 25 values of two variables X and Y are given below. Form a two-way
frequency table showing the relationship between the two. Take class-intervals of
X as 10 to 20; 20 to 30; etc., that of Y as 100 to 200; 200 to 300, etc.
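The construction of a two-way table can be sketched in Python as below. Only the first pair (12, 140) comes from the text; the remaining pairs are hypothetical, and the helper name `interval` is an assumption. Classes are treated as upper-limit exclusive (10–20, 20–30, ... for X and 100–200, 200–300, ... for Y).

```python
from collections import Counter

# Hypothetical (X, Y) pairs; only the first pair (12, 140) comes from the text.
pairs = [(12, 140), (24, 256), (33, 360), (22, 470),
         (44, 470), (37, 380), (26, 280), (18, 190)]

def interval(value, width, start):
    """Lower limit of the upper-limit-exclusive class of the given width containing value."""
    return start + ((value - start) // width) * width

# Tally each pair into its (X class, Y class) cell.
cells = Counter((interval(x, 10, 10), interval(y, 100, 100)) for x, y in pairs)

x_classes = sorted({xc for xc, _ in cells})
y_classes = sorted({yc for _, yc in cells})

# Rows are Y classes, columns are X classes, with marginal totals.
print("Y \\ X".ljust(10) + "".join(f"{xc}-{xc + 10}".rjust(8) for xc in x_classes) + "Total".rjust(8))
for yc in y_classes:
    row = [cells.get((xc, yc), 0) for xc in x_classes]
    print(f"{yc}-{yc + 100}".ljust(10) + "".join(str(f).rjust(8) for f in row) + str(sum(row)).rjust(8))
col_totals = [sum(cells.get((xc, yc), 0) for yc in y_classes) for xc in x_classes]
print("Total".ljust(10) + "".join(str(t).rjust(8) for t in col_totals) + str(sum(col_totals)).rjust(8))
```

The row and column totals are the marginal frequency distributions of Y and X, and the grand total equals the number of pairs.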
CUMULATIVE FREQUENCY DISTRIBUTION
• A frequency distribution becomes cumulative when the frequency of each class-
interval is cumulative.
• Cumulative frequency of a class-interval can be obtained by adding the frequency
of that class-interval to the sum of the frequencies of the preceding class-
intervals.
• Often we want to know the number of cases which fall below, or, above a certain
value.
• Hence, there are two types of cumulative frequencies, i.e.,
– (1) less than (or, from below) cumulative frequency, and
– (2) more than (or, from above) cumulative frequencies.
• In the less than type the cumulative frequency of each class-interval is obtained
by adding the frequencies of the given class and all the preceding classes, when
the classes are arranged in the ascending order of the value of the variable.
• In the more than type the cumulative frequency of each class-interval is obtained
by adding the frequencies of the given class and the succeeding classes.
• For grouped frequency distribution, the cumulative frequencies are shown
against the class-boundary points.
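Both types of cumulative frequency can be sketched with `itertools.accumulate`. The frequencies are those of the classes 20-25, ..., 50-55 in the relative frequency table earlier in these notes; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies of the classes 20-25, 25-30, ..., 50-55.
boundaries = [20, 25, 30, 35, 40, 45, 50, 55]
freqs = [10, 12, 8, 20, 11, 4, 5]

less_than = list(accumulate(freqs))                    # running total from the first class
more_than = list(accumulate(reversed(freqs)))[::-1]    # running total from the last class

for upper, cf in zip(boundaries[1:], less_than):
    print(f"less than {upper}: {cf}")
for lower, cf in zip(boundaries[:-1], more_than):
    print(f"{lower} and above: {cf}")
```

The less than values are paired with upper class-boundaries and the more than values with lower class-boundaries, matching the rule that cumulative frequencies are shown against class-boundary points.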
Practical Exercise 1
The gestation periods of crossbred cows maintained at the IVRI farm, Izatnagar,
are given below. Construct a frequency distribution table of these data.
278 273 285 282 274 271
289 287 270 277 285 280
254 283 275 286 276 270
285 293 283 287 286 278
272 292 281 285 290 278
261 282 282 255 283 284
283 309 281 288 288 274
283 285 281 272 277 283
273 285 270 292 253 280
295 269 288 281 287 277
290 271 292 264 285 265
276 275 308 273 266 266
278 299 278 286 280 279
279 280 284 285 270 269
272 287 275 276 281 283
276 278 279 270 276 266
275 288 291 274 279 278
283 272 285 282 260 275
282 285 257 266 286 281
270 294 279 269 299 298
Practical Exercise 2
The daily milk yields of crossbred cows maintained at the IVRI farm, Izatnagar, are given
below. Tabulate the data in the form of a cumulative frequency distribution (less than
and more than forms).
7.44 5.81 6.61 7.46 3.78 5.27
8.86 4.38 5.72 7.79 6.71 8.66
6.55 7.34 5.86 4.39 4.45 6.64
5.18 6.90 6.40 7.02 7.02 3.84
7.19 4.89 2.08 2.89 7.90 7.54
6.55 6.99 7.20 7.46 8.55 7.03
11.60 3.19 4.04 6.50 6.42 8.01
6.74 8.76 5.96 5.45 5.64 8.92
8.11 5.10 4.28 7.96 4.26 8.24
4.41 7.12 4.54 5.83 4.03 6.09
5.64 6.28 6.68 11.51 4.66 6.34
7.43 8.42 6.73 5.07 4.80 5.47
3.12 7.81 7.45 5.56 5.42 6.99
4.42 6.93 5.13 5.96 7.26 8.24
6.50 6.89 7.13 7.16 4.84 5.97
5.64 7.16 5.63 3.07 4.65 7.32
4.66 6.27 10.24 8.91 8.32 5.41
7.32 7.32 6.87 6.94 4.70 6.79
6.28 5.93 7.96 4.99 7.74 6.82
7.45 7.02 2.61 6.54 8.57 5.60
Practical Exercise 3
30 pairs of values of two variables X and Y are given below.

X 14 20 33 25 41 18 24 29 38 45
Y 148 242 296 312 518 196 214 340 492 568
X 23 32 37 19 28 34 38 29 44 40
Y 282 400 288 292 431 440 500 512 415 514
X 22 39 43 44 12 27 39 38 17 17
Y 282 481 516 598 122 200 451 387 245 245

Construct the bivariate frequency distribution table for the above data, taking the
class-intervals of X as 10-20, 20-30, etc., and those of Y as 100-200, 200-300, etc.
TABULATION

DEFINITION
According to Tuttle, “A statistical table is the logical listing
of related quantitative data in vertical columns and
horizontal rows of numbers, with sufficient explanatory
and qualifying words, phrases and statements in the form
of titles, heading and footnotes to make clear the full
meaning of the data and their origin”
OBJECTIVES OF TABULATION
1. To simplify the complex data

2. To economize space

3. To facilitate comparison

4. To facilitate statistical analysis

5. To save time

6. To depict trend

7. To help reference
Components Of Table
1. Table number

2. Title of the table

3. Caption / Box head

4. Stub

5. Body / Field

6. Head note

7. Foot note

8. Source data
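+-----------------+-------------------------------------------------------+---------+
| Stub headings   |                        Caption                        | Total   |
|                 +---------------------------+---------------------------+ (rows)  |
|                 |          Subhead          |          Subhead          |         |
|                 +-------------+-------------+-------------+-------------+         |
|                 | Column-head | Column-head | Column-head | Column-head |         |
+-----------------+-------------+-------------+-------------+-------------+---------+
| Stub entries    |             |             |             |             |         |
+-----------------+-------------+-------------+-------------+-------------+---------+
| Total (columns) |             |             |             |             |         |
+-----------------+-------------+-------------+-------------+-------------+---------+
Foot note:
Source note: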
REQUIREMENTS OF GOOD
STATISTICAL TABLES
1. Suit the purpose
2. Scientifically prepared
3. Clarity
4. Manageable size
5. Columns and rows should be numbered
6. Suitably approximated
7. Attractive get-up
8. Units
9. Average and totals
10. Logical arrangement of items
11. Proper lettering
Types of tables
1. Simple and Complex tables.

2. General purpose and special purpose tables.

3. Original and derived table.


Advantages of classification and tabulation
1. Clarifies the object

2. Simplifies the complex data

3. Economizes space

4. Facilitates comparison

5. Helps in reference

6. Depicts the trend


Disadvantages of classification and tabulation
1. Complicated process

2. Not all data can be put into tables

3. Lack of flexibility
Practical Exercise 4
In certain data, the following four main characteristics
with their sub-characteristics are present:

Main characteristics   Sub-characteristics
1. Locality            Urban / Rural
2. Religion            Hindus / Muslims / Christians / Sikhs
3. Sex                 Males / Females
4. Age                 0-30 / 30-60 / above 60
