
(A) Personal Details

Role                          Name                          Affiliation
Principal Investigator        Prof. Masood Ahsan Siddiqui   Jamia Millia Islamia, New Delhi
Paper Coordinator             Prof. Aslam Mahmood           Jawaharlal Nehru University, New Delhi
Content Writer/Author (CW)    Dr. Sushil Dalal              Rohtak University, Haryana
Content Reviewer (CR)         Prof. Aslam Mahmood           Jawaharlal Nehru University, New Delhi
Language Editor (LE)

(B) Description of Module

Items               Description of Module
Subject Name        Geography
Paper Name          Quantitative Techniques in Geography
Module Name         Data Management: Tabulation and Frequency Curves
Module ID           QT 3
Pre-requisites      Basic knowledge of data collection
Objectives          To introduce students to methods of statistical analysis

Data Management: Tabulation and Frequency Curves
Sushil Dalal,
Associate Professor,
Department of Geography,
Pt. N. R. S. Government College, Rohtak
(Maharshi Dayanand University, Rohtak)

(1) E-Contents

In any exercise of data collection and analysis, socially meaningful concepts are first converted into
numerical "Data", and information about them is collected either through Census enumeration or
through sample surveys. The next step is to put the mass of collected data into a systematic and
manageable form. Raw data cannot be quoted in a text or report as it stands, because it is too
lengthy and is in no particular order. The data set therefore needs to be transformed into a
systematic and manageable form.

Tabulation of the raw data into a concise form is, therefore, an important step in most statistical
analyses. Tabulation serves a dual purpose: it puts the data into a concise, manageable form and, at
the same time, arranges it in a systematic order.

Tabulation

Once the data is arranged in a frequency table, it becomes much easier to handle in further
statistical analysis and can easily be referred to anywhere in the text. For this purpose the raw data
can be transformed into ungrouped as well as grouped "Frequency Distribution Tables".

Frequency Distribution Table


Data collected in the field is not in any order. It cannot be quoted in the text in that form, as it
does not carry any meaning. Tabulation makes it more meaningful and easier to handle. The raw
data is divided into small groups and the number of observations falling in each group is recorded.
Observations falling in the same group are treated as similar. By classifying the data into groups we
remove the minor differences in the data and retain the major ones. A frequency distribution table
has two columns: the first gives the range of each group, known as a class, and the second gives the
frequency of each class, i.e. the number of observations falling in it.

There are two types of frequency distributions: (a) Ungrouped and (b) Grouped.

Ungrouped Frequency Distribution Table

In an ungrouped frequency distribution each class consists of a single value of the variable. It is
used for data that is discontinuous by nature and cannot occur in fractions, like the size of a family,
the number of schools or the number of floods in a river in a year. The range of discontinuous data
is generally not very large. An ungrouped frequency distribution table may look like the one given
below:
Size of family (X)    Number of families (f)
1                     2
2                     14
3                     22
4                     24
5                     18
6                     14
7                     6
Total                 100
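Building an ungrouped table amounts to counting how often each value occurs. The minimal Python sketch below shows this tally. The list `family_sizes` is hypothetical survey data and is not the data behind the table above, and `ungrouped_table` is an illustrative helper name.

```python
from collections import Counter

def ungrouped_table(values):
    """Tally discrete values: return sorted (value, frequency) pairs and the total."""
    counts = Counter(values)
    rows = sorted(counts.items())
    return rows, sum(counts.values())

# Hypothetical family sizes from a survey of 12 households (illustration only)
family_sizes = [4, 3, 5, 4, 2, 6, 3, 4, 5, 1, 4, 3]

rows, total = ungrouped_table(family_sizes)
print("Size of family (X)  Number of families (f)")
for size, f in rows:
    print(f"{size:>18}  {f:>22}")
print(f"{'Total':>18}  {total:>22}")
```

Each distinct value becomes its own class. That is why this form suits discrete data with a small range.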

Grouped Frequency Distribution Table

Most of the time we have to handle data that is continuous by nature, like rainfall, agricultural
production or income. Such data can also take fractional values, and its range is large. In such cases,
instead of single values of the variable, the classes are formed as ranges of values, and the number of
observations falling in each class, known as its frequency, is tabulated. A hypothetical grouped
frequency distribution table of the daily rainfall of 90 days of a season in an area may look like the
one given below:

Distribution of Daily Rainfall of 90 Days of an Area (in mm)

Daily rainfall (mm)    Number of days (f)
20 - 30                5
30 - 40                6
40 - 50                11
50 - 60                18
60 - 70                19
70 - 80                15
80 - 90                13
90 - 100               2
100 - 110              1
Total                  90

In the above frequency table, the values of the variable are tabulated in small groups of values known
as classes. Every class has two class limits: a lower class limit and an upper class limit. The
difference between the upper and the lower limit of a class is known as the class interval. In the
present case the first class has a lower limit of 20.0 mm, an upper limit of 30.0 mm and a class
interval of 10.0 mm. The second class has a lower limit of 30.0 mm and an upper limit of 40.0 mm,
and so on. All the class intervals of the above table are equal. Notice that the upper limit of every
class becomes the lower limit of the next class, so a value equal to it must not be counted in both
classes. The convention is that values less than the upper limit are included in the class, while a
value equal to the upper limit of a class goes to the next class, where it is the lower limit. Thus every
class includes its lower limit but not its upper limit.
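The convention can be written as a lookup against the sorted list of class limits: find the last limit that is less than or equal to the value. The minimal Python sketch below uses the standard `bisect` module and assumes the rainfall table's limits of 20, 30, ..., 110 mm. The helper name `class_of` is illustrative.

```python
from bisect import bisect_right

def class_of(x, limits):
    """Return the (lower, upper) limits of the class that contains x.

    limits: sorted class limits, e.g. [20, 30, ..., 110] for the rainfall table.
    Convention: a class includes its lower limit but not its upper limit,
    so a value equal to an upper limit goes to the next class."""
    if not limits[0] <= x < limits[-1]:
        raise ValueError(f"{x} lies outside {limits[0]}-{limits[-1]}")
    i = bisect_right(limits, x) - 1  # index of the last limit that is <= x
    return limits[i], limits[i + 1]

limits = list(range(20, 111, 10))  # 20, 30, ..., 110 mm
print(class_of(29.9, limits))  # (20, 30)
print(class_of(30.0, limits))  # (30, 40): 30.0 is the lower limit of this class
```

`bisect_right` places a value equal to a limit after that limit. This matches the rule that each class includes its lower limit but not its upper limit.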

In a grouped frequency distribution table the number of classes and the class interval are both
important and are related to each other. If the class intervals are large, the number of classes will be
small; if the class intervals are small, the number of classes will be large.
A good frequency distribution table maintains a balance between the two. A very large number of
classes loses the advantage of summarising the data, while a very small number of classes, like 2, 3 or
4, results in a significant loss of information.

There are suggestions regarding the number of classes. One such suggestion, traditionally cited in
textbooks (Sturges' rule), is that the number of classes k of a frequency distribution table with N
observations should be determined by the formula:

k = 1 + 3.322 log N   (logarithm to base 10)

This rule, however, is hardly used in practice.
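The rule is simple arithmetic. The Python sketch below is a minimal version that assumes the result is rounded to the nearest whole number. Some texts and libraries round up instead, so treat the rounding step as an assumption. `sturges_classes` is an illustrative name.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by Sturges' rule, k = 1 + 3.322 log10(N),
    rounded to the nearest whole number (the rounding is an assumption)."""
    if n < 1:
        raise ValueError("need at least one observation")
    return round(1 + 3.322 * math.log10(n))

for n in (90, 100, 1000):
    print(n, sturges_classes(n))
```

For N = 100 this gives 1 + 3.322 × 2 = 7.644, or about 8 classes.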

Even when the calculated class interval is not a round figure, class intervals in multiples of five or ten
are preferred for practical reasons.

Unequal Class interval

The difference between the upper limit and the lower limit of a class is known as the class interval,
which may or may not be equal for all the classes. Class intervals are commonly of equal size, but in
some cases equal class intervals are not suitable. For example, urban settlements in India vary in size
from below 5000 population to 12442000 (the population of Mumbai, the largest, in 2011), so their
tabulation uses unequal class intervals because of the wide range of variation in the data. For a
range of 12437000, equal class intervals of 5000 each would require 12437000/5000 = 2488 classes
(after rounding up). This is as cumbersome as the raw data itself and gives no simplification in
handling or interpretation. On the other hand, if we take classes with an interval of 10,00,000
population, we lose most of the detail, as the very first class, from below 5000 to 10,00,000 (the
below-million cities), will contain 7882 of the total 7935 towns in India in 2011 (99.3%). This is as
bad as having no information.
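The number of equal classes needed to cover a range is the range divided by the class interval, rounded up. The Python sketch below reproduces this arithmetic with the town populations quoted above. `classes_needed` is an illustrative name.

```python
import math

def classes_needed(smallest, largest, width):
    """Range divided by the class interval, rounded up to a whole number of classes."""
    return math.ceil((largest - smallest) / width)

# Town populations quoted in the text: from 5000 up to 12442000 (Mumbai, 2011)
print(classes_needed(5000, 12442000, 5000))      # 2488
print(classes_needed(5000, 12442000, 1000000))   # 13
```

Neither choice is satisfactory. A small interval gives thousands of classes, and a large one puts almost every town into the first class.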

In such cases, where the range of data is very large (for example the population of towns, the income
of individuals in a society or the size of land holdings among farmers), we are forced to use unequal
class intervals. The class intervals are kept small at the lower values and become larger and larger as
we proceed to the higher values. This is because small differences cannot be ignored at the lower end,
whereas the same differences matter much less at higher values, where only larger differences are
important. Thus the Census of India classifies towns into classes of unequal intervals, as given below:

Size Class Distribution of Towns, India 2011

Size class of towns    Population class interval    Number of towns (2011)
Class VI               Below 5000                   499
Class V                5000 - 10,000                2188
Class IV               10,000 - 20,000              2238
Class III              20,000 - 50,000              1912
Class II               50,000 - 100,000             600
Class I                100,000 and above            496
Total                                               7933
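Classifying with unequal intervals works the same way as with equal ones. Only the list of limits changes. The Python sketch below assigns Census size classes using the limits in the table above. It assumes, following this module's convention, that a population equal to a limit goes to the higher class. All populations except Mumbai's 2011 figure are hypothetical, and `size_class` is an illustrative name.

```python
from bisect import bisect_right
from collections import Counter

# Lower limits of Class V, IV, III, II and I in the Census table above.
# A population equal to a limit goes to the higher class (lower limit included).
LIMITS = [5_000, 10_000, 20_000, 50_000, 100_000]
LABELS = ["Class VI", "Class V", "Class IV", "Class III", "Class II", "Class I"]

def size_class(population):
    """Return the Census size class for a town of the given population."""
    return LABELS[bisect_right(LIMITS, population)]

# Hypothetical populations, except the last (Mumbai, 2011, as quoted in the text)
towns = [3_200, 5_000, 8_750, 14_000, 19_999, 46_500, 72_000, 100_000, 12_442_000]
tally = Counter(size_class(p) for p in towns)
for label in LABELS:  # Class VI first, as in the table
    print(f"{label:<10} {tally[label]}")
print(f"{'Total':<10} {sum(tally.values())}")
```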
Whether the class intervals are equal or unequal, their choice is crucial. For equal class intervals, one
has to decide only the number of classes: the range of the data divided by the number of classes
gives the class interval. Researchers often alter it marginally to suit their convenience; for example, if
the calculated class interval is 19.73, one can change it to 20.0 for ease of computation and
interpretation. There are no hard and fast rules regarding the number of classes. The guiding principle
is that there should be neither too many nor too few; commonly their number lies between about 9 or
10 and 12 to 15.
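Rounding a calculated interval to a convenient figure can be automated. The Python sketch below is a minimal version. It assumes that "convenient" means 1, 2, 5 or 10 times a power of ten. That set is a common choice but is not taken from this module, which mentions multiples of five or ten. `nice_interval` is an illustrative name.

```python
import math

def nice_interval(raw):
    """Round a calculated class interval up to the next 'convenient' value,
    taken here (an assumption) to be 1, 2, 5 or 10 times a power of ten."""
    if raw <= 0:
        raise ValueError("class interval must be positive")
    power = 10 ** math.floor(math.log10(raw))
    for step in (1, 2, 5):
        if raw <= step * power:
            return step * power
    return 10 * power

print(nice_interval(19.73))               # 20
print(nice_interval((23.6 - 16.9) / 10))  # 1.0 (range / 10 classes, wheat example)
```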

For unequal class intervals, the number of classes is generally smaller, as each class represents a
category of the data, and too many categories cause confusion. For example, in the Census
classification of towns in India, the class intervals correspond to six well-recognised classes of towns.
What matters more in such cases is the researcher's understanding in converting the data into
meaningful categories.

Example
The following example shows how a small set of raw data is converted into a frequency
distribution table and then into a "Histogram". It uses a hypothetical set of data on the
production of wheat in 100 plots of equal size, one hectare each, in an area, as given in the
table below.
Production of wheat in quintals (1 quintal = 100 kg) per plot of one hectare

20.3 20.2 19.8 20.1 21.0 20.9 20.2 19.9 19.6 19.2
20.3 21.1 19.7 19.1 18.3 18.1 17.9 20.7 20.0 19.4
18.3 18.0 17.0 17.2 22.3 20.7 21.3 18.9 19.7 21.0
21.1 19.8 18.5 18.2 22.1 21.1 18.1 19.3 19.9 19.7
18.8 18.9 16.9 20.1 20.3 18.1 17.6 19.4 20.3 21.1
20.2 22.1 18.7 19.5 20.1 23.0 22.9 22.8 22.8 22.5
20.9 20.4 20.1 20.6 20.9 18.0 20.3 18.1 19.7 18.2
18.3 17.1 20.2 23.0 20.1 18.9 18.3 21.2 17.3 17.6
19.3 19.0 21.3 22.1 19.9 18.8 21.1 23.1 23.6 23.1
20.1 19.8 19.7 18.3 17.1 18.3 19.0 20.1 20.1 18.9

The range of the data is quite small: the maximum value is 23.6 and the minimum is 16.9, so
the range is 23.6 - 16.9 = 6.7. If we choose 10 classes, every class would have an interval of
0.67 quintal per hectare. This is not a conveniently understood figure compared to 1 quintal,
which is also close to it. Secondly, 10 classes appear to be too many for only 100 plots. Thus a
class interval of 1 quintal is easily understood and gives eight classes, which is adequate for
making a histogram.
Starting with a lower class limit of 16.0, so that the minimum value of 16.9 lies in the first
class, we form the classes given in the frequency table below:
Frequency Table
Production of Wheat (quintals) in 100 Plots of One Hectare Each

Production of wheat (quintals)    Number of plots
16 - 17                           1
17 - 18                           8
18 - 19                           22
19 - 20                           21
20 - 21                           25
21 - 22                           10
22 - 23                           8
23 - 24                           5
Total                             100
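A tally made by hand is easy to check mechanically. The Python sketch below is a minimal version. It assumes the 100 values are typed in exactly as listed in the raw-data table, and it applies the lower-limit-included convention with class limits 16, 17, ..., 24 quintals. `RAW`, `limits` and `freqs` are illustrative names.

```python
from bisect import bisect_right

RAW = """
20.3 20.2 19.8 20.1 21.0 20.9 20.2 19.9 19.6 19.2
20.3 21.1 19.7 19.1 18.3 18.1 17.9 20.7 20.0 19.4
18.3 18.0 17.0 17.2 22.3 20.7 21.3 18.9 19.7 21.0
21.1 19.8 18.5 18.2 22.1 21.1 18.1 19.3 19.9 19.7
18.8 18.9 16.9 20.1 20.3 18.1 17.6 19.4 20.3 21.1
20.2 22.1 18.7 19.5 20.1 23.0 22.9 22.8 22.8 22.5
20.9 20.4 20.1 20.6 20.9 18.0 20.3 18.1 19.7 18.2
18.3 17.1 20.2 23.0 20.1 18.9 18.3 21.2 17.3 17.6
19.3 19.0 21.3 22.1 19.9 18.8 21.1 23.1 23.6 23.1
20.1 19.8 19.7 18.3 17.1 18.3 19.0 20.1 20.1 18.9
"""

values = [float(v) for v in RAW.split()]
lo, hi = min(values), max(values)
print(f"N = {len(values)}, min = {lo}, max = {hi}, range = {hi - lo:.1f}")

# Class limits 16, 17, ..., 24 quintals: eight classes of width 1.
limits = list(range(16, 25))
freqs = [0] * (len(limits) - 1)
for v in values:
    # bisect_right puts a value equal to a limit into the class that starts there
    freqs[bisect_right(limits, v) - 1] += 1

for a, b, f in zip(limits, limits[1:], freqs):
    print(f"{a} - {b}  {f:>3}")
print(f"Total    {sum(freqs)}")
```

For the listed values this reports a range of 6.7 and class frequencies of 1, 8, 22, 21, 25, 10, 8 and 5 plots.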

Histogram
Distribution of Equal Class Intervals
A frequency distribution table arranges the data in an ordered form, which helps us understand
the distributional properties of the data much better than the raw data. For example, after
transforming the data into a frequency distribution, we can easily see how many observations
lie in the middle of the range of values and how many lie on either side of it. We can also see
the inequalities in the distribution and other socially important characteristics of the data.
These characteristics become even more visible if we plot the distribution on a "Histogram".
A histogram is a set of rectangles whose bases are equal to the class intervals of the
corresponding frequency distribution and whose heights are equal to the corresponding class
frequencies.
Taking the wheat production data of 100 plots of one hectare each, as given in the above
table, we prepared the 'Histogram' shown in the figure below. The first rectangle has its base
from 16.0 to 17.0, the second from 17.0 to 18.0, and so on until the last rectangle, whose base
is the class interval of the last class, 23.0 - 24.0. The height of the first rectangle equals the
frequency of the first class, i.e. 1, the height of the second rectangle equals the frequency of
the second class, which is 8, and so on until the last class, with a height of 5.
A histogram can also be converted into a "Frequency Polygon" by joining the midpoints of the
upper sides of the bars. To show the pattern of change as a gradual process, the polygon can
also be converted into a smooth curve, known as a "Frequency Distribution Curve" or simply a
frequency curve. Such a frequency curve for the data on the production of wheat is also shown
below along with the histogram.
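The midpoints that make up a frequency polygon are easy to compute. The Python sketch below prints a text histogram and lists the polygon's vertices for the wheat classes. It uses the frequencies obtained by tallying the 100 listed values. It also assumes a common convention that the source does not state: the polygon is closed by adding a zero-frequency point at the midpoint of an empty class on each side. Function names are illustrative.

```python
def frequency_polygon(limits, freqs, close=True):
    """Return the (class midpoint, frequency) vertices of a frequency polygon.

    limits: class limits, e.g. [16, 17, ..., 24]; freqs: one frequency per class.
    With close=True, a zero-frequency vertex is added at the midpoint of an empty
    class on each side (an assumed convention) so the polygon meets the axis."""
    mids = [(a + b) / 2 for a, b in zip(limits, limits[1:])]
    points = list(zip(mids, freqs))
    if close:
        first_width = limits[1] - limits[0]
        last_width = limits[-1] - limits[-2]
        points = [(mids[0] - first_width, 0)] + points + [(mids[-1] + last_width, 0)]
    return points

def text_histogram(limits, freqs, symbol="#"):
    """Print one bar per class; with equal intervals, bar length = frequency."""
    for a, b, f in zip(limits, limits[1:], freqs):
        print(f"{a:>3}-{b:<3} {symbol * f} {f}")

limits = list(range(16, 25))
freqs = [1, 8, 22, 21, 25, 10, 8, 5]
text_histogram(limits, freqs)
points = frequency_polygon(limits, freqs)
print(points)
```

A smooth frequency curve is normally drawn freehand or with a plotting tool, so it is not sketched here.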

Distribution of Un-equal Class Intervals


In the above histogram the heights of the rectangles are in proportion to the frequencies of the
classes, because all the class intervals are equal. Thus the area of each rectangle is also in proportion
to the number of observations (frequency) in that class. In a distribution with equal class intervals,
the frequencies need not be divided by the class interval, since all the intervals are equal. If the class
intervals are not equal, however, the height of each rectangle must be taken equal to the frequency
density of the class. The frequency density of a class is obtained by dividing its frequency by its class
interval.
Example
Consider the income distribution of 400 persons of a locality. Since there are large variations
in their income, the distribution is given in unequal class intervals.

Income (Rs.)    Number of persons
0-500           200
500-1000        50
1000-2000       40
2000-5000       60
5000-10000      50
Total           400

A histogram drawn without frequency densities would give a distorted picture. Thus, before drawing
the histogram we have to find the frequency density of each class, as shown below.

Income (Rs.)   Number of persons   Class interval   Class interval in          Frequency
               (frequency)         (Rs.)            standard units (Rs. 500)   density
(1)            (2)                 (3)              (4)                        (5)
0-500          200                 500              1                          200
500-1000       50                  500              1                          50
1000-2000      40                  1000             2                          20
2000-5000      60                  3000             6                          10
5000-10000     50                  5000             10                         5
Total          400

Now we can prepare the histogram, beginning with the first class, 0-500, which has a frequency of 200.
As it is the lowest class, with an interval of Rs. 500, its frequency is not divided. A class interval of
Rs. 500 is taken as the standard unit, and all other classes are converted into units of this standard.
The second class also has an interval of Rs. 500, so it equals one unit. The interval of the third class
is Rs. 1000, twice the standard class. The interval of the fourth class is Rs. 3000, six times the
standard class, and the last class has an interval of Rs. 5000, ten times the standard class. Column 4
of the above table gives the class interval of each class in units of the first class interval. Column 5
gives the frequency density of each class per standard class interval of Rs. 500.
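Columns 3 to 5 of the table can be computed directly from the class limits and frequencies. The Python sketch below is a minimal version. It takes Rs. 500 as the standard unit, as the text does. `frequency_density` is an illustrative name.

```python
def frequency_density(classes, freqs, unit):
    """For each class return (class, frequency, interval, interval in units, density).

    unit: the standard class interval (here Rs. 500); density = frequency / units."""
    rows = []
    for (lower, upper), f in zip(classes, freqs):
        interval = upper - lower
        units = interval / unit
        rows.append(((lower, upper), f, interval, units, f / units))
    return rows

classes = [(0, 500), (500, 1000), (1000, 2000), (2000, 5000), (5000, 10000)]
freqs = [200, 50, 40, 60, 50]
rows = frequency_density(classes, freqs, 500)
for (lower, upper), f, interval, units, density in rows:
    print(f"{lower}-{upper}: f={f}, interval={interval}, units={units:g}, density={density:g}")
```

Multiplying each density by its number of units gives back the class frequency. The total area of the histogram therefore still represents all 400 persons.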

The histogram will now have a bar for the first class, 0-500, with a frequency of 200, and one for the
second class, 500-1000, with a frequency of 50. The third class corresponds to two classes, 1000-1500
and 1500-2000, each with a frequency of 20. The fourth class corresponds to six classes, 2000-2500,
2500-3000, 3000-3500, 3500-4000, 4000-4500 and 4500-5000, each with a frequency of 10. Lastly, the
class 5000-10000 corresponds to 10 classes of interval 500, from 5000-5500 to 9500-10000, each with a
frequency of 5.
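The same split into standard-width bars can be generated automatically. The Python sketch below is a minimal version. It assumes every class interval is a whole multiple of the standard unit, as in this example. `split_into_standard_classes` is an illustrative name.

```python
def split_into_standard_classes(classes, freqs, unit):
    """Replace each class by (interval / unit) sub-classes of width `unit`,
    sharing its frequency equally; return (lower, upper, frequency) triples."""
    bars = []
    for (lower, upper), f in zip(classes, freqs):
        n, remainder = divmod(upper - lower, unit)
        if remainder:
            raise ValueError(f"class {lower}-{upper} is not a multiple of {unit}")
        for k in range(n):
            bars.append((lower + k * unit, lower + (k + 1) * unit, f / n))
    return bars

classes = [(0, 500), (500, 1000), (1000, 2000), (2000, 5000), (5000, 10000)]
freqs = [200, 50, 40, 60, 50]
bars = split_into_standard_classes(classes, freqs, 500)
print(len(bars))  # 20 bars of width Rs. 500
for lower, upper, f in bars:
    print(f"{lower}-{upper}: {f:g}")
```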

A histogram of the above distribution of unequal class intervals will be as given below.
Figure: Income Distribution (x-axis: Income (Rs.); y-axis: Persons)

• Frequency curves play an important role in statistical analysis. They help us understand the
process through which the data is generated.

• A usual process, in which neither very high nor very low values are favoured, generates a
symmetrical curve. Examples are the annual rainfall of an area over a period of time, the height of
children in a given age group and the agricultural productivity of plots in adjoining areas. A curve is
symmetrical if, when it is folded at the middle, one half overlaps the other half.

• On the contrary, due to certain natural or social factors, the values in some distributions are
not symmetrical, and we get a curve which is "Asymmetric" or "Skewed". The values show
inequalities in their distribution, either on the higher side or on the lower side. The distributions
of agricultural land holdings, of income and of the district-wise proportion of urban population
give curves elongated to the right-hand side, known as "positively skewed". The proportion of
rural population to total population in different districts gives a curve elongated to the left-hand
side, known as "negatively skewed".

• Death rates by age in a population give a "U-shaped curve", as mortality is higher at the
beginning and at the end of life and lowest in the middle ages. The shapes of symmetric
and skewed curves are also given below:
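The folding test for symmetry can be applied to a frequency table with equal class intervals by comparing each class with its mirror-image class. The Python sketch below is a minimal version. It assumes the listed classes are placed evenly around the middle of the distribution. It is only a visual check and is not one of the measures of skewness referred to later. The first example list is hypothetical, and the second uses the wheat frequencies obtained by tallying the raw data.

```python
def fold_mismatch(freqs):
    """Fold a distribution (equal class intervals) at its middle and return the
    difference between each class and its mirror-image class; for an odd number
    of classes the middle class is its own mirror image and is skipped."""
    half = len(freqs) // 2
    return [freqs[i] - freqs[-1 - i] for i in range(half)]

def is_symmetric(freqs):
    """True if one half overlaps the other exactly when folded."""
    return all(d == 0 for d in fold_mismatch(freqs))

symmetric_example = [2, 6, 12, 6, 2]        # hypothetical
wheat = [1, 8, 22, 21, 25, 10, 8, 5]        # tallied wheat frequencies
for name, freqs in (("hypothetical", symmetric_example), ("wheat", wheat)):
    print(name, fold_mismatch(freqs), is_symmetric(freqs))
```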
Comparison of Frequency Distributions

• Any research enquiry begins with observing real-world situations around us and comparing
them under different geographical conditions. After we collect data about the real world and
summarise it with frequency tables, the various types of graphs give us only a preliminary
understanding of its comparative position under different geographical conditions, as they are
not very precise. For an accurate and meaningful comparison we need numerical measures of the
distribution. There are several such measures, known as "Descriptive Statistics". Some of the
commonly used ones are given below:

• Measures of Central Tendency

• Measures of Dispersion and

• Measures of Skewness.

• The first two, i.e. measures of central tendency and measures of dispersion, are very
important parameters of any distribution, as they are used extensively in the theory of
sampling, in statistical inference and elsewhere. Measures of skewness, however, are used
relatively less frequently.
