
3.

FREQUENCY DISTRIBUTION

3.1 Attribute and Variable


Statistical data are numerical, although in some situations, data may not be quantitative at the first stage. At the outset these non-numerical data are appropriately classified under different mutually disjoint categories and, subsequently, the number of items included in each category is obtained.
An attribute is a qualitative character that cannot be numerically expressed. Individuals possessing an attribute can, however, be grouped into several disjoint classes. Mother tongue of a group of people, grade obtained by students in a test or colour of flowers represents an attribute.
Data on attributes may be of two types: ordinal and nominal. They are ordinal when there is a clear ordering of the forms or categories of the attribute. For example, when the character 'education' is measured with categories primary school, high school, college and post-graduate, or the character 'economic status' is measured with categories poor, middle class and rich, there is a clear ordering of the categories though the absolute distances between them are unknown. Such data are common in the social sciences, in particular for measuring attitudes and opinions on various issues and status of various types. When the various forms of an attribute differ in nature, not in quantity, the data are nominal. For example, when we record the religions of persons as Hindu, Muslim, Christian, etc. or when the mother tongues of students are noted as Bengali, Hindi, Marathi, etc. the data are nominal. It is obvious that the order of listing of the different forms of an attribute is unimportant in this case.
The term variable (or variate) means a character of an item or an individual that can be
expressed in numerical terms. It is also called a quantitative character and such characters
can be measured or counted. Weight of students in a school, ages of boys, family size, etc.
are characters of this type.

3.2 Discrete variable and continuous variable

Variables can be classified in two main types, namely (i) Discrete or discontinuous, and (ii) Continuous.

(i) Discrete variable: A quantitative character that can take certain isolated values only in its range of variation is called a discrete variable. The number of students in different colleges, the size of families of some locality, proportion of males in a group of 10 persons, etc. are examples of this kind of variable.

(ii) Continuous variable: A variable that can assume any value within its range of variation is termed as a continuous variable. The weight of different persons, marks obtained by candidates in an examination, income of individuals, etc. belong to this category of variables.
It should be noted that the recorded measurements may show some discreteness in this case, but it is merely artificial. In fact, such apparent discreteness arises due to the limitations of the measuring instrument.

3.3 Frequency distribution of an attribute


We consider the data collected by a particular medical research organisation about the sex of newly born babies during a month in a city hospital. According to the data there were 18 female births and 22 male births.
The notion of the term frequency pertaining to an attribute may be introduced in the light of these data about the attribute 'sex of baby'. Here the figure 18 represents how many of the births are female. In other words, the number 18 shows the frequency of the form 'female' of the attribute concerned. In the similar manner, the number 22 indicates the frequency of the form 'male' of the attribute. Of course, the sum of these two figures gives the total frequency.
TABLE 3.1
SEX OF INFANTS BORN IN A CITY HOSPITAL
Sex Number of births
Female 18
Male 22
Total 40

The above table reveals how the total frequency, 40, is distributed over the two categories
of the attribute.
The table (TABLE 3.1) shows the frequency distribution of the attribute under study.
Sometimes, the proportions or the relative frequencies may be used as an alternative to
frequencies and a table similar to TABLE 3.1 may be prepared to present them against the
corresponding forms of the attribute. In this case, relative frequency is 18/40 (or 0.45) for the
form female and 22/40 (or 0.55) for the form male.
It should be mentioned that we may, similarly, have frequency distributions of attributes
with more than two forms.
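
The relative frequencies quoted above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary name `frequencies` and the printed layout are our own choices, while the counts are those of TABLE 3.1.

```python
from fractions import Fraction

# Frequencies of the two forms of the attribute, as given in TABLE 3.1.
frequencies = {"Female": 18, "Male": 22}

total = sum(frequencies.values())  # total frequency

# Relative frequency of a form = frequency of the form / total frequency.
relative = {form: Fraction(f, total) for form, f in frequencies.items()}

for form, f in frequencies.items():
    print(f"{form:<8}{f:>4}{float(relative[form]):>8.2f}")
print(f"{'Total':<8}{total:>4}{float(sum(relative.values())):>8.2f}")
```

The same code works unchanged for an attribute with more than two forms; only the dictionary needs more entries.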

3.4 Frequency distribution of a variable


The idea of frequency distribution of a variable may be discussed under two sections, one
for discrete variables and the other for continuous variables.

3.4 (a) Case of a discrete variable


Let us consider the discrete variable family size. A survey was performed in a locality of Calcutta and the following data relating to number of members in different families (i.e. family sizes) were recorded.
22 STATISTICAL TOOLS AND
TECHNIQUES

TABLE 3.2
NUMBER OF MEMBERS OF DIFFERENT FAMILIES
3 2 2
3 5
5 5 3 4
6 4
2 3 3
6 6 B 5
4 4 5 7
6 4 6 4 3
5 3 6 2 4
5 4 5 4
4 3 4
4 4 4 3 4 4
If the data are arranged in a systematic and compact form, then one can easily understand the significance of them. To meet the purpose, the frequency distribution of the variable family size is constructed. On going through the data, we find that the range of the values is 2 to 7. The values 2, 3, ..., 7 are taken successively to form six classes and the given figures are considered one by one and recorded in the respective classes with the help of tally marks. In order to facilitate counting, tally marks are kept in groups of five. After every four marks, the fifth one is drawn across the preceding four.
TABLE 3.3
TALLY MARKS FOR THE GIVEN VALUES
Family size Tally Marks
2

The table of tally marks enables us to obtain the frequency distribution of the variable concerned.
TABLE 3.4
FREQUENCY DISTRIBUTION OF FAMILY SIZE
Family size    Frequency
2               9
3              20
4              30
5              17
6              10
7               4
Total          90

The same frequency distribution may be presented with relative frequencies in place of frequencies. The relative frequency of the value 2 will give the proportion of families having size 2, and similarly for other values. Thus, for instance, the relative frequency of the value 3 is 20/90 (or 0.222). Again, to respond to such queries as 'How many families are there with 4 members or less?' or 'How many families are there with 5 members or more?', we are to find cumulative frequencies of the less-than type and greater-than type. For the former we are to give the totals of the frequencies proceeding from the lowest class upwards, and for the latter proceeding from the highest class downwards.
One can also show cumulative relative frequencies, by adding successively the relative frequencies, starting from the top of the table and then from the bottom of the table for the less-than type and greater-than type respectively.
It may be noted that we take one class for each different value of a discrete variable when the range of variation is small. But when the range is large we are forced to take a group of values for each class.


TABLE 3.5
RELATIVE FREQUENCIES AND CUMULATIVE FREQUENCIES FOR
THE FREQUENCY DISTRIBUTION OF FAMILY SIZE

Family size    Relative     Cumulative frequency
               frequency    less-than type    greater-than type
2              0.100         9                90
3              0.222        29                81
4              0.333        59                61
5              0.189        76                31
6              0.111        86                14
7              0.044        90                 4
Total          0.999 (≈ 1)
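
The relative and cumulative frequencies for family size can be reproduced from the class frequencies alone. The sketch below assumes the frequencies 9, 20, 30, 17, 10 and 4 for family sizes 2 to 7; the variable names are ours and are only illustrative.

```python
from itertools import accumulate

# Family sizes and their frequencies (frequency distribution of family size).
sizes = [2, 3, 4, 5, 6, 7]
freqs = [9, 20, 30, 17, 10, 4]
n = sum(freqs)  # total frequency

# Relative frequency of each value, rounded to three decimal places.
relative = [round(f / n, 3) for f in freqs]

# Less-than type: running totals from the lowest class upwards.
less_than = list(accumulate(freqs))

# Greater-than type: running totals from the highest class downwards.
greater_than = list(accumulate(reversed(freqs)))[::-1]

for row in zip(sizes, relative, less_than, greater_than):
    print("{:>2}  {:.3f}  {:>3}  {:>3}".format(*row))
```

Reading off the row for size 4 answers the first query in the text (59 families with 4 members or less), and the row for size 5 answers the second (31 families with 5 members or more).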
3.4 (b) Case of a continuous variable

A continuous variable may take an infinite number of values within its range of variation and, as such, it is natural that individual class cannot be considered for each distinct value of the variable. To explain this fact, let us consider a continuous variable, namely height of persons, and record the data (in cm) as 165.5, 166.4, 165.2 etc. Here each figure is correct to one decimal place and in the real sense the reading 165.5, for example, means any value between 165.45 and 165.55.
In fact, some suitable technique of classification would be necessary for presenting this kind of data in some classes, their number being not very large. However, during the construction of frequency distribution of such a character, we come across the following useful terms.

1. Class-interval: The whole range of variable values is classified in some groups in the form of intervals. Each interval is called a class-interval.

2. Class frequency: The number of observations included in a class is termed as absolute frequency (or, simply, frequency) of the class.

3. Class limits: These are the two end-points of a class interval used for tally marking the given values. However, these limits do not show the real boundaries of the class.

4. Class boundaries: In case of a continuous variable, its values are rounded off; for instance, any value from 21.5 to below 22.5 is taken as 22. In other words, the number 22 stands for any value from 21.5 to below 22.5. The two real end-points of a class interval are called class boundaries. Clearly, the upper boundary of a class coincides with the lower boundary of the next class. The class boundaries are used for forming the frequency distribution of a continuous variable.

5. Class-mark: The mid-value of a class interval that lies half-way between its two end points (i.e. class limits or class boundaries) is termed as class-mark.

6. Class width: The difference between the upper and lower boundaries of a class interval is called the width or size of the class.

7. Frequency density: The frequency density of a class is the frequency per unit width of the class, i.e.,

frequency density = (class frequency) / (width of class-interval)

Frequency densities are used for comparing the concentration of frequencies in different classes, particularly when the classes are of unequal width.
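
The terms class-mark, class width and frequency density can be illustrated with a small sketch. The `ClassInterval` type and the three classes of unequal width below are hypothetical and are not taken from the text; they are chosen only to show why densities, and not raw frequencies, should be compared when widths differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassInterval:
    lower: float      # lower class boundary
    upper: float      # upper class boundary
    frequency: int    # class frequency

    @property
    def width(self) -> float:
        # Class width: upper boundary minus lower boundary.
        return self.upper - self.lower

    @property
    def mark(self) -> float:
        # Class-mark: mid-value of the class interval.
        return (self.lower + self.upper) / 2

    @property
    def density(self) -> float:
        # Frequency density: frequency per unit width.
        return self.frequency / self.width


# Hypothetical classes of unequal width (illustrative values only).
classes = [
    ClassInterval(0.5, 10.5, 20),
    ClassInterval(10.5, 30.5, 30),
    ClassInterval(30.5, 80.5, 25),
]

for c in classes:
    print(f"{c.lower}-{c.upper}: mark={c.mark}, width={c.width}, density={c.density}")
```

Here the second class has the largest frequency (30), yet the first class has the highest concentration of values (density 2.0 against 1.5).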
Now, let us consider the problem of construction of frequency distribution of a continuous
variable and the relevant guidelines. Suppose we are given n values of a continuous variable.
To prepare a frequency distribution with the given values we proceed as follows.
We first pick up the smallest and the greatest of the given values. Their difference gives
the range of variation. The range is then divided into a suitable number of classes depending
on the total frequency.
In determining the classes we have to bear in mind the following points.

1. The classes should be exhaustive so that no value escapes classification.

2. The classes should be mutually exclusive (i.e. non-overlapping) so that no value comes under more than one class.

3. The number of classes should not be very large. If the number of classes be large, the primary objective of classification, namely summarisation, is defeated. It should also be mentioned that by considering a large number of classes one may introduce an irregular pattern in the frequencies which may be absent in the actual population.

4. The number of classes, again, should not be very small. If there are too few classes, the true nature of the distribution may be obscured. Moreover, in such cases different statistical measures would involve large error due to grouping (arising out of the assumption that the frequencies in a class are concentrated at the mid-value of the class).
There is no hard and fast rule regarding the number of classes. As a working rule, one can take 15 to 20 classes when the total frequency is well above 1000, 10 to 15 classes for total frequency around 1000. With total frequency far smaller than 1000, one may take fewer classes; however, 7 or 8 classes will be sufficient in case total frequency is near 200.
5. Equal width should preferably be maintained for the different classes. This will enable one to compare the class frequencies. Moreover, the computation of different statistical measures will be comparatively easy. However, this condition of equal class-width is not to be rigidly followed. There are situations (as in the case of income distribution) where classes of varying width are preferred in order to make the classification more significant.
Keeping the above points in mind, we are to divide the range into a suitable number of classes, defined in terms of class limits. For convenience, sometimes we are required to take a range which is slightly bigger than the one obtained from the data. Next, the given values are taken one by one and recorded in their respective classes with the help of tally marks. The procedure is continued until all the values are considered. In order to facilitate counting, tally marks are kept in groups of five, as in the case of discrete variables. The table will look like
Class limits                          Tally marks
a to (a + c - d)
(a + c) to (a + 2c - d)
...
(a + (k-1)c) to (a + kc - d)

Here a ≤ the smallest given value;
c is the desired width of the classes;
k is the number of classes;
a + kc - d ≥ the greatest given value;
d = 1, 0.1 or 0.01 etc. according as the values are given in integers, upto one place after decimal or upto two places after decimal, etc.
After completing the table of tally marks we prepare the final frequency table where classes are given in terms of class boundaries and the frequencies of different classes are noted against them. For any class,

Lower boundary = lower limit - d/2

Upper boundary = upper limit + d/2



The final table will be like the following:

Class boundaries                                  Frequency
(a - d/2) to (a + c - d/2)
(a + c - d/2) to (a + 2c - d/2)
...
(a + (k-1)c - d/2) to (a + kc - d/2)
Total                                             n
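
The passage from class limits to class boundaries can be sketched as a short function. The function name `class_table` is our own; the arguments a, c, k and d carry the meanings given above, and the values a = 31, c = 10, k = 5, d = 1 are merely one consistent choice for integer data lying between 33 and 77. Exact fractions are used so that boundaries such as 30.5 are not affected by rounding.

```python
from fractions import Fraction


def class_table(a, c, k, d):
    """Return (limits, boundaries) for k classes of width c starting at a.

    a: lowest class limit, c: class width, k: number of classes,
    d: unit of measurement (1, 0.1, 0.01, ...).
    """
    a, c, d = Fraction(str(a)), Fraction(str(c)), Fraction(str(d))
    # i-th class limits: (a + i*c) to (a + (i+1)*c - d)
    limits = [(a + i * c, a + (i + 1) * c - d) for i in range(k)]
    # Lower boundary = lower limit - d/2, upper boundary = upper limit + d/2
    boundaries = [(lo - d / 2, hi + d / 2) for lo, hi in limits]
    return limits, boundaries


limits, boundaries = class_table(a=31, c=10, k=5, d=1)
for (l1, l2), (b1, b2) in zip(limits, boundaries):
    print(f"{int(l1)}-{int(l2)}    {float(b1)}-{float(b2)}")
```

The output confirms that the upper boundary of each class coincides with the lower boundary of the next, whereas consecutive class limits are separated by a gap of d.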

A frequency distribution may also be represented in terms of relative frequencies (i.e. proportions) or cumulative frequencies. The cumulative frequencies are found by successively adding the class frequencies, where the addition begins from top (i.e. the lowest class) or bottom (i.e. the highest class) depending on whether less-than type or more-than type cumulative frequencies are obtained. In fact, the former kind of cumulative frequency of a class indicates the number of values less than the upper boundary of the concerned class. Again, a specific more-than type cumulative frequency represents the number of values above or equal to the lower boundary of the corresponding class. Thus, cumulative frequencies of less-than type correspond to upper boundaries and those of more-than type to lower boundaries.

Illustration 3.1
Suppose the following data relating to a test on marks in Mathematics of 50 students in a college were noted.

42 37 46 48 63 64 63 53 57 55
72 55 54 33 48 56 34 77 65 58
47 59 44 35 75 40 45 56 55 65
48 56 52 53 34 42 58 65 43 54
46 57 62 58 53 43 47 54 60 48

Arrange the data in the form of a frequency distribution table in 5 classes of equal length. Prepare the table for cumulative frequencies and relative frequencies.
Here the smallest value = 33 and the greatest value = 77. So range = 77 - 33 = 44. We consider range as 50, slightly bigger than the range obtained from the data and hence form five classes each of length 10.
TABLE 3.6
TALLY MARKS FOR THE DATA ON MARKS
Class limits    Tally marks
31-40
41-50
51-60
61-70
71-80

TABLE 3.7
FREQUENCY DISTRIBUTION OF MARKS OF 50 STUDENTS IN A COLLEGE
Class boundaries    Frequency
30.5-40.5            6
40.5-50.5           14
50.5-60.5           20
60.5-70.5            7
70.5-80.5            3
Total               50

TABLE 3.8
RELATIVE FREQUENCY AND CUMULATIVE FREQUENCY TABLE OF MARKS

Class boundaries    Relative     Cumulative frequency
                    frequency    Less-than type    More-than type
30.5-40.5           0.12          6                50
40.5-50.5           0.28         20                44
50.5-60.5           0.40         40                30
60.5-70.5           0.14         47                10
70.5-80.5           0.06         50                 3
Total               1.00
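
The whole of Illustration 3.1 can be retraced in code as a check on the tables. The sketch below takes each entry of the data as a two-digit mark and assumes the class limits 31-40, ..., 71-80 (that is, a = 31, c = 10, k = 5, d = 1 in the earlier notation); the variable names are ours.

```python
from itertools import accumulate

marks = [
    42, 37, 46, 48, 63, 64, 63, 53, 57, 55,
    72, 55, 54, 33, 48, 56, 34, 77, 65, 58,
    47, 59, 44, 35, 75, 40, 45, 56, 55, 65,
    48, 56, 52, 53, 34, 42, 58, 65, 43, 54,
    46, 57, 62, 58, 53, 43, 47, 54, 60, 48,
]

# Lowest class limit, class width, number of classes, unit of measurement.
a, c, k, d = 31, 10, 5, 1

# Tallying: each mark falls in the class whose limits contain it.
freqs = [0] * k
for m in marks:
    freqs[(m - a) // c] += 1

# Class boundaries: limits widened by d/2 on either side.
boundaries = [(a + i * c - d / 2, a + (i + 1) * c - d / 2) for i in range(k)]

n = len(marks)
relative = [f / n for f in freqs]
less_than = list(accumulate(freqs))                    # from the lowest class
more_than = list(accumulate(reversed(freqs)))[::-1]    # from the highest class

for (lo, hi), f, r, lt, mt in zip(boundaries, freqs, relative, less_than, more_than):
    print(f"{lo:.1f}-{hi:.1f} {f:>3} {r:>5.2f} {lt:>3} {mt:>3}")
```

The integer division `(m - a) // c` plays the role of the tally sheet: it maps a mark directly to the index of its class, which works here because all classes have the same width.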

3.5 Diagrammatic representation of frequency distribution

If a frequency distribution is exhibited in diagrams, then an overall idea regarding the distribution may be readily developed. There are several modes of such graphical representation but the choice of suitable figure depends on the nature of the character concerned.

(a) Case of an attribute

The frequency distribution of an attribute, when expressed in terms of absolute frequencies, may be diagrammatically represented by bar diagram with horizontal bars. However, divided bar diagram or pie diagram is appropriate for exhibiting the frequency distribution in terms of relative frequencies.
(b) Case of a discrete variable

(i) Column diagram: In representing the frequency distribution of a discrete variable graphically, at the outset, we may take two mutually perpendicular axes of coordinates, the horizontal and vertical axes respectively showing the variate values and the frequencies. Of course, scale for each of the axes has to be appropriately chosen. Next, perpendiculars (or columns) having heights equal to the frequencies of the variable values are drawn at the corresponding points (indicating variable values) on the horizontal axis. The diagram so formed is called a column diagram or frequency bar diagram. It should be noted that this diagram can also be drawn using relative frequencies instead of absolute frequencies.

Fig 3.1 Column diagram for the frequency distribution of family size (TABLE 3.4).
Fig 3.2 Frequency polygon for the frequency distribution of family size (TABLE 3.4).

(ii) Frequency polygon: It is a suitable diagrammatic mode of exhibiting the frequency distribution of a discrete variable. Here, also, the variable values are located on the horizontal axis and the frequencies on the vertical axis, in a rectangular system of coordinates. The available data are plotted as points on the plane where the abscissa and ordinate for a point respectively indicate the variable value and the corresponding frequency. To get a closed figure, the variable value just preceding the least and that following the highest (each with zero frequency) are included on the horizontal axis. Finally, the points are successively joined by line segments. The polygon drawn in this way is termed a frequency polygon.
(iii) Cumulative frequency diagram or Step diagram: This diagram is meant for representing the frequency distribution in terms of cumulative frequencies. Here the variable values are marked on the horizontal axis and the cumulative frequencies are shown along the vertical axis. The cumulative frequencies are plotted as points against the corresponding values of the variable. Suppose the successive values of a variable x in increasing order are x_1, x_2, ..., x_p with respective frequencies f_1, f_2, ..., f_p. Here the cumulative frequency of the less-than type is 0 for any value of the variable less than x_1, is f_1 for any value greater than or equal to x_1 but less than x_2, is (f_1 + f_2) for any value greater than or equal to x_2 but less than x_3, and so on. Similarly, the cumulative frequency of the greater-than type for any value of the variable greater than x_p is 0, is f_p for any value less than or equal to x_p but greater than x_(p-1), is (f_p + f_(p-1)) for any value less than or equal to x_(p-1) but greater than x_(p-2), and so on. The cumulative frequency diagrams of less-than and more-than types will resemble two staircases, the former ascending from left to right and the latter ascending from right to left.
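
The staircase behaviour of the two cumulative frequencies can be made concrete with a pair of step functions. This is a minimal sketch using the family-size values 2 to 7 with frequencies 9, 20, 30, 17, 10, 4; the function names are our own, and the standard-library `bisect` module is used only to locate the position of x among the ordered values.

```python
from bisect import bisect_left, bisect_right

values = [2, 3, 4, 5, 6, 7]       # x_1 < x_2 < ... < x_p
freqs = [9, 20, 30, 17, 10, 4]    # f_1, f_2, ..., f_p


def cum_less_than(x):
    """Less-than type: total frequency of the values not exceeding x."""
    return sum(freqs[:bisect_right(values, x)])


def cum_greater_than(x):
    """Greater-than type: total frequency of the values not below x."""
    return sum(freqs[bisect_left(values, x):])


for x in (1.5, 2, 2.7, 3, 6.5, 7, 8):
    print(x, cum_less_than(x), cum_greater_than(x))
```

Between two successive values of the variable both functions stay constant, and each jumps by the corresponding frequency at a value of the variable, which is what produces the steps in the diagram.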
(c) Case of a continuous variable

(i) Frequency polygon: A frequency polygon may be drawn to exhibit the frequency distribution of a continuous variable, provided the classes are of equal width. Here the frequencies are plotted against the class-marks of the respective classes on the assumption that the frequency of a class corresponds to its mid-value. Finally, the plotted points are joined successively by line segments. To get a closed polygon, we take two additional classes, one at each end, which have zero frequencies.
(ii) Histogram: It is an appropriate diagram for representing the frequency distribution of a continuous variable in the sense that it considers the fact that the frequency of a class is dispersed over the interval. Here two coordinate axes are taken and the class-boundaries are shown on the horizontal axis for locating the class intervals. Next, a rectangle is drawn over each class-interval so that its area indicates the corresponding class frequency. In other words, the height of a rectangle becomes equal to the corresponding frequency density. In this manner a series of adjoining rectangles are erected so that the area covered by this entire group of rectangles exhibits the total frequency. The diagram so formed is called the histogram of the frequency distribution. It should be noted that the widths of rectangles, which are same as the corresponding class widths, are not necessarily equal.
A histogram is used, as we shall see later, for finding mode of frequency distribution or number of observations above or below a specified variate value. It also gives a rough idea about the shape of the frequency curve.

Fig 3.3 Step diagram (less-than type) for the frequency distribution of family size (TABLE 3.5).

Fig 3.4 Histogram for the frequency distribution of marks (TABLE 3.7). Horizontal axis: class boundaries.
Fig 3.5 Ogive (less-than type and more-than type) for the frequency distribution of marks (TABLE 3.8). Horizontal axis: marks.

(iii) Ogive: This diagram is used for exhibiting the frequency distribution of a continuous variable in terms of cumulative frequencies (of either type). To draw an ogive, initially, two rectangular axes of coordinates are taken, the horizontal one showing the variable values and the vertical one representing the cumulative frequencies. In case of less-than type cumulative frequencies, they are plotted against the upper class-boundaries as different points which are joined successively by line segments to get less-than type ogive. Again, for a more-than type ogive, cumulative frequencies of more-than type correspond to lower class-boundaries, the mode of construction being similar to the previous one. Obviously, cumulative frequency of less-than type is zero for the lower boundary of the lowest class and it is included in drawing the diagram. Similarly, cumulative frequency of greater-than type is zero for the upper boundary of the highest class which has to be included.
An ogive is primarily used for finding quantiles of different orders (such as median, third quartile). From it one can also find the number of observations above or below a certain variate value.
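
Since a less-than type ogive consists of line segments joining the plotted points, reading a quantile off the diagram amounts to linear interpolation between two successive class boundaries. The sketch below rests on that assumption; the function name `ogive_quantile` is hypothetical, and the boundaries and cumulative frequencies are those of the marks of 50 students (TABLE 3.8), with the zero cumulative frequency at the lowest boundary included.

```python
def ogive_quantile(boundaries, cum_less_than, p):
    """Value at which the less-than ogive reaches p * n, for 0 < p < 1.

    boundaries: class boundaries in increasing order.
    cum_less_than: less-than cumulative frequencies at those boundaries,
                   starting with 0 at the lowest boundary.
    """
    target = p * cum_less_than[-1]
    for i in range(1, len(boundaries)):
        lo_cf, hi_cf = cum_less_than[i - 1], cum_less_than[i]
        if target <= hi_cf:
            lo, hi = boundaries[i - 1], boundaries[i]
            # Linear interpolation along the segment of the ogive.
            return lo + (target - lo_cf) / (hi_cf - lo_cf) * (hi - lo)
    raise ValueError("p must lie between 0 and 1")


boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
cum_less_than = [0, 6, 20, 40, 47, 50]

median = ogive_quantile(boundaries, cum_less_than, 0.5)
q3 = ogive_quantile(boundaries, cum_less_than, 0.75)
print(median, q3)
```

Under this reading the median of the marks comes out as 53.0 and the third quartile as 59.25; a value read from a hand-drawn ogive would agree only up to the accuracy of the drawing.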
3.6 Frequency curve

One of the diagrammatic representations of the frequency distribution of a continuous variable is a histogram. In this diagram we take the variable values along the horizontal axis and the frequency densities along the vertical axis, and the area under the histogram becomes the total frequency. Now, suppose the total frequency is gradually increased and the width of each class is gradually decreased (so that the total number of classes gradually increases). Then the histogram will gradually approach a smooth curve. Frequency curve is the limiting form of the histogram (with relative frequency densities, instead of frequency densities, along the vertical axis) when total frequency tends to infinity and at the same time width of the classes tends to zero. The area under the curve will be unity.
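
The two statements about area can be verified numerically for a finite histogram. The following sketch uses the class boundaries and frequencies of the marks of 50 students (TABLE 3.7); the variable names are ours. It computes the heights of the rectangles with frequency densities and with relative frequency densities, and then adds up the areas.

```python
boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
freqs = [6, 14, 20, 7, 3]
n = sum(freqs)  # total frequency

# Width of each class from consecutive boundaries.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

# Heights of the rectangles: frequency density and relative frequency density.
density = [f / w for f, w in zip(freqs, widths)]
rel_density = [f / (n * w) for f, w in zip(freqs, widths)]

# Area = sum of (height x width) over all rectangles.
area = sum(h * w for h, w in zip(density, widths))
rel_area = sum(h * w for h, w in zip(rel_density, widths))

print([round(h, 2) for h in density], round(area, 6), round(rel_area, 6))
```

With frequency densities the total area equals the total frequency (50 here), while with relative frequency densities it equals 1, which is the property carried over to the frequency curve in the limit.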

Fig 3.6 Bell-shaped frequency curve.
Fig 3.7 U-shaped frequency curve.
Fig 3.8 J-shaped frequency curves.
The frequency curves may be of different shapes: bell-shaped (symmetrical or moderately asymmetrical), U-shaped, J-shaped, etc.
