
Presentation of data

Classification of data
Classification is the grouping of related facts into
different classes. Facts in one class differ from
those of another class with respect to some
characteristic, called the basis of classification.
Types of classification:
I. Geographical
II. Chronological
III. Qualitative
IV. Quantitative
1. Geographical Data:
Geographical data are classified on the basis
of geographical or locational differences
between the various items. For example,
when we present the production of a crop for
various states, this would be called geographical
data.
2. Chronological Data:
When data are observed over a period of time,
the type of classification is known as
chronological classification. For example, the
sales figures of a company are given below:
Year    Sales          Year    Sales
        (Rs. Lakhs)            (Rs. Lakhs)
1996    18810          2001    46725
1997    23601          2002    45724
1998    23816          2003    50117
1999    32435          2004    53900
2000    39343          2005    61795
3. Qualitative Data
In qualitative classification, data are classified on
the basis of some attribute or quality such as
sex, colour of hair, literacy, religion, etc. In this
type of classification, the attribute under study
cannot be measured; one can only find out
whether it is present or absent in the units of
the population under study. Thus, when only one
attribute is studied, two classes are formed: one
possessing the attribute and the other not
possessing it.
4. Quantitative Data
Quantitative classification refers to the classification of data
according to some characteristics that can be measured, such as
height, weight, income, sales, etc. For example, the workers of
a factory may be classified according to wages as follows:
Monthly wages   No. of workers   Monthly wages   No. of workers
4000-4500       50               5500-6000       360
4500-5000       200              6000-6500       90
5000-5500       260              6500-7000       40

In this type of classification there are two elements, namely (i)
the variable, i.e., the monthly wages in the above example, and
(ii) the frequency, i.e., the number of workers in each class.
Types of variables
a) Discrete Variable: A discrete variable can
assume only some specific values in a given
interval. For example, the number of children
in a family, the number of rooms on each
floor of a multi-storied building, etc.
b) Continuous Variable: A continuous variable
can assume any value in a given interval. For
example, the monthly income of a worker can
take any value, say, between Rs 1000 and
Rs 2500.
Construction of a frequency distribution
Construction of a Discrete frequency distribution
A discrete frequency distribution may be ungrouped or
grouped. In an ungrouped frequency distribution, various
values of the variable are shown along with their
corresponding frequency.
a) Ungrouped Frequency Distribution of a Discrete variable:
Example: suppose that a survey of 15 houses was conducted and
the number of rooms in each house was recorded as shown
below:
5, 4, 4, 6, 3, 2, 3, 6, 5, 1, 4, 5, 1, 3, 4
The method of tally marks is used to count the number of
observations, i.e., the frequency, of each value.
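The tally for the 15 houses can also be produced programmatically. The following is a minimal Python sketch, assuming the standard-library `collections.Counter` stands in for tally marks; the names `rooms` and `frequency` are illustrative and not part of the original material.

```python
from collections import Counter

# Number of rooms recorded in each of the 15 surveyed houses
rooms = [5, 4, 4, 6, 3, 2, 3, 6, 5, 1, 4, 5, 1, 3, 4]

# Counter plays the role of tally marks: one count per distinct value
frequency = Counter(rooms)

print("Rooms  Tally  Frequency")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>5}  {'|' * count:<5}  {count:>9}")
print(f"Total         {sum(frequency.values()):>9}")
```

Each distinct number of rooms gets one row, and the frequencies add up to the 15 houses surveyed.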
b) Grouped Frequency Distribution of a Discrete
variable:
Consider the data on marks obtained by 50 students in
statistics. The variable X, denoting marks obtained, is a
discrete variable. Let the grouped frequency distribution
of these data be as given in the following table.
Marks between and including   Frequency
30-39                         4
40-49                         6
50-59                         8
60-69                         12
70-79                         9
80-89                         7
90-99                         4
Construction of a Continuous frequency distribution

A continuous variable can take any value in an interval.
Measurements like height, age, income, time, etc. are
some examples of a continuous variable. The frequency
distribution of a continuous variable is always grouped.
The following decisions are required to be taken in the
construction of any frequency distribution of a
continuous variable.
1. Class limits
2. Class Width
3. Class Frequency
4. Class mid-point
• Class limits: The class limits are the lowest
and the highest values that can be included in
the class. For example, take the class 20-40:
the lowest value of this class is 20 and the
highest value is 40.
• Class width: The span of a class, that is, the
difference between the upper limit and the
lower limit, is known as the class width or class
interval. Example: in the class 20-40, the width
of the class interval is 20.
• Class frequency: The number of observations
corresponding to a particular class is known as
the frequency of that class or the class frequency.
If we add together the frequencies of all
individual classes, we obtain the total frequency.
• Class mid-point: It is the value lying half-way
between the lower and upper class limits of a
class interval. The mid-point of a class is ascertained
as follows:
Mid-point = (Lower limit + Upper limit) / 2
Example: for the class interval 20-40, the mid-point is (20 + 40) / 2 = 30.
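The width and mid-point of a class follow directly from its limits. The sketch below checks this for the class 20-40; the function names `class_width` and `class_mid_point` are hypothetical helpers introduced only for illustration.

```python
def class_width(lower, upper):
    """Width of a class: upper limit minus lower limit."""
    return upper - lower


def class_mid_point(lower, upper):
    """Mid-point of a class: the value half-way between its limits."""
    return (lower + upper) / 2


print(class_width(20, 40))      # 20
print(class_mid_point(20, 40))  # 30.0
```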


Types of class intervals
There are two methods of classifying the data according to class
intervals, namely (a) the exclusive method, and (b) the inclusive method.
a) Exclusive method: When the class intervals are so fixed that
the upper limit of one class is the lower limit of the next
class, it is known as the exclusive method of classification.
Income (Rs)   No. of employees   Income (Rs)        No. of employees
5000-6000     50                 8000-9000          150
6000-7000     100                9000-10000         40
7000-8000     200                10000 and above    10

In the above example, there are 50 employees whose income is
between Rs 5000 and Rs 5999.99. In this case, the upper limit is not
included in the class.
b) Inclusive method: Under the inclusive
method of classification, the upper limit of
one class is included in that class itself.
Income (Rs)   No. of employees   Income (Rs)    No. of employees
5000-5999     50                 8000-8999      150
6000-6999     100                9000-9999      40
7000-7999     200                10000-10999    10

In the class 5000-5999, we include employees
whose income is between Rs 5000 and Rs 5999,
both limits included.
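The difference between the two methods comes down to whether the upper limit belongs to the class. The following sketch makes that explicit; `in_class` and its `method` parameter are hypothetical names chosen for illustration, not part of the original material.

```python
def in_class(value, lower, upper, method="exclusive"):
    """Return True if value falls in the class lower-upper.

    exclusive: lower <= value < upper   (upper limit belongs to the next class)
    inclusive: lower <= value <= upper  (upper limit belongs to this class)
    """
    if method == "exclusive":
        return lower <= value < upper
    if method == "inclusive":
        return lower <= value <= upper
    raise ValueError("method must be 'exclusive' or 'inclusive'")


# Exclusive classes 5000-6000 and 6000-7000: an income of 6000 goes to the second class
print(in_class(6000, 5000, 6000, "exclusive"))  # False
print(in_class(6000, 6000, 7000, "exclusive"))  # True

# Inclusive class 5000-5999: an income of 5999 stays in this class
print(in_class(5999, 5000, 5999, "inclusive"))  # True
```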
Number of Class intervals
• The number of classes should preferably be between 5 and 15.
However, there is no rigidity about it. The classes can be more
than 15, depending upon the total number of observations in the
data and the details required, but they should not be fewer than
five because in that case the classification may not reveal the
essential characteristics.
Formula for the approximate number of classes:
k = 1 + 3.322 log N (log to base 10)
where
k = the approximate number of classes,
N = the total number of observations.
However, the precise number of classes to be used for a given
variable depends upon personal judgment and other
considerations.
Q. The profits (in lakhs of rupees) of 30 companies for
the year 1999-2000 are given below:
20, 22, 35, 42, 37, 42, 48, 53, 49, 65, 39, 48, 67, 18, 16,
23, 37, 35, 49, 63, 65, 55, 45, 58, 57, 69, 25, 29, 58, 65.
Solution:
Approximate number of classes (k) = 1 + 3.322 log N
k = 1 + 3.322 log 30 = 1 + 3.322 × 1.4771 = 5.91
Range = 69 - 16 = 53
Class interval = Range / k = 53 / 5.91 = 8.97, or approximately 9
Since values like 3, 7, 9 should be avoided, we will take
10 as the class interval and the first class as 15-25.
Profit (Rs. lakhs)   No. of companies
15-25                5
25-35                2
35-45                7
45-55                6
55-65                5
65-75                5
Total                30
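The solution can be reproduced in a few lines of Python. This is a minimal sketch, assuming the exclusive method (a value equal to an upper limit goes to the next class) and base-10 logarithms in the formula k = 1 + 3.322 log N, which is commonly known as Sturges' rule; the variable names are illustrative.

```python
import math

profits = [20, 22, 35, 42, 37, 42, 48, 53, 49, 65, 39, 48, 67, 18, 16,
           23, 37, 35, 49, 63, 65, 55, 45, 58, 57, 69, 25, 29, 58, 65]

n = len(profits)
k = 1 + 3.322 * math.log10(n)             # approximate number of classes
data_range = max(profits) - min(profits)  # largest value minus smallest value
suggested_width = data_range / k          # about 9; the text rounds this to 10

width = 10   # chosen class interval
start = 15   # lower limit of the first class

# Exclusive method: count values with lower <= value < upper in each class
counts = {}
for lower in range(start, max(profits) + 1, width):
    upper = lower + width
    counts[(lower, upper)] = sum(lower <= p < upper for p in profits)

print(f"k = {k:.2f}, range = {data_range}, suggested width = {suggested_width:.2f}")
for (lower, upper), count in counts.items():
    print(f"{lower}-{upper}: {count}")
print("total:", sum(counts.values()))
```

The class counts match the table, and they sum to the 30 companies in the data.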
Q. Twenty students of a class appeared in an
examination. Their marks out of 50 are as under:
5, 6, 17, 17, 20, 21, 22, 22, 22, 25, 25, 26, 26, 30,
31, 31, 34, 35, 42, 48.
Prepare a classified table, according to exclusive
and inclusive methods.
Relative and percentage frequency distribution
• If, instead of the frequencies of various classes, their
relative or percentage frequencies are written, we
get a relative or percentage frequency distribution.
The relative frequency of a class is its frequency divided
by the total frequency; the percentage frequency is the
relative frequency multiplied by 100.
Class intervals   Frequency   Relative frequency   Percentage frequency
1.00-1.10         4           0.044                4.4
1.10-1.20         7           0.078                7.8
1.20-1.30         10          0.111                11.1
1.30-1.40         14          0.156                15.6
1.40-1.50         20          0.222                22.2
1.50-1.60         13          0.144                14.4
1.60-1.70         9           0.100                10.0
1.70-1.80         6           0.067                6.7
1.80-1.90         4           0.044                4.4
1.90-2.00         3           0.033                3.3
Total             90          1.000                100
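The relative and percentage columns can be derived from the frequencies alone. The sketch below assumes that relative frequency means class frequency divided by total frequency, and that percentage frequency is the relative frequency times 100; the variable names are illustrative.

```python
classes = ["1.00-1.10", "1.10-1.20", "1.20-1.30", "1.30-1.40", "1.40-1.50",
           "1.50-1.60", "1.60-1.70", "1.70-1.80", "1.80-1.90", "1.90-2.00"]
frequencies = [4, 7, 10, 14, 20, 13, 9, 6, 4, 3]

total = sum(frequencies)

# Relative frequency: share of all observations that fall in the class
relative = [f / total for f in frequencies]

# Percentage frequency: the same share expressed out of 100
percentage = [100 * r for r in relative]

for label, f, r, p in zip(classes, frequencies, relative, percentage):
    print(f"{label}  {f:>3}  {r:.3f}  {p:5.1f}")
print(f"Total      {total:>3}  {sum(relative):.3f}  {sum(percentage):5.1f}")
```

Because each value is rounded for display, the printed column entries may differ from their exact total in the last digit.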
Complete the following table:
Class intervals   Frequency   Relative frequency   Percentage frequency
60-70             5
70-80             11
80-90             14
90-100            18
100-110           16
110-120           6
Total             70
Cumulative frequency distribution
A cumulative Frequency Distribution can be of
two types
• Less than type cumulative frequency
distribution
• More than type cumulative frequency distribution
Class intervals   Frequency   Less than (the upper limit)   More than (the lower limit)
                              cumulative frequency          cumulative frequency
1.00-1.10         4           4                             90
1.10-1.20         7           11                            86
1.20-1.30         10          21                            79
1.30-1.40         14          35                            69
1.40-1.50         20          55                            55
1.50-1.60         13          68                            35
1.60-1.70         9           77                            22
1.70-1.80         6           83                            13
1.80-1.90         4           87                            7
1.90-2.00         3           90                            3
Total             90
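Both cumulative columns are running totals of the frequency column, taken from opposite ends. The following is a minimal sketch using the standard-library `itertools.accumulate`; the names `less_than` and `more_than` are illustrative.

```python
from itertools import accumulate

frequencies = [4, 7, 10, 14, 20, 13, 9, 6, 4, 3]

# Less than type: running total from the first class onwards,
# i.e. the number of observations below each upper limit
less_than = list(accumulate(frequencies))

# More than type: running total from the last class backwards,
# i.e. the number of observations at or above each lower limit
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)  # [4, 11, 21, 35, 55, 68, 77, 83, 87, 90]
print(more_than)  # [90, 86, 79, 69, 55, 35, 22, 13, 7, 3]
```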
Complete the following table:
Class intervals   Frequency   Less than (the upper limit)   More than (the lower limit)
                              cumulative frequency          cumulative frequency
60-70             5
70-80             11
80-90             14
90-100            18
100-110           16
110-120           6
Total             70
Frequency density
• Frequency density in a class is defined as the
number of observations per unit of its width.
Class intervals   Frequency   Frequency density
1.00-1.10         4           40
1.10-1.20         7           70
1.20-1.30         10          100
1.30-1.40         14          140
1.40-1.50         20          200
1.50-1.60         13          130
1.60-1.70         9           90
1.70-1.80         6           60
1.80-1.90         4           40
1.90-2.00         3           30
Total             90
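Frequency density is the class frequency divided by the class width. The sketch below recomputes the density column; it assumes exact arithmetic with the standard-library `fractions.Fraction` so that a width such as 1.10 - 1.00 is exactly 0.10 rather than a floating-point approximation. The variable names are illustrative.

```python
from fractions import Fraction

# (lower limit, upper limit, frequency); limits kept as strings for exact parsing
classes = [("1.00", "1.10", 4), ("1.10", "1.20", 7), ("1.20", "1.30", 10),
           ("1.30", "1.40", 14), ("1.40", "1.50", 20), ("1.50", "1.60", 13),
           ("1.60", "1.70", 9), ("1.70", "1.80", 6), ("1.80", "1.90", 4),
           ("1.90", "2.00", 3)]

densities = []
for lower, upper, frequency in classes:
    width = Fraction(upper) - Fraction(lower)  # class width, exactly 1/10 here
    densities.append(frequency / width)        # observations per unit of width

for (lower, upper, frequency), density in zip(classes, densities):
    print(f"{lower}-{upper}  {frequency:>3}  {float(density):.0f}")
```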
Complete the following table:
Class intervals   Frequency   Frequency density
60-70             5
70-80             11
80-90             14
90-100            18
100-110           16
110-120           6
Total             70
Univariate frequency distributions
If the data are classified according to only one characteristic,
the distribution is called a univariate frequency distribution.
Example:
Class intervals   Frequency
60-70             5
70-80             11
80-90             14
90-100            18
100-110           16
110-120           6
Total             70
Bivariate frequency distribution
A frequency distribution obtained by the
simultaneous classification of data according to
two characteristics is known as a bivariate
frequency distribution and tabular
representation of the bivariate frequency
distribution is known as a contingency table.
Example: 100 couples are classified according
to two characteristics, age of husband
and age of wife.
Classification according to age of husband and
age of wife
Age of husband   Age of wife
                 10-20   20-30   30-40   40-50   50-60   Total
10-20            6       3       0       0       0       9
20-30            3       16      10      0       0       29
30-40            0       10      15      7       0       32
40-50            0       0       7       10      4       21
50-60            0       0       0       4       5       9
Total            9       29      32      21      9       100
Example: A set of 100 people who have pets is
polled to see whether there is a relationship between
gender and whether they have a dog or a cat.

         Dog   Cat   Total
Male     42    10    52
Female   9     39    48
Total    51    49    100
Q. Prepare a contingency table for the following
data set.

Age of husband   24  26  27  25  28  24  27  28  25  26
Age of wife      17  18  19  17  17  18  18  19  18  19

Age of husband   25  26  27  25  27  26  25  26  26  26
Age of wife      17  18  19  19  20  19  17  20  17  18
Solution:

Age of wife   Age of husband
              24   25   26   27   28   Total
17            1    3    1    0    1    6
18            1    1    3    1    0    6
19            0    1    2    2    1    6
20            0    0    1    1    0    2
Total         2    5    7    4    2    20
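The contingency table can be built by counting each (age of wife, age of husband) pair. This is a minimal sketch using the standard-library `collections.Counter`; the layout with wives' ages in rows and husbands' ages in columns follows the solution table, and the variable names are illustrative.

```python
from collections import Counter

husband = [24, 26, 27, 25, 28, 24, 27, 28, 25, 26,
           25, 26, 27, 25, 27, 26, 25, 26, 26, 26]
wife = [17, 18, 19, 17, 17, 18, 18, 19, 18, 19,
        17, 18, 19, 19, 20, 19, 17, 20, 17, 18]

# Count how many couples share each (wife's age, husband's age) combination
pairs = Counter(zip(wife, husband))

husband_ages = sorted(set(husband))
wife_ages = sorted(set(wife))

print("wife/husband", *husband_ages, "total")
for w in wife_ages:
    row = [pairs[(w, h)] for h in husband_ages]  # missing pairs count as 0
    print(w, *row, sum(row))

column_totals = [sum(pairs[(w, h)] for w in wife_ages) for h in husband_ages]
print("total", *column_totals, sum(column_totals))
```

The row and column totals give the univariate distributions of the wives' and husbands' ages respectively, and both sum to the 20 couples.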
Tabulation and parts of table
Tabulation is the systematic presentation of numerical
data in rows and columns. Tabulation of classified data
makes it fit for statistical analysis.
Main parts of the table
1. Table number: This number is helpful in the
identification of a table. It is indicated at the
top of the table.
2. Title: Each table should have a title to indicate the
scope and nature of the contents of the table in an
unambiguous and concise form.
3. Captions and stubs: Headings or subheadings used
to designate columns are called captions, while
those used to designate rows are called stubs.
4. Main body of the table: It contains the
numerical information.
5. Head-note: A head-note is often given below
the title of a table to indicate the units of
measurement of the data. It is often enclosed
in brackets.
6. Footnote: Abbreviations, if any, used in the
table or some other explanatory notes are given
just below the last horizontal line in the form of
footnotes.
7. Source-note: This note is often required when
secondary data are being tabulated. It
indicates the source from which the
information has been obtained.
