Professional Documents
Culture Documents
and Presentation
Classification of Data
• The process of arranging the data in
groups/classes on the basis of certain
properties
Purposes of Classification
• It condenses the raw data into a form
suitable for statistical analysis
• It highlights the features of the data
• Facilitates comparisons and drawing
inferences from data
– Comparison of performances
• Provides information about mutual
relationships
– Effect of literacy on criminal tendency
• Patterns, Clusters
Characteristics of Ideal Classification
• Should be unambiguous
– Define and avoid confusion
– Divide whole population: literates, illiterates
• Classes should be exhaustive and
mutually exclusive
• It should be stable
– Classes must remain unchanged every time
investigation is carried
– Census suffers based on occupation
– Comparison becomes difficult
Characteristics of Ideal Classification
• It should be flexible
– To accommodate suitable changes in new
situations
– Sub-classes be made to accommodate
changes
Basis of Classification
• Geographical classification
• Chronological classification
• Qualitative classification
• Quantitative classification
Geographical classification
• Based on geographical or location diff.
• Spatial classification
• Alphabetical order: Reference tables
Population
(crore) : 31.9 36.9 43.9 54.7 75.6 85.9 98.6
Qualitative Classification
• Basis of descriptive characteristics or
attributes
• Gender, region, caste
• Simple classification: class is divided into
two sub-classes and one attribute is
studied, male and female
• Manifold classification: A class is divided
into sub-classes which may be sub divided
further
– Population: Male Female - Literate Illiterate
Quantitative Classification
• Based on some characteristics which can
be measured
• Income, expenditure etc
• Continuous variable: can take any value
within the ranges
• Discrete variable: values change in steps
and cannot assume a fractional value
– No of children, no of students
Continuous and Discrete Variables in Data Set
Discrete Series Continuous Series
No. of No. of Weight No. of
Children Families (kg) Persons
0 10 100 to 110 10
1 30 110 to 120 20
2 60 120 to 130 25
3 90 130 to 140 35
4 110 140 to 150 50
5 20
320 140
Organising Data Using Data Array
• Number of overtime hours worked for 30
consecutive weeks by a machinist in a
machine shop
• Data is in raw form
94 89 88 89 90 94 92 88 87 85
88 93 94 93 94 93 92 88 94 90
93 84 93 84 91 93 85 91 89 95
Data Array
• Raw data does not reveal any
characteristic
– Highest, lowest, average, overtime hours
cluster
Data Array
84 88 89 91 93 94
84 88 89 92 93 94
85 88 90 92 93 94
85 88 90 93 93 94
87 89 91 93 94 95
Advantages
• Highest and lowest observation
• Helps in dividing the values into various
parts
• Concentration around particular value
•
Frequency Distribution
• Frequency distribution divides
observations in the data set into
conveniently established ordered groups
• The number of observations in each class
is referred to as frequency and is denoted
by f
• Marketing manager wants to know how
many units of each product sells in a
particular region during a given period
Frequency Distribution
• Advantages
Data are expressed in a compact form
Pattern of distribution
• Disadvantages
Individual observations loose their
identity
Frequency Distribution
Constructing a Frequency Distribution
Overtime Hours
Frequency
(Class intervals)
82 but less than 85 2
85 but less than 88 3
88 but less than 91 9
91 but less than 94 10
94 but less than 97 6
Mid-point of Class Intervals
• Decision maker cannot know how the
values are distributed within the class
• The class mid-point is the point half-way
between the boundaries of each class
Methods of Data Classification
• Based on the class intervals, the methods
are:
– Exclusive method
– Inclusive method
Exclusive Method
0 - 10 5
10 - 20 7
20 - 30 15
30 - 40 10
Inclusive Method
10 17 15 22 11 16 19 24 29 18
25 26 32 14 17 20 23 27 30 12
15 18 24 36 18 15 21 28 33 38
34 13 10 16 20 22 29 29 23 31
Solution
• k=6
• Largest value = 38
• Smallest value = 10
• h = (38 -10) / 6 = 5
Frequency Distribution
Class Interval Frequency
(Mutual Fund Tally (Number of
Prices, Rs) Mutual Funds)
10 - 15 6
15 - 20 11
20 - 25 9
25 - 30 7
30 - 35 5
35 - 40 2
40
Exercise - 2
• The take home salary (in Rs.) of 40
unskilled workers from a company for a
particular month was
• Construct a frequency distribution having a
suitable number of classes
Take Home Salary of 40 Workers
x y x y x y x y x y
550 12 225 25 680 13 202 29 689 11
13 19 22 14 13 16 19 21
23 11 27 25 17 17 13 20
23 17 26 20 24 15 20 21
23 17 29 17 19 14 20 20
10 22 18 25 16 23 19 20
21 17 18 24 21 20 19 26
Exercise - 1
• Group these figures into a table having the
classes 10 – 12, 13 – 15, 16 – 18,…..and
28 – 30.
• Convert the distribution into a
corresponding percentage frequency
distribution and also a percentage
cumulative frequency distribution
Frequency Distribution
10 -12 2
13 - 15 6
16 - 18 10
19 - 21 16
22 - 24 8
25 - 27 5
28 - 30 1
Percentage and More Than
Cumulative Percentage Distribution
Exercise
• High performance Bicycle Products
Company in Chapel Hill, North Carolina,
sampled its shipping records for a certain
day with these results