
Chapter 1 : Types of data

Data, in everyday usage, means the facts and figures that we come across every day in
newspapers, books, TV channels and on the internet.
Broadly, data can be classified as qualitative data and quantitative data. Every data set
describes something called an attribute (also known as a property or characteristic).
In the case of quantitative data, we can measure the attribute and also count the number of units
having the attribute. We can take the example of an attribute like marks, income or height.
We can measure the marks and also count the number of students who obtained each mark.
Marks    Number of students
45       5
56       10
67       2
Here marks is the attribute and the number of students is called the frequency (f). The above
table is called a frequency table.
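As a minimal sketch of how such a frequency table can be built from individual observations, the Python snippet below counts how often each mark occurs. The list raw_marks is an assumption: it is simply the table above expanded back into one entry per student.

```python
from collections import Counter

# Assumed raw data: the table above expanded to one entry per student
# (45 occurs 5 times, 56 occurs 10 times, 67 occurs 2 times).
raw_marks = [45] * 5 + [56] * 10 + [67] * 2

# Counter maps each distinct value of the attribute to its frequency (f).
frequency_table = Counter(raw_marks)

print("Marks  Number of students (f)")
for marks in sorted(frequency_table):
    print(f"{marks:<6} {frequency_table[marks]}")
```

The same counting approach works for a qualitative attribute, because only the number of units in each category is needed.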
In the case of qualitative data, we cannot measure the attribute itself; we can only count the
number of units having the attribute. We can take the example of an attribute like specialisation,
department, gender or occupation. Here specialisation has a number of categories which
cannot be given a numerical value, but we can count the number of students belonging to each
specialisation.
Specialisation    Number of students
Marketing         25
Finance           20
HR                5
Systems           5
Operations        5
Here specialisation is the attribute and number of students is the frequency (f).
Quantitative data can be further classified as discrete data and continuous data. In the case of
discrete data, the attribute takes only whole number values and every unit is a separate entity in
itself. For example, number of companies, number of colleges and number of families can take
only whole number values.
Number of colleges    Number of students (f)
5                     200
10                    100
3                     50
In the case of continuous data, the attribute can take decimal values apart from whole numbers
and every value within a particular range will be considered. Examples of these attributes are
marks, heights, incomes, weights, performance, productivity etc.
Marks    Number of students (f)
45.5     5
60       6
75.4     10

Grouping of data: When data is first collected, it is in a scattered or ungrouped form. For
example, we can have the marks of 10 students written one after the other. This is called raw
data, as shown by the following example:
12, 20, 34.5, 48, 27, 19, 28, 46, 11, 8
Here the lowest value is 8 and the highest value is 48, so the range of the data is 48 - 8 = 40.
We can convert this raw data into grouped data by classifying the values into certain
groups. The number of groups can be found by using the formula
Number of groups = 1 + 3.32 log10(N), where N is the number of values.
Here we have N = 10, so number of groups = 1 + 3.32 log10(10) = 1 + 3.32 = 4.32.
So we can have either 4 or 5 groups. If we decide to make 5 groups, then we can use tally marks
to find the frequency corresponding to each group.
Marks    Tally    Number of students (f)
0-10     |        1
10-20    |||      3
20-30    |||      3
30-40    |        1
40-50    ||       2
The above data is called grouped data. Each group is called a class or class interval, so
there are 5 class intervals. Each class is bounded by two values, which are called the lower
limit and the upper limit. The difference between these 2 limits is called the width of the class.
So here, each class has a width of 10.
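The grouping steps above can be sketched in Python as follows. This is only an illustration under the choices made in the text (5 classes of width 10 starting at 0); the variable names are hypothetical.

```python
import math

raw_marks = [12, 20, 34.5, 48, 27, 19, 28, 46, 11, 8]

# Range = highest value - lowest value.
data_range = max(raw_marks) - min(raw_marks)

# Number of groups = 1 + 3.32 * log10(N), as in the formula above.
n = len(raw_marks)
groups = 1 + 3.32 * math.log10(n)

# Five classes of width 10: 0-10, 10-20, ..., 40-50.
# A value equal to an upper limit is counted in the next class.
width = 10
lower_limits = [0, 10, 20, 30, 40]
frequencies = [
    sum(1 for x in raw_marks if lower <= x < lower + width)
    for lower in lower_limits
]

print("Range:", data_range, "Groups:", round(groups, 2))
for lower, f in zip(lower_limits, frequencies):
    # One '|' per observation plays the role of a tally mark.
    print(f"{lower}-{lower + width}: {'|' * f} = {f}")
```

Because 4.32 is not a whole number, the choice between 4 and 5 groups remains a judgment call; the sketch simply uses 5.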
Class intervals can be of two types: exclusive and inclusive. In the exclusive type, we include
the lower limit but exclude the upper limit. For example, the first class 0-10 includes every value
from 0 to 9.9999… but excludes 10, which is included in the next interval 10-20. Similarly, 20 is
excluded from the class 10-20. The classes shown in the above example are of the exclusive type.
The advantage of this type of class interval is that continuity is maintained in the values and no
value gets missed out. Hence it is generally better to use this type of class interval.
In the inclusive type of class interval, we include both the lower and the upper limits in the
same class. For example, 0-9, 10-19, 20-29, etc. are of the inclusive type. The limitation of this
type of interval is that values like 9.5, 19.4, 29.6 etc. will get missed out if our data is
continuous. This type of interval can be used if the data is discrete.
Hence the exclusive type of interval is preferred, as it can be used for both discrete and
continuous data.
We can always convert inclusive intervals into exclusive intervals in the following manner. First
take the difference between the upper limit of one class and the lower limit of the next class, and
divide this difference by two. The result is subtracted from all the lower limits and
added to all the upper limits. For example, the gap between 9 and 10 is 1, so the correction is 0.5,
and the inclusive intervals 0-9, 10-19, 20-29, 30-39, 40-49 become the exclusive intervals
-0.5 to 9.5, 9.5 to 19.5, 19.5 to 29.5, 29.5 to 39.5 and 39.5 to 49.5. Then we can find the
frequencies corresponding to each class.
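The conversion rule can be written out as a short Python sketch. It assumes that the gap between consecutive classes is the same throughout, as it is in the example.

```python
# Inclusive class intervals as (lower limit, upper limit) pairs.
inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = inclusive[1][0] - inclusive[0][1]
correction = gap / 2

# Subtract the correction from every lower limit and add it to every upper limit.
exclusive = [(lower - correction, upper + correction) for lower, upper in inclusive]

for lower, upper in exclusive:
    print(f"{lower} to {upper}")
```

After the conversion, the upper limit of each class equals the lower limit of the next, so a continuous value such as 9.5 or 19.4 now falls in exactly one class.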

From the point of view of collection, data can be categorised as primary data and
secondary data. Primary data is first-hand data collected directly from the source, whereas
secondary data already exists in some written or published form such as books, newspapers,
journals, reports and the internet.

The important sources of primary data are:

(i) Survey conducted through (a) mailed questionnaires, (b) personal interviews, (c) telephonic interviews
(ii) Observation
(iii) Experiments
(iv) Census

For any study or research, we first make use of secondary data to familiarise ourselves with the
topic and to understand the gaps in the study and then go for primary data using the methods
mentioned above.

Presentation of data: We have seen earlier how we present data in the form of frequency tables.
Data can also be presented in the form of certain diagrams.

(i) Qualitative data

We can return to the earlier example of specialisation and number of students.
Specialisation    Number of students
Marketing         25
Finance           20
HR                5
Systems           5
Operations        5
There are two simple diagrammatic tools used to present this data, namely bar charts and pie
charts.

(i) Bar Chart


[Bar chart: Number of students (vertical axis, 0 to 30) by specialisation. Marketing 25, Finance 20, HR 5, Systems 5, Operations 5]

(ii) Pie chart

[Pie chart: Number of students by specialisation. Marketing 25 (42%), Finance 20 (33%), HR 5 (8%), Systems 5 (8%), Operations 5 (8%)]
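The percentage share and the sector angle of each slice in a pie chart follow directly from the frequencies. The sketch below is only an illustration of that arithmetic for the specialisation data; it does not draw the chart, and the names used are hypothetical.

```python
# Frequencies from the specialisation table.
students = {"Marketing": 25, "Finance": 20, "HR": 5, "Systems": 5, "Operations": 5}

total = sum(students.values())

# Share of each category as a percentage of the total.
shares = {name: 100 * count / total for name, count in students.items()}

# Angle of each sector, out of the 360 degrees of the full circle.
angles = {name: 360 * count / total for name, count in students.items()}

for name in students:
    print(f"{name:<11} {students[name]:>3} {shares[name]:5.1f}% {angles[name]:6.1f} degrees")
```

The shares are printed to one decimal place here; labels rounded to whole percentages may not add up to exactly 100.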

(ii) Quantitative data


We can present quantitative data using a histogram, frequency polygon and a cumulative
percentage curve.
(i) Histogram
Marks    Number of students
0-10     4
10-20    3
20-30    5
30-40    6

[Histogram: Frequency (left axis, 0 to 8) and Cumulative % (right axis, 0% to 150%) plotted against Bin (0, 10, 20, 30, 40, 50, More)]

The histogram can be used to find the mode graphically.
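One common graphical construction, which is an assumption here since the text does not describe it, joins the top corners of the tallest bar to the adjacent top corners of its two neighbouring bars and reads the mode off the horizontal axis below the point where the two lines cross. The Python sketch below computes that crossing point for the histogram table above; treating a missing neighbouring class as a bar of height 0 is a further assumption.

```python
# Classes and frequencies from the histogram table above.
lower_limits = [0, 10, 20, 30]
frequencies = [4, 3, 5, 6]
width = 10

# The modal class is the one with the tallest bar.
i = frequencies.index(max(frequencies))
f1 = frequencies[i]

# Heights of the neighbouring bars; a missing neighbour counts as 0 (assumption).
f0 = frequencies[i - 1] if i > 0 else 0
f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0

# Horizontal position at which the two diagonals inside the modal bar cross.
mode = lower_limits[i] + width * (f1 - f0) / ((f1 - f0) + (f1 - f2))

print("Modal class starts at", lower_limits[i])
print("Mode is approximately", round(mode, 2))
```

The result always lies inside the modal class, closer to whichever neighbouring bar is taller.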


(ii) Frequency polygon
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1
(iii) Cumulative Frequency curve or Ogive
Class Interval    Cumulative Frequency
20-under 30       6
30-under 40       24
40-under 50       35
50-under 60       46
60-under 70       49
70-under 80       50
[Ogive: cumulative Frequency (vertical axis, 0 to 60) plotted against Years (horizontal axis, 0 to 80)]
The frequency curve shown above is called a less-than cumulative frequency curve or a less-than
ogive. This is because we start from a cumulative frequency of 0 and go on adding the successive
frequencies to reach the last cumulative frequency.

Similarly, the other type of cumulative frequency curve is called the greater-than cumulative
frequency curve or greater-than ogive. Here we start with the last C.f. value (the total frequency)
and keep on subtracting the respective frequencies until we reach a final C.f. value of 0.

These two cumulative frequency curves meet at one point, and the value on the horizontal axis
at this point is the median. This is how we find the median graphically.
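The two ogives and their crossing point can be worked out numerically for the class intervals and frequencies listed under the frequency polygon. The sketch below assumes that the plotted points of each ogive are joined by straight lines, so the crossing point is found by linear interpolation within the class where the cumulative frequency reaches half of the total.

```python
# Class intervals 20-under 30, ..., 70-under 80 and their frequencies.
lower_limits = [20, 30, 40, 50, 60, 70]
frequencies = [6, 18, 11, 11, 3, 1]
width = 10
total = sum(frequencies)

# Less-than ogive: running total, plotted against the upper limit of each class.
less_than = []
running = 0
for f in frequencies:
    running += f
    less_than.append(running)

# Greater-than ogive: greater_than[k] is the number of values at or above
# lower_limits[k]; it starts at the total and falls to 0 at the last upper limit.
greater_than = [total - c for c in [0] + less_than[:-1]]

# The two curves cross where the cumulative frequency equals total / 2.
half = total / 2
i = next(k for k, c in enumerate(less_than) if c >= half)
cf_before = less_than[i - 1] if i > 0 else 0
median = lower_limits[i] + width * (half - cf_before) / frequencies[i]

print("Less-than C.f.:   ", less_than)
print("Greater-than C.f.:", greater_than)
print("Median is approximately", round(median, 2))
```

The less-than values reproduce the cumulative frequency table above, and the median falls in the class 40-under 50.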

Q1. A researcher was analysing the length of words in poems written by different poets. In one
such poem, the lengths of the 90 words that the poet wrote were as given:

5 3 4 2 7 4 3 3 4 2 4 4 5 9 2

4 5 3 5 3 3 4 6 2 4 9 2 6 8 5

5 5 4 3 4 3 5 2 5 5 8 8 9 3 4

7 10 5 3 5 6 2 6 3 3 4 3 3 4 3

6 6 4 2 4 5 5 3 4 2 3 7 2 4 4

12 7 3 6 4 4 6 2 7 2 5 6 3 2 7

Question: Using the above data with minimum and maximum value as two ends, construct a
frequency distribution for the above data.

Solution: Minimum value = 2, Maximum value = 12

Class    f     C.f
2-5      52    52
5-8      30    82
8-11     7     89
11-14    1     90
Total    90
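The table can be checked with a short Python sketch. It assumes the exclusive classes of width 3 used in the solution, starting at the minimum value 2, and assigns each word length to a class by integer division.

```python
from collections import Counter

word_lengths = [
    5, 3, 4, 2, 7, 4, 3, 3, 4, 2, 4, 4, 5, 9, 2,
    4, 5, 3, 5, 3, 3, 4, 6, 2, 4, 9, 2, 6, 8, 5,
    5, 5, 4, 3, 4, 3, 5, 2, 5, 5, 8, 8, 9, 3, 4,
    7, 10, 5, 3, 5, 6, 2, 6, 3, 3, 4, 3, 3, 4, 3,
    6, 6, 4, 2, 4, 5, 5, 3, 4, 2, 3, 7, 2, 4, 4,
    12, 7, 3, 6, 4, 4, 6, 2, 7, 2, 5, 6, 3, 2, 7,
]

width = 3
start = min(word_lengths)  # lower limit of the first class

# With exclusive classes, a value x belongs to class number (x - start) // width.
counts = Counter((x - start) // width for x in word_lengths)

cumulative = 0
print("Class   f   C.f")
for k in sorted(counts):
    lower = start + k * width
    cumulative += counts[k]
    print(f"{lower}-{lower + width:<4} {counts[k]:>3} {cumulative:>4}")
```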

Q2. The following are the daily wages of 40 workers in a factory:

26 24 16 10 16 23 28 23 25 18 10 11
20 21 19 18 15 13 22 17 15 29 29 12
34 15 14 18 22 24 30 38 17 32 36 20
19 27 33 31
(i) Form a frequency distribution table taking 4 as the class width.
(ii) Find the percentage of workers getting wages below Rs. 32.
Solution:

(i) Class    No. of workers (f)    C.f
    10-14    5                     5
    14-18    8                     13
    18-22    8                     21
    22-26    7                     28
    26-30    5                     33
    30-34    4                     37
    34-38    2                     39
    38-42    1                     40

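(ii) Below Rs. 32, there are 33 workers with wages under Rs. 30 plus 2 workers in the class 30-34
(wages of Rs. 30 and Rs. 31), giving 33 + 2 = 35 workers. So percentage of workers = 35/40 x 100 = 87.5%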
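Both parts of the solution can be cross-checked against the raw wages with a minimal Python sketch. It assumes exclusive classes of width 4 starting at Rs. 10, as in the table for part (i).

```python
wages = [
    26, 24, 16, 10, 16, 23, 28, 23, 25, 18, 10, 11,
    20, 21, 19, 18, 15, 13, 22, 17, 15, 29, 29, 12,
    34, 15, 14, 18, 22, 24, 30, 38, 17, 32, 36, 20,
    19, 27, 33, 31,
]

width = 4
lower_limits = list(range(10, 42, width))  # 10, 14, ..., 38

# (i) Frequency of each class: lower limit included, upper limit excluded.
frequencies = [
    sum(1 for w in wages if lower <= w < lower + width)
    for lower in lower_limits
]

# (ii) Workers earning strictly less than Rs. 32, counted from the raw data.
below_32 = sum(1 for w in wages if w < 32)
percentage = 100 * below_32 / len(wages)

for lower, f in zip(lower_limits, frequencies):
    print(f"{lower}-{lower + width}: {f}")
print(f"Below Rs. 32: {below_32} workers, {percentage}%")
```

Counting from the raw data is needed for part (ii) because Rs. 32 lies inside the class 30-34, so the grouped table alone does not show how many of its 4 workers earn less than Rs. 32.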
Levels of Data Measurement
• Nominal — Lowest level of measurement

• Ordinal

• Interval

• Ratio — Highest level of measurement

(i) Nominal Level Data


• Numbers are used to classify or categorize

Example: Employment Classification

– 1 for Educator

– 2 for Construction Worker

– 3 for Manufacturing Worker

Example: Ethnicity

– 1 for African-American

– 2 for Anglo-American

– 3 for Hispanic-American

(ii) Ordinal level data


• Numbers are used to indicate rank or order

– Relative magnitude of numbers is meaningful

– Differences between numbers are not comparable

Example: Ranking productivity of employees

Example: Taste test ranking of three brands of soft drink

Example: Position within an organization

– 1 for President
– 2 for Vice President

– 3 for Plant Manager

– 4 for Department Supervisor

– 5 for Employee

(iii) Interval Level Data


• Distances between consecutive integers are equal

– Relative magnitude of numbers is meaningful

– Differences between numbers are comparable

– Location of origin, zero, is arbitrary

– Vertical intercept of unit of measure transform function is not zero

Example: Fahrenheit Temperature

Example: Calendar Time

Example: Monetary Utility


The Likert scale shown below is commonly treated as an example of an interval scale.
Strongly agree    Agree    Neither agree nor disagree    Disagree    Strongly disagree
1                 2        3                             4           5

(iv) Ratio Level Data


• Highest level of measurement

– Relative magnitude of numbers is meaningful

– Differences between numbers are comparable

– Location of origin, zero, is absolute (natural)

– Vertical intercept of unit of measure transform function is zero

Examples: Height, Weight, and Volume

Example: Monetary Variables, such as Profit and Loss, Revenues, and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory Turnover, and Quick Ratio.
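The point about the vertical intercept of the unit-conversion function can be illustrated with a small Python sketch. The temperatures and lengths used are arbitrary illustrative values, not data from the text. Converting Fahrenheit to Celsius has a non-zero intercept, so ratios of temperatures change with the unit; converting inches to centimetres has a zero intercept, so ratios are preserved.

```python
def fahrenheit_to_celsius(f):
    # Interval level: the intercept is not zero, since 0 degrees F is not 0 degrees C.
    return (f - 32) * 5 / 9


def inches_to_cm(inches):
    # Ratio level: the intercept is zero, since 0 inches is 0 cm.
    return inches * 2.54


# Illustrative values (assumed): 80 and 40 degrees F, 72 and 36 inches.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

ratio_in = 72 / 36
ratio_cm = inches_to_cm(72) / inches_to_cm(36)

print("Temperature ratio in F:", ratio_f, "in C:", round(ratio_c, 2))
print("Height ratio in inches:", ratio_in, "in cm:", round(ratio_cm, 2))
```

The temperature ratio depends on the unit chosen, so a statement such as "twice as hot" is not meaningful for interval data, whereas "twice as tall" holds in any unit of length.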

Data Level, Operations, and Statistical Methods
