
Chapter 3

Processing of Business Data


Data classification

• The method of arranging data into homogeneous classes according to some common
features present in the data is called data classification.

• It’s the process of organizing data into categories for its most effective and efficient use.

• A well-planned data classification system makes essential data easy to find and retrieve. This
can be of particular importance for risk management, legal discovery, and compliance.
Types of Data Classification
Qualitative data are classified on the basis of some descriptive characteristic or qualitative aspect of a
phenomenon, viz. sex, beauty, literacy, honesty, intelligence, religion, eye-sight etc. For example, a population can be
divided on the basis of marital status into married, unmarried etc.

Nominal data can only be classified, while ordinal data can be classified and ordered.

Nominal Variable (Unordered list)

A variable that has two or more categories, without any implied ordering.

Examples : 
Gender - Male, Female
Marital Status - Unmarried, Married, Divorcee
State - New Delhi, Haryana, Illinois, Michigan

Ordinal Variable (Ordered list)


A variable that has two or more categories, with clear ordering.

Examples : 
Scale - Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
Rating - Very low, Low, Medium, Great, Very great
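The difference between the two kinds of categorical variable can be sketched in a few lines of Python. This is only an illustration: the helper name `ordinal_rank` and the sample responses are assumptions made for the example, while the category labels come from the examples above.

```python
# Ordinal scale from the example above, listed from lowest to highest.
SCALE = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

def ordinal_rank(response, scale=SCALE):
    """Return the position of a response on the ordered scale (hypothetical helper)."""
    return scale.index(response)

# Ordinal data can be classified AND ordered.
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree"]  # illustrative sample
sorted_responses = sorted(responses, key=ordinal_rank)

# Nominal data can only be classified: we can count categories,
# but there is no meaningful "smaller" or "larger" category.
marital_status = ["Married", "Unmarried", "Married", "Divorcee"]  # illustrative sample
married_count = marital_status.count("Married")

print(sorted_responses)
print(married_count)
```

Sorting the marital-status labels would only give alphabetical order, which carries no statistical meaning; that is the practical sign of a nominal variable.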
Quantitative data refers to variables whose values are quantities that can be counted or
measured and used in arithmetic operations; such a variable provides specific
information on a numerical scale.

There are two types of quantitative data: discrete and continuous.


Discrete quantitative data refers to variables that are counted and can take only
separate, fixed values.
Continuous quantitative data is measured and can take any value within a
given range.

For example, if you count the number of people having dinner at a
restaurant, this would be discrete data: first, because you are counting; second, because
you cannot have fractions of people, only complete people. Discrete
data comes in the form of whole numbers or integers.
On the other hand, if you measure the time it takes for each table in the restaurant to
receive what they ordered (hopefully within the range of an hour), you will have values
containing hours, minutes, seconds, and even fractions of a second if you want to increase
precision. These values would be a set of continuous quantitative data: first,
because you measured them; second, because you can have any value (any value
containing decimals, not just integers) within the reasonable range.

Having learnt this, here is a short list of examples of quantitative data:

The number of students in a classroom.
The total number of photographs saved on a memory card.
The temperature on each day in spring.
The weight of each person on a train.

Notice that of the four examples of quantitative variables listed above, the first two
are examples of discrete variables, while the third and fourth are examples of continuous
variables.
Continuous variables can be further categorized as either interval or ratio variables.
Interval variables are variables whose central characteristic is that they can be
measured along a continuum and have a numerical value (for example, temperature
measured in degrees Celsius or Fahrenheit). So the difference between 20°C and 30°C is
the same as the difference between 30°C and 40°C. However, temperature measured in
degrees Celsius or Fahrenheit is NOT a ratio variable.

Ratio variables are interval variables, but with the added condition that 0 (zero) of the
measurement indicates that there is none of that variable. So, temperature measured in
degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is
no temperature. However, temperature measured in Kelvin is a ratio variable as 0 Kelvin
(often called absolute zero) indicates that there is no temperature whatsoever. Other
examples of ratio variables include height, mass, distance and many more. The name
"ratio" reflects the fact that you can use the ratio of measurements. So, for example, a
distance of ten metres is twice the distance of 5 metres.
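A small numerical sketch may help show why ratios are meaningful only on a ratio scale. The function name `celsius_to_kelvin` is an assumption for this example; the conversion itself (adding 273.15) is the standard one.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin (hypothetical helper)."""
    return celsius + 273.15

# Interval scale: differences are meaningful.
step_low = 30 - 20    # difference between 20°C and 30°C
step_high = 40 - 30   # difference between 30°C and 40°C

# The naive Celsius ratio suggests 40°C is "twice" 20°C,
# but 0°C is not a true zero, so this ratio has no physical meaning.
naive_ratio = 40 / 20

# On the Kelvin scale (a ratio scale) the same two temperatures give the real ratio.
kelvin_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)

# Distance is a ratio variable: ten metres really is twice five metres.
distance_ratio = 10 / 5

print(step_low == step_high, naive_ratio, round(kelvin_ratio, 3), distance_ratio)
```

The Kelvin ratio comes out only slightly above 1, which shows that 40°C is not "twice as hot" as 20°C even though the Celsius numbers suggest it.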
Example 1
Determine which of the following data is quantitative or qualitative:
The marks that students get in a test.
The genders of newborn babies.
The area codes in phone numbers.
The heights of buildings.

Example 2
Identify which items in the list below are discrete and which are continuous:

The number of customers visiting a store over a weekend.
The amount of water consumed by a country over the past 10 years.
The outcomes of rolling a 6-sided die ten times.
The heights of trees in a rainforest.
Students' shoe sizes in a class.
Data Array

An array is a systematic arrangement of objects, usually in rows and columns. Suppose an
organization collects some data through a survey and then decides to arrange the data in
chronological order. An array is the way to give the data this new, arranged shape. It
can be categorized depending on the data type.
Statistical Variables

A variable is defined as an attribute of an object of study. Choosing which
variables to measure is central to good experimental design.
Types of variables
• Categorical variable/Qualitative variable: Nominal variable and Ordinal variable
• Quantitative variable: Discrete and Continuous variable. Continuous variables can
be further categorized as either interval or ratio variables.

Confounding variable: Confounding variables, which are also called confounders or
confounding factors, are closely related to a study’s independent and dependent variables.
A variable must meet two conditions to be a confounder:
It must be correlated with the independent variable. This may be a causal relationship, but it
does not have to be.
It must be causally related to the dependent variable.

• Control variable: A control variable in scientific experimentation is an experimental element
which is constant and unchanged throughout the course of the investigation.
• Dependent variable and Independent variable: an independent variable is
Attributes construction of Frequency Distribution

Find the range of the data.
Decide the approximate number of classes.
Determine the approximate class interval size.
Decide the starting point.
Determine the remaining class limits (boundary).
Distribute the data into respective classes.
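The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the function name `frequency_distribution`, the choice of five classes, the use of the smallest value as the starting point, and inclusive integer class limits are all assumptions made for the example. The data are the twenty test scores used later in this chapter.

```python
import math

def frequency_distribution(data, num_classes):
    """Group integer data into equal-width classes with inclusive limits."""
    low, high = min(data), max(data)
    data_range = high - low                                # find the range
    # Approximate class interval size; +1 because both class limits are inclusive.
    width = math.ceil((data_range + 1) / num_classes)
    start = low                                            # starting point (assumed: the minimum)
    table = []
    for i in range(num_classes):                           # remaining class limits
        lower = start + i * width
        upper = lower + width - 1
        count = sum(lower <= x <= upper for x in data)     # distribute the data
        table.append((lower, upper, count))
    return table

scores = [23, 26, 11, 18, 9, 21, 23, 30, 22, 11,
          21, 20, 11, 13, 23, 11, 29, 25, 26, 26]

classes = frequency_distribution(scores, num_classes=5)    # number of classes (assumed: 5)
for lower, upper, count in classes:
    print(f"{lower:2d}-{upper:2d}: {count}")
```

In practice the starting point is often rounded down to a convenient value, which can change the class limits and the counts; the sketch keeps the minimum only to stay short.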


Frequency Distribution

To understand frequency distribution, let us first start with a simple example. We
consider the marks obtained by ten students from a class in a test to be given as follows:

23, 26, 11, 18, 09, 21, 23, 30, 22, 11

This form of data is known as raw data. A statistical measure called range can be
defined. It is the difference between the largest and smallest values of a data set.
Here, range = 30 – 09 = 21.
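As a quick check, the range of the ten marks can be computed directly. The variable names are chosen for this sketch only; note that the mark 09 is written as 9, because Python does not accept integer literals with a leading zero.

```python
marks = [23, 26, 11, 18, 9, 21, 23, 30, 22, 11]  # raw data from the example

# Range = largest value - smallest value
data_range = max(marks) - min(marks)
print(data_range)
```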
Ungrouped Data

Let the test scores of all 20 students be as follows:
23, 26, 11, 18, 09, 21, 23, 30, 22, 11, 21, 20, 11, 13, 23, 11, 29, 25, 26, 26

Marks obtained in the test    No. of students (Frequency)
09                            1
11                            4
13                            1
18                            1
20                            1
21                            2
22                            1
23                            3
25                            1
26                            3
29                            1
30                            1
Absolute, relative, cumulative frequency
The absolute frequency is the number of times a particular
value (or particular set of values) of a variable is observed.
The distribution or table of frequencies is a table
of the statistical data with its corresponding
frequencies.
Twenty students were asked how many hours they
worked per day. Their responses, in hours, are
listed below:
5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3
Below is a frequency table listing the different
data values in ascending order and their
frequencies.
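One way to build such a frequency table is to tally the responses with `collections.Counter` from the Python standard library. The variable names below are assumptions for this sketch; the data are the twenty responses listed above.

```python
from collections import Counter

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# Absolute frequency: how many times each value is observed.
frequency = Counter(hours)

# List the data values in ascending order with their frequencies.
frequency_table = sorted(frequency.items())
for value, count in frequency_table:
    print(f"{value} hours: {count}")
```

The frequencies should add up to the number of students surveyed, which is a useful check on any hand-built table.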
Absolute, relative, cumulative frequency (cont.)
A relative frequency is the fraction of times an answer occurs. To find the relative
frequencies, divide each frequency by the total number of students in the sample - in this
case, 20. Relative frequencies can be written as fractions, percentages, or decimals.
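The division described above can be sketched as follows, again using the twenty responses on hours worked. The variable names are illustrative, and the printed line shows each relative frequency as a fraction, a decimal, and a percentage.

```python
from collections import Counter

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
n = len(hours)                      # total number of students in the sample
frequency = Counter(hours)

# Relative frequency = frequency / total number of observations
relative = {value: count / n for value, count in sorted(frequency.items())}

for value, share in relative.items():
    print(f"{value} hours: {frequency[value]}/{n} = {share:.2f} = {share:.0%}")
```

The relative frequencies of all values together should sum to 1 (or 100%).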
Absolute, relative, cumulative frequency (cont.)
Cumulative relative frequency is the accumulation of the previous relative frequencies.
To find the cumulative relative frequencies, add all the previous relative frequencies to
the relative frequency for the current row.
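The accumulation can be sketched with `itertools.accumulate` from the Python standard library. In this sketch (names are illustrative) the counts are accumulated first and divided by the total afterwards, a design choice that avoids adding up rounded decimals.

```python
from collections import Counter
from itertools import accumulate

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
n = len(hours)

frequency = sorted(Counter(hours).items())   # (value, count) pairs in ascending order
values = [value for value, _ in frequency]
counts = [count for _, count in frequency]

# Running total of the counts, then divide by n to get cumulative relative frequency.
cumulative_counts = list(accumulate(counts))
cumulative_relative = [c / n for c in cumulative_counts]

for value, cum in zip(values, cumulative_relative):
    print(f"{value} hours: {cum:.2f}")
```

The last cumulative relative frequency should always equal 1, since by then every observation has been accumulated.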
Types of table

Tables can be classified according to their purpose, stage of enquiry, nature of data or
number of characteristics used. On the basis of the number of characteristics, tables may
be classified as follows:

• Simple or one-way Table:

Hat Color Choices
Red    Blue    Yellow
5      3       2

• Two-way Table:

Leisure Activity    Dance    Sports    TV    Total
Men                 2        10        8     20
Women               16       6         8     30
Total               18       16        16    50
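The Total row and Total column of a two-way table are sums over the cell counts. A minimal sketch follows, using the leisure-activity counts above; the nested-dictionary layout and the variable names are assumptions made for this example.

```python
# Cell counts of the two-way table: one characteristic per axis.
cells = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}
activities = ["Dance", "Sports", "TV"]

# Row totals: sum across activities for each group.
row_totals = {group: sum(counts.values()) for group, counts in cells.items()}

# Column totals: sum across groups for each activity.
column_totals = {act: sum(cells[group][act] for group in cells) for act in activities}

# Grand total: every observation counted once.
grand_total = sum(row_totals.values())

print(row_totals, column_totals, grand_total)
```

The row totals and the column totals must each add up to the same grand total, which is a quick consistency check for any two-way table.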
Construction of a table from data Statement
The making of a compact table is itself an art. The purpose of the tabulation and
how the tabulated information is to be used are the main points to keep in mind
while preparing a statistical table. An ideal table should consist of the following
main parts:

Table Number
Title
Captions or column Headings
Stubs or Row Designations
Body
Footnotes
Sources of data
