
CLASSIFICATION OF DATA

WHY?
Data obtained from the primary instrument is bulky and voluminous, which
makes it tedious to interpret.

WHAT?
Reducing the information to homogeneous categories on the basis of
structured questions.

HOW?
● On the basis of attributes
● On the basis of class intervals
➢ ON THE BASIS OF ATTRIBUTES
A person's score on a particular variable is computed from various
combinations of the original data. This process is called variable respecification.

FOR EXAMPLE: In a study of school children, mental growth was calculated
on the basis of their answers to questions related to conceptual
knowledge and applications.

A person's socio-economic class could be identified on the basis of
education and occupation.
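Variable respecification of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming each answer has already been scored numerically and that the new variable is a simple unweighted sum; the question names, the scores and the `respecify_score` helper are hypothetical, not taken from the study.

```python
# Hypothetical item scores (0-5) for one child; question names and the
# unweighted sum are illustrative assumptions.
conceptual_knowledge = {"q1": 4, "q2": 3, "q3": 5}
applications = {"q4": 2, "q5": 4}


def respecify_score(*item_groups: dict[str, int]) -> int:
    """Combine original item responses into one new variable (a sum score)."""
    return sum(score for group in item_groups for score in group.values())


mental_growth = respecify_score(conceptual_knowledge, applications)
print(mental_growth)  # 18
```

A weighted sum (for example, weighting education and occupation differently in a socio-economic index) would follow the same pattern; any such weights would have to come from the researcher.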
❏ Respecification can also be done by collapsing the response categories
and by using square root and log transformations.

FOR EXAMPLE: Suppose the original variable was plastic bag usage with
10 response categories. These might be collapsed into 4 categories: heavy,
medium, light and non-user.
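A minimal Python sketch of collapsing the 10 plastic bag usage categories into 4, followed by square root and log transformations. The cut points (1 = non-user, 2-4 light, 5-7 medium, 8-10 heavy), the `collapse_usage` helper and the monthly bag counts are hypothetical assumptions for illustration only.

```python
import math


def collapse_usage(category: int) -> str:
    """Collapse a 1-10 usage category into 4 groups (hypothetical cut points)."""
    if not 1 <= category <= 10:
        raise ValueError(f"category must be between 1 and 10, got {category}")
    if category == 1:
        return "non-user"
    if category <= 4:
        return "light"
    if category <= 7:
        return "medium"
    return "heavy"


responses = [1, 3, 5, 10, 7, 2, 9]
print([collapse_usage(c) for c in responses])
# ['non-user', 'light', 'medium', 'heavy', 'medium', 'light', 'heavy']

# Square root and log transformations of a skewed metric variable
# (hypothetical number of bags used per month).
bags_per_month = [0, 4, 9, 100]
print([math.sqrt(x) for x in bags_per_month])  # [0.0, 2.0, 3.0, 10.0]
print([round(math.log10(x + 1), 3) for x in bags_per_month])  # +1 avoids log(0)
```

Adding 1 before taking the log is one common way of handling zero values; it is a design choice, not a fixed rule.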

❏ Dummy variables can also be used for respecifying categorical variables.
These variables are also called binary, dichotomous, instrumental or
qualitative variables. They take only two values, typically 0 and 1.
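A sketch of dummy coding in Python, assuming the four collapsed usage categories from the plastic bag example. A categorical variable with k categories is usually represented by k - 1 dummy variables, with the remaining category (here, hypothetically, non-user) serving as the reference that scores 0 on every dummy. The `to_dummies` helper and the `is_...` names are illustrative.

```python
# k = 4 categories -> k - 1 = 3 dummy variables; "non-user" is the
# (hypothetically chosen) reference category, coded 0 on every dummy.
REFERENCE = "non-user"
DUMMY_CATEGORIES = ["heavy", "medium", "light"]


def to_dummies(category: str) -> dict[str, int]:
    """Return one 0/1 dummy variable per non-reference category."""
    if category != REFERENCE and category not in DUMMY_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return {f"is_{c}": int(category == c) for c in DUMMY_CATEGORIES}


print(to_dummies("medium"))    # {'is_heavy': 0, 'is_medium': 1, 'is_light': 0}
print(to_dummies("non-user"))  # {'is_heavy': 0, 'is_medium': 0, 'is_light': 0}
```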
➢ ON THE BASIS OF CLASS INTERVALS
Numerical data, such as ratio scale data, can be classified into class
intervals to assist the quantitative analysis of data.

FOR EXAMPLE: The age data obtained from the sample could be reduced
to homogeneous grouped data: all those below 25 form one group, those
aged 25-35 form another group, and so on.
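A minimal sketch of this grouping in Python. The source only names the first two groups, so the 10-year width of later bands, the treatment of each upper limit as excluded, the sample ages and the `age_group` helper are assumptions.

```python
from collections import Counter


def age_group(age: int) -> str:
    """Map an age to 'below 25' or a 10-year band starting at 25.

    Upper limits are excluded, so 35 falls in '35-45', not '25-35'.
    """
    if age < 25:
        return "below 25"
    lower = 25 + 10 * ((age - 25) // 10)
    return f"{lower}-{lower + 10}"


ages = [19, 23, 25, 31, 34, 35, 42, 58]  # hypothetical sample
print(Counter(age_group(a) for a in ages))
```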

● Each group has class limits: an upper limit and a lower limit.
● The difference between the limits is termed the class magnitude.
● Class intervals can be of equal or unequal magnitude.
● Class intervals can be exclusive or inclusive.
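Formula adopted to determine the size of the class interval:

i = R / (1 + 3.3 log N)

i = size of class interval

R = range (difference between the values of the largest and smallest items)

N = total number of items (observations); log is taken to base 10

The denominator, 1 + 3.3 log N, approximates the number of class intervals (Sturges' rule).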
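The formula can be checked with a short Python sketch. The sample weights and the `class_interval_size` helper are hypothetical; log is taken to base 10, and in practice the result is usually rounded to a convenient width.

```python
import math


def class_interval_size(values: list[float]) -> float:
    """i = R / (1 + 3.3 log10 N), where R is the range and N the number of items."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    r = max(values) - min(values)           # range R
    k = 1 + 3.3 * math.log10(len(values))   # approximate number of classes
    return r / k


weights = [4, 7, 9, 12, 13, 13, 15, 18, 26, 37]  # hypothetical, in grams
i = class_interval_size(weights)
print(round(i, 2))  # 7.67  (R = 33, N = 10, so i = 33 / 4.3)
```

Here a class width of 8 g would be a convenient rounded-up choice.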

Exclusive class intervals      Inclusive class intervals
(upper limit is excluded)      (both limits are included)

10-15                          10-15
15-20                          16-20
20-25                          21-25
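The difference between the two conventions shows up at the boundary values. A small Python sketch, using the classes from the exclusive/inclusive table; the `find_class` helper is hypothetical.

```python
from typing import Optional

# Classes copied from the exclusive/inclusive table.
EXCLUSIVE = [(10, 15), (15, 20), (20, 25)]  # lower <= x < upper
INCLUSIVE = [(10, 15), (16, 20), (21, 25)]  # lower <= x <= upper


def find_class(x: float, classes: list[tuple[int, int]], inclusive: bool) -> Optional[str]:
    """Return the label of the class containing x, or None if no class contains it."""
    for lower, upper in classes:
        contains = lower <= x <= upper if inclusive else lower <= x < upper
        if contains:
            return f"{lower}-{upper}"
    return None


print(find_class(15, EXCLUSIVE, inclusive=False))    # 15-20
print(find_class(15, INCLUSIVE, inclusive=True))     # 10-15
print(find_class(15.5, INCLUSIVE, inclusive=True))   # None
```

A value such as 15.5 falls into no inclusive class, which is why inclusive intervals suit discrete (whole-number) data, while exclusive intervals suit continuous data.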
TABULATION OF DATA
❖ Orderly arrangement of data into an array that is suitable for statistical
analysis.
❖ It is an arrangement of rows and columns.
❖ It can be done manually or with the help of a computer.
❖ When data for only one variable is tabulated, the process is called simple
tabulation.
❖ When there are two or more variables, cross tabulation of data is
carried out.
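Simple and cross tabulation can be sketched with Python's standard `collections.Counter`. The survey records (gender and plastic bag usage category) are hypothetical.

```python
from collections import Counter

# Hypothetical survey records: (gender, usage category).
responses = [
    ("female", "heavy"), ("male", "light"), ("female", "light"),
    ("male", "heavy"), ("female", "heavy"), ("male", "non-user"),
]

# Simple tabulation: frequency count of one variable.
simple = Counter(usage for _, usage in responses)
print(simple)  # Counter({'heavy': 3, 'light': 2, 'non-user': 1})

# Cross tabulation: joint counts of two variables laid out in rows and columns.
cross = Counter(responses)
rows = sorted({g for g, _ in responses})
cols = sorted({u for _, u in responses})
print("gender".ljust(8) + "".join(c.rjust(10) for c in cols))
for r in rows:
    # Counter returns 0 for combinations that never occur.
    print(r.ljust(8) + "".join(str(cross[(r, c)]).rjust(10) for c in cols))
```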
EXPLORATORY DATA ANALYSIS
Preliminary data exploration is done to assess the expected trends of the
findings. These indicative trends may reveal that the data collection
or instrument design is faulty and needs correction.

This is a loosely structured exploration carried out before the formulated
hypotheses are tested.

It relies on graphical and visual displays of the data patterns that seem
to be emerging.
WIDELY USED METHODS OF DISPLAYING DATA
BAR AND PIE CHARTS: Data available as classification or demographic
variables is most often on a categorical or nominal scale. The tabulated
data can be plotted to demonstrate the pattern of responses.

FOR EXAMPLE: In a study on jewellery buying, the age groups and occupations
of the sample group were presented graphically:
● Pie chart: a visual representation of the largest and smallest groups.
● Bar charts: a comparative depiction of the groups.
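The figures behind a pie chart and a bar chart can be computed as in this Python sketch. The frequencies are hypothetical, since the study's actual table is not reproduced here; each group's share of 360 degrees gives its slice of the pie, and a text bar of '#' characters stands in for a bar chart.

```python
# Hypothetical frequencies for age groups in a jewellery buying sample.
age_groups = {"below 25": 12, "25-35": 30, "35-45": 18}

total = sum(age_groups.values())
angles = {g: count / total * 360 for g, count in age_groups.items()}  # pie slices

for group, count in age_groups.items():
    bar = "#" * count  # text bar chart: one '#' per respondent
    print(f"{group:>9} {count / total:6.1%} {angles[group]:6.1f} deg  {bar}")

print("largest:", max(age_groups, key=age_groups.get))   # largest: 25-35
print("smallest:", min(age_groups, key=age_groups.get))  # smallest: below 25
```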
HISTOGRAM: Used for metric (interval and ratio scale) data, it shows
whether the distribution is normal or skewed.
FOR EXAMPLE: Consider the distribution of purchases made by 15 customers
who bought from branded jewellery outlets last year. As most of the sample
purchased items weighing less than 20 g, the sample is skewed towards
purchasers of smaller items.
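A text histogram in Python illustrates the same reading. The 15 weights are hypothetical stand-ins for the study's data, the 10 g bin width is an assumption, and comparing the mean with the median is only a rough indicator of skew.

```python
# Hypothetical weights (g) of items bought by 15 customers.
weights = [3, 5, 6, 8, 9, 11, 12, 13, 13, 14, 16, 18, 24, 35, 52]
BIN_WIDTH = 10  # assumed class interval of 10 g

counts: dict[int, int] = {}
for w in weights:
    lower = (w // BIN_WIDTH) * BIN_WIDTH
    counts[lower] = counts.get(lower, 0) + 1

for lower in range(0, max(counts) + BIN_WIDTH, BIN_WIDTH):
    print(f"{lower:>2}-{lower + BIN_WIDTH:<2} g | {'*' * counts.get(lower, 0)}")

mean = sum(weights) / len(weights)
median = sorted(weights)[len(weights) // 2]  # N is odd, so this is the middle value
print(f"under 20 g: {sum(w < 20 for w in weights)} of {len(weights)}")
print(f"mean {mean:.1f} g vs median {median} g")  # mean > median suggests right skew
```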


STEM AND LEAF DISPLAYS: Unlike the histogram, which presents only group
aggregates, a stem and leaf display shows the individual data values in each
set. It shows the pattern of responses in each interval while maintaining the
rank order, which allows a quick approximation of the median or quartiles.

In each row, the leading digit(s) form the stem and each trailing digit on
that row is a leaf. FOR EXAMPLE: The stem and leaf display of the jewellery
purchase data shows that the sample study was concerned mostly with the
buying of 13 g items.
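A stem and leaf display can be built as in this Python sketch, assuming two-digit weights in grams so that the tens digit is the stem and the units digit is the leaf. The weights and the `stem_and_leaf` helper are hypothetical. Because leaves are kept in rank order, the median can be read off by counting through them.

```python
from collections import defaultdict

# Hypothetical two-digit weights (g): stem = tens digit, leaf = units digit.
weights = [8, 11, 12, 13, 13, 13, 13, 13, 16, 18, 21, 23, 25, 34, 41]


def stem_and_leaf(values: list[int]) -> dict[int, list[int]]:
    """Group each value's units digit (leaf) under its tens digit (stem), in rank order."""
    display: dict[int, list[int]] = defaultdict(list)
    for v in sorted(values):
        display[v // 10].append(v % 10)
    return display


display = stem_and_leaf(weights)
for stem in range(min(display), max(display) + 1):
    print(f"{stem} | {''.join(str(leaf) for leaf in display.get(stem, []))}")

# Leaves are already ranked, so the median is found by counting through them.
ordered = [stem * 10 + leaf for stem in sorted(display) for leaf in display[stem]]
print("median:", ordered[len(ordered) // 2])  # median: 13 (N = 15 is odd)
```

The crowded row `1 | 123333368` makes the cluster of 13 g purchases visible at a glance.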


STATISTICAL SOFTWARE PACKAGES

Statistical software packages assist researchers in both data management
and data analysis.

Some of the most frequently used packages are:

● MS EXCEL
● MINITAB
● STATISTICAL ANALYSIS SYSTEM (SAS)
● SPSS (Statistical Package for the Social Sciences)
❏ MS EXCEL: It is the most widely used tool for presenting and
tabulating data.

It is easy to understand, as basic mathematical functions can be calculated
directly. Data entered in Excel can be exported to most statistical packages
for higher-level analysis.

❏ MINITAB: Minitab was developed more than 20 years ago at
Pennsylvania State University. It can be used with considerable ease and
effectiveness in business areas.

It is used for multiple applications: quality control, Six Sigma and the design
of experiments. The URL for Minitab is http://www.minitab.com/.
❏ Statistical Analysis System (SAS): SAS was created in the late 1960s at
North Carolina State University.

Analyses possible with SAS include linear models, generalised linear models,
multivariate models and categorical data analysis.

Techniques for both descriptive and confirmatory statistical analysis are
available in SAS. Forecasting and time-series trend analysis can also be
carried out with SAS. The URL for the package is http://www.sas.com/.

❏ SPSS (Statistical Package for the Social Sciences): It is the most widely used
package among the student community. It is user-friendly and adaptable to most
business problems. The URL for SPSS is http://www.spss.com/.
REFERENCES
Zikmund, W.G., Babin, B.J., Carr, J.C., Adhikari, A. & Griffin, M. (2013).
