
RESEARCH METHODOLOGY

Dr Wellars B. (PhD)
PROCESSING AND ANALYSIS OF DATA
Processing operations
The authenticity and relevance of a research investigation depend on
assurance that the collected data are reliable and free of errors.
Data processing must therefore be carried out in an appropriate
manner. Processing comprises the tasks of editing, coding,
classification and tabulation.

UR- MAT DPT- Research Methodology


Editing of Data
• Once the data have been recorded, it is essential to look for
suspect values and errors of various kinds. There are
many different types of suspect value, and it is
helpful to distinguish between them.
(a) Outliers
These are observations which appear to be inconsistent
with the rest of the data. They may be caused by gross
recording or punching errors, but it is important to
realize that an apparent outlier may occasionally be
genuine and indicate a non-normal distribution.
(b) Inversions
A common type of error occurs when two successive
digits are interchanged at the recording, coding or
punching stage. The error may be trivial if, for example,
123.45 appears as 123.54, but it may produce an
outlier if 123.45 appears as 213.45.
(c) Repetitions
At the coding or punching stage, it is quite easy to
repeat a whole number in two successive rows or
columns of a table, thereby omitting one number
completely.
(d) Values in the wrong column
It is also easy to get numbers into the wrong
columns.
(e) Other errors and suspect values
There are many other types of error, including
possible misrecordings of a trivial kind.

The general term used to denote procedures for
detecting and correcting errors is data editing.



Data editing includes checks for completeness,
consistency and credibility.
A simple but very useful check is to get a
printout of the data and look at it by eye.
Although it may be impractical to check every
digit visually, the eye is very efficient at picking
out many types of obvious error, particularly
repetitions and gross outliers.



The following table shows a typical printout
of a set of data with 5 observations on 4
variables at each of 3 levels. The data
contain several obvious suspect values.



                  VARIABLES
               1       2       3     4
Level 1      0.0   103.1    93.2    23
             0.0   110.2    87.2    27
             0.0   110.2    88.9    49
             0.0   105.8    92.1    24
             0.0   107.8    84.4    26
Level 2      2.1    87.4   117.1    13
             2.4    83.3   125.8    12
             2.5    87.2   132.1    10
             2.2    85.0    85.0    12
             2.2    89.0   126.5    12
Level 3      4.7    48.6   140.2     6
            14.8    44.2   145.5     6
             4.6    49.3   138.7     5
             5.0    49.7   193.2     6
             4.7    40.1   142.2     6



The repetition of the value 110.2 at level 1 is
suspicious but could conceivably be correct. The
repetition across columns of the value 85.0 at level 2 is
far more suspicious, as it gives rise to an apparent
outlier in variable 3. At level 3, the value 14.8
appears to be 4.8 with a spurious digit “one” added in
front, while the outlier 193.2 may be 139.2 with two
digits inverted. The outlier 49 at level 1 has no obvious
explanation.
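
The visual check described above can be backed by a simple automated screen. The following Python sketch is illustrative only: the function names, the median and median-absolute-deviation (MAD) rule, and the cutoff k = 5 are assumptions made for this example, not a prescribed method. It looks for a value repeated across columns in one row, and for values far from the median of their own level and variable.

```python
from statistics import median

# Printout from the table above: level -> rows of (var1, var2, var3, var4).
DATA = {
    1: [(0.0, 103.1, 93.2, 23), (0.0, 110.2, 87.2, 27), (0.0, 110.2, 88.9, 49),
        (0.0, 105.8, 92.1, 24), (0.0, 107.8, 84.4, 26)],
    2: [(2.1, 87.4, 117.1, 13), (2.4, 83.3, 125.8, 12), (2.5, 87.2, 132.1, 10),
        (2.2, 85.0, 85.0, 12), (2.2, 89.0, 126.5, 12)],
    3: [(4.7, 48.6, 140.2, 6), (14.8, 44.2, 145.5, 6), (4.6, 49.3, 138.7, 5),
        (5.0, 49.7, 193.2, 6), (4.7, 40.1, 142.2, 6)],
}


def repeated_across_columns(data):
    """Return (level, row_index, value) when one row holds a value twice."""
    hits = []
    for level, rows in data.items():
        for i, row in enumerate(rows):
            seen = set()
            for value in row:
                if value in seen:
                    hits.append((level, i, value))
                seen.add(value)
    return hits


def mad_outliers(data, k=5.0):
    """Return (level, variable, value) lying more than k MADs from the median.

    The cutoff k is an arbitrary choice for this sketch. Columns whose MAD is
    zero are skipped, because their spread is too small to judge.
    """
    hits = []
    for level, rows in data.items():
        for j, column in enumerate(zip(*rows), start=1):
            centre = median(column)
            mad = median([abs(x - centre) for x in column])
            if mad == 0:
                continue
            for x in column:
                if abs(x - centre) > k * mad:
                    hits.append((level, j, x))
    return hits


for hit in repeated_across_columns(DATA) + mad_outliers(DATA):
    print(hit)
```

With this cutoff the screen nominates 49, 85.0, 14.8 and 193.2, and also 40.1 at level 3, which the discussion above does not single out. A flag is only a prompt to check the original record, not proof of an error.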



When a possible suspect value or error has been
detected, the statistician must decide what to do
about it. One may be able to go back to the original
data source and check the observation. Inversions,
repetitions and values in the wrong columns can
often be corrected in this way.

Outliers are more difficult to handle, particularly
when they are impossible to check or have been
misrecorded in the first place. It may be sensible
to treat them as missing values and try to insert
a value “guessed” in an appropriate way (e.g., by
interpolation or prediction from other variables).
Alternatively, the value may have to be left as
unrecorded, and then either all observations for
the given individual must be discarded or
one must accept unequal numbers of
observations for the different variables.
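
The three options just described can be compared side by side. This is a minimal Python sketch under stated assumptions: the level 3 rows are taken from the printout with the suspect 193.2 marked as missing, the function names are hypothetical, and the mean of the remaining observations is used as a simple stand-in for the interpolation or prediction mentioned above.

```python
from statistics import mean

MISSING = None  # marker for an unrecorded value (a choice made for this sketch)

# Level 3 rows from the printout, with the suspect 193.2 treated as missing.
level3 = [
    (4.7, 48.6, 140.2, 6),
    (14.8, 44.2, 145.5, 6),
    (4.6, 49.3, 138.7, 5),
    (5.0, 49.7, MISSING, 6),
    (4.7, 40.1, 142.2, 6),
]


def impute_column_mean(rows):
    """Option 1: fill each gap with the mean of the recorded values in its column."""
    fills = [mean([x for x in col if x is not MISSING]) for col in zip(*rows)]
    return [
        tuple(fills[j] if x is MISSING else x for j, x in enumerate(row))
        for row in rows
    ]


def drop_incomplete(rows):
    """Option 2: discard every individual that has any missing value."""
    return [row for row in rows if all(x is not MISSING for x in row)]


def counts_per_variable(rows):
    """Option 3: keep everything and accept unequal numbers of observations."""
    return [sum(x is not MISSING for x in col) for col in zip(*rows)]


print(impute_column_mean(level3)[3])
print(len(drop_incomplete(level3)))
print(counts_per_variable(level3))
```

Which option is appropriate depends on the analysis that follows; the sketch only shows what each one does to the data.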
The following table gives the summary chart
concerning analysis of data



Coding (Non numerical data)
Many data sets include some variables which are
not recorded in numerical form. Examples include
opinion responses, which might range from “agree
strongly” to “disagree strongly”, and qualitative
variables like colour of hair. The analyst needs to
code such variables with extra care.

Opinion responses are often coded on a five-point
scale with equally spaced values, so that
“agree strongly” and “disagree strongly” might
be coded as 1 and 5 respectively, with
intermediate values as appropriate.
Coding is necessary for efficient analysis:
through it the many replies can be reduced to
a small number of classes which contain the
critical information required for analysis.
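
A five-point coding of this kind can be sketched in Python as a lookup table. Only the two end points (1 and 5) come from the text; the three intermediate labels and the function name are assumptions made for illustration.

```python
# Hypothetical five-point scale; only the end points are given in the text.
LIKERT_CODES = {
    "agree strongly": 1,
    "agree": 2,
    "neither agree nor disagree": 3,
    "disagree": 4,
    "disagree strongly": 5,
}


def code_responses(responses):
    """Map free-text replies to numeric codes; unrecognised replies become None."""
    codes = []
    for reply in responses:
        key = " ".join(reply.lower().split())  # normalise case and spacing
        codes.append(LIKERT_CODES.get(key))
    return codes


print(code_responses(["Agree strongly", "disagree  strongly", "maybe"]))
```

Returning None for an unrecognised reply, rather than guessing a code, keeps such replies visible so they can be reviewed during editing.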
CLASSIFICATION
Data having a common characteristic are placed in one
class, and in this way the entire data set is divided into a
number of groups or classes: attributes if the variable
under study is qualitative, class intervals if the variable
under study is quantitative. The frequency of a class
is the number of items included in the class.
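
Both kinds of classification can be sketched in a few lines of Python. The hair-colour replies are made-up illustrative data, the quantitative values are the 15 observations of variable 2 from the printout, and the class boundaries (width 20, starting at 40) and function name are assumptions chosen for the example.

```python
from collections import Counter

# Qualitative variable: classes are attributes (hypothetical replies).
hair = ["black", "brown", "black", "fair", "black", "brown"]
attribute_frequencies = Counter(hair)

# Quantitative variable: variable 2 from the printout, all three levels.
var2 = [103.1, 110.2, 110.2, 105.8, 107.8,
        87.4, 83.3, 87.2, 85.0, 89.0,
        48.6, 44.2, 49.3, 49.7, 40.1]


def class_frequencies(values, start, width, n_classes):
    """Count values in n_classes intervals [lower, upper) of equal width."""
    freq = [0] * n_classes
    for x in values:
        k = int((x - start) // width)
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} lies outside the class intervals")
        freq[k] += 1
    return [((start + i * width, start + (i + 1) * width), f)
            for i, f in enumerate(freq)]


print(attribute_frequencies)
for (lower, upper), f in class_frequencies(var2, 40, 20, 4):
    print(f"{lower}-{upper}: {f}")
```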

TABULATION
When a mass of data has been assembled, it
becomes necessary for the researcher to arrange
it in some kind of concise and logical order.
This procedure is referred to as tabulation. Thus,
tabulation is the process of summarizing raw data
and displaying them in compact form (rows and
columns) for further analysis. There are many types
of table: frequency tables, contingency tables and
manifold tables (for many variables).
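
As an illustration of tabulation, the Python sketch below builds a two-way contingency table from paired observations. The data (hair colour paired with an opinion) and the function name are hypothetical and serve only to show the row-and-column layout.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate a list of (row_attribute, column_attribute) pairs."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Hypothetical raw data: one (hair colour, opinion) pair per respondent.
pairs = [("dark", "agree"), ("dark", "disagree"), ("fair", "agree"),
         ("dark", "agree"), ("fair", "disagree"), ("fair", "agree")]

rows, cols, table = contingency_table(pairs)
print("      " + "  ".join(f"{c:>8}" for c in cols))
for label, counts in zip(rows, table):
    print(f"{label:<6}" + "  ".join(f"{n:>8}" for n in counts))
```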



ANALYSIS OF DATA
By analysis we mean the computation of certain
indices or measures, along with the search for
patterns of relationship that exist among the data
groups: DESCRIPTIVE ANALYSIS.
Analysis, particularly in the case of survey or
experimental data, also involves estimating the values
of unknown parameters of the population and
testing hypotheses in order to draw inferences:
INFERENTIAL ANALYSIS or Statistical Analysis.

Descriptive Analysis: Determination of some
characteristics of the data:
Case of one variable
1. Measures of central tendency: Mean value
(arithmetic, geometric, harmonic), median, mode
2. Measures of dispersion: mean deviation, variance,
standard deviation, coefficient of variation
3. Measures of asymmetry (skewness)
4. Measures of peakedness (kurtosis)
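
The four groups of measures listed above can be computed with Python's standard `statistics` module plus a few lines of arithmetic. This is a sketch under assumptions: the data set is made up, the population forms (divisor n) are used for variance and standard deviation, and skewness and kurtosis are taken as the moment ratios m3 / m2^1.5 and m4 / m2^2, which is one of several conventions in use.

```python
from statistics import (geometric_mean, harmonic_mean, mean, median, mode,
                        pstdev, pvariance)


def describe(values):
    """Descriptive measures for one variable (positive, non-constant data)."""
    n = len(values)
    m = mean(values)
    deviations = [x - m for x in values]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    sd = pstdev(values)
    return {
        # 1. central tendency
        "arithmetic mean": m,
        "geometric mean": geometric_mean(values),
        "harmonic mean": harmonic_mean(values),
        "median": median(values),
        "mode": mode(values),
        # 2. dispersion
        "mean deviation": sum(abs(d) for d in deviations) / n,
        "variance": pvariance(values),
        "standard deviation": sd,
        "coefficient of variation": sd / m,
        # 3. asymmetry and 4. peakedness
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
for name, value in describe(sample).items():
    print(f"{name}: {value:.4f}")
```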

Case of several variables
For two variables we speak of “bivariate analysis”,
and for several variables of “multivariate
analysis”. In this context we work out various
measures that show the size and shape of a
distribution (for example: mean vector, dispersion
matrix, correlation matrix, etc.), along with
measures of the relationship between two or
more variables.
We may also speak of regression analysis and
correlation analysis.
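
For a small data set the mean vector, dispersion matrix and correlation matrix can be worked out directly. The Python sketch below uses made-up data with two variables, hypothetical function names, and the sample form of the dispersion (covariance) matrix with divisor n - 1; these choices are assumptions for illustration.

```python
from statistics import mean


def mean_vector(rows):
    """Mean of each variable; rows are observations, columns are variables."""
    return [mean(col) for col in zip(*rows)]


def dispersion_matrix(rows):
    """Sample variance-covariance matrix (divisor n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def correlation_matrix(rows):
    """Covariances scaled by the two standard deviations (non-constant columns)."""
    s = dispersion_matrix(rows)
    p = len(s)
    return [[s[i][j] / (s[i][i] * s[j][j]) ** 0.5 for j in range(p)]
            for i in range(p)]


observations = [(1.0, 2.0), (2.0, 1.0), (3.0, 6.0)]  # hypothetical data
print(mean_vector(observations))
print(dispersion_matrix(observations))
print(correlation_matrix(observations))
```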

Regression analysis is concerned with determining
the relationship existing between
two or more variables.
Correlation analysis studies the joint variation of
two or more variables in order to determine the amount
of correlation between them.
Amongst the measures of relationship, Karl
Pearson’s coefficient of correlation is the most frequently
used measure in the case of statistics of variables,
whereas Yule’s coefficient of association is used in
the case of statistics of attributes.
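
These measures can be illustrated with a short Python sketch. It assumes the standard textbook formulas: Pearson's r = Sxy / sqrt(Sxx * Syy), the least-squares line y = a + b x with b = Sxy / Sxx, and Yule's coefficient Q = (ad - bc) / (ad + bc) for a 2 x 2 table of attribute frequencies with cells a, b, c, d. The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def _sums(x, y):
    """Means and sums of squares and cross-products about the means."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for two variables."""
    _, _, sxx, syy, sxy = _sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)


def regression_line(x, y):
    """Least-squares intercept and slope of y on x."""
    mx, my, sxx, _, sxy = _sums(x, y)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    slope = sxy / sxx
    return my - slope * mx, slope


def yule_q(a, b, c, d):
    """Yule's coefficient of association for the 2 x 2 table [[a, b], [c, d]]."""
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("association is undefined for this table")
    return (a * d - b * c) / denominator


x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical variable
print(pearson_r(x, y))
print(regression_line(x, y))
print(yule_q(30, 10, 10, 30))    # hypothetical attribute frequencies
```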

In modern times, with the availability of computer
facilities, there has been a rapid development of
multivariate analysis, which may be defined as “all
statistical methods which simultaneously analyse
more than two variables on a sample of
observations”.
We can mention here: multivariate analysis of
variance (MANOVA), multiple regression analysis,
multiple discriminant analysis (classification of
individuals into homogeneous groups) and multivariate
analysis of covariance (MANCOVA).
