
Department of Marketing Management (SMU)

CHAPTER SIX
DATA PROCESSING AND ANALYSIS

Overview of Data Processing and Analysis


The goal of any research is to produce information from raw data. After collection, the raw data
have to be processed and analyzed in line with the outline (plan) laid down for the purpose at the
time of developing the research plan. The compiled data must be classified, processed, analyzed
and interpreted carefully before their complete meaning and implications can be understood.
6.1. Data Processing
Data processing implies editing, coding, classification and tabulation of collected data so that they
are amenable to analysis.
1. Editing:
Editing is the process of examining the collected raw data to detect errors and omissions (including extreme or implausible values) and to correct them when possible.
• It involves a careful scrutiny of completed questionnaires or schedules.
• It is done to ensure that the data are:
  - Accurate
  - Consistent with other data gathered
  - Uniformly entered
  - As complete as possible
  - Well organized to facilitate coding and tabulation
Editing can be either field editing or central editing.
• Field editing: consists of reviewing the reporting forms, usually by the investigator, to complete
what was written in abbreviated and/or illegible form at the time of recording the respondent's
responses. This sort of editing should be done as soon as possible after the interview or
observation.
• Central editing: takes place at the research office. Its objective is to correct errors such as an
entry in the wrong place or an entry recorded in the wrong units (for example, in months when
weeks were intended).
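As an illustration of what an editing pass might look like in practice, the short Python sketch below flags questionnaires with missing or implausible entries for review rather than correcting them silently; the field names and plausibility ranges are hypothetical.

```python
# Minimal sketch of a central-editing check (hypothetical survey fields).
records = [
    {"id": 1, "age": 34, "monthly_income": 4500},
    {"id": 2, "age": None, "monthly_income": 3800},  # missing age
    {"id": 3, "age": 250, "monthly_income": 5200},   # implausible age
]

def flag_for_review(record):
    """Return a list of editing problems found in one completed questionnaire."""
    problems = []
    if record["age"] is None:
        problems.append("age missing")
    elif not (15 <= record["age"] <= 99):            # plausible-range check
        problems.append("age out of range")
    if record["monthly_income"] is not None and record["monthly_income"] < 0:
        problems.append("negative income")
    return problems

for rec in records:
    issues = flag_for_review(rec)
    if issues:
        print(f"Questionnaire {rec['id']}: {', '.join(issues)}")
```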
2. Coding:
Coding refers to the process of assigning numerical or other symbols to answers so that responses
can be put into a limited number of categories or classes. Such classes should be appropriate to the
research problem under consideration. There must be a class for every data item, and the classes
must be mutually exclusive (a specific answer can be placed in one and only one cell in a given
category set).
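A minimal sketch of coding in Python, assuming a hypothetical five-point satisfaction scale; the point is that the code list is exhaustive and mutually exclusive, so every answer maps to exactly one class.

```python
# Sketch of coding: mapping verbal responses to numeric codes.
# The category set (codes 1-5) is hypothetical but exhaustive and mutually
# exclusive: every answer falls into exactly one class.
SATISFACTION_CODES = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}

raw_answers = ["satisfied", "neutral", "very satisfied", "satisfied"]
coded = [SATISFACTION_CODES[answer] for answer in raw_answers]
print(coded)  # [4, 3, 5, 4]
```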
3. Classification
Most research studies result in a large volume of raw data, which must be reduced into
homogeneous groups. Data classification is the process of arranging data in groups or classes on
the basis of common characteristics. Data having common characteristics are placed in one class,
and in this way the entire data set is divided into a number of groups or classes.

4. Classification according to attributes:



Data are classified on the basis of common characteristics, which can be either descriptive (such as
literacy, sex, honesty, etc.) or numerical (such as weight, age, height, income, expenditure, etc.).
Descriptive characteristics refer to qualitative phenomena, which cannot be measured
quantitatively; only their presence or absence in an individual item can be noticed. Data obtained
in this way on the basis of certain attributes are known as statistics of attributes, and their
classification is said to be classification according to attributes.

5. Classification according to Class interval:


Unlike descriptive characteristics, numerical characteristics refer to quantitative phenomena,
which can be measured in some statistical unit. Data relating to income, production, age, weight,
etc. come under this category. Such data are known as statistics of variables and are classified on
the basis of class intervals. For example, individuals whose incomes are within 1001-1500 birr can
form one group, those whose incomes are within 1501-2000 birr can form another group, and so
on. In this way the entire data set may be divided into a number of groups or classes, usually
called class intervals. Each class interval has an upper as well as a lower limit, known as the class
limits. The difference between the two class limits is known as the class magnitude. The number
of items that fall in a given class is known as the frequency of that class.
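A small Python sketch of classification by class intervals, using illustrative income figures and interval limits; each interval has a lower and an upper class limit, and the count of items falling inside it is its frequency.

```python
# Sketch: classifying income data (in birr) into class intervals and
# counting the frequency of each class. Interval boundaries are illustrative.
incomes = [620, 980, 1150, 1320, 1490, 1505, 1720, 2100, 2250, 2900]
intervals = [(501, 1000), (1001, 1500), (1501, 2000), (2001, 2500), (2501, 3000)]

frequency = {}
for lower, upper in intervals:          # lower and upper class limits
    count = sum(1 for x in incomes if lower <= x <= upper)
    frequency[f"{lower}-{upper}"] = count
    # class magnitude = upper - lower; count = frequency of the class

for interval, freq in frequency.items():
    print(interval, freq)
```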

6.2. Data Analysis
Data analysis is the further transformation of the processed data to look for patterns and relations
among data groups. By analysis we mean the computation of certain indices or measures, along
with searching for patterns or relationships that exist among the data groups. Analysis,
particularly in the case of survey or experimental data, involves estimating the values of unknown
population parameters and testing hypotheses for drawing inferences. Analysis can be categorized as:
• Descriptive analysis
• Inferential (statistical) analysis
6.2.1 Descriptive Analysis
Descriptive analysis is largely the study of the distribution of one variable. For most projects,
analysis begins with some form of descriptive analysis to reduce the data into a summary format.
Descriptive analysis refers to the transformation of raw data into a form that will make them easy
to understand and interpret.
The most common forms of describing the processed data are:
1. Tabulation
2. Percentages
3. Measures of central tendency
4. Measures of dispersion
5. Measures of asymmetry (skewness)
1. Tabulation
Tabulation refers to the orderly arrangement of data in a table or other summary format. It
presents responses or observations on a question-by-question or item-by-item basis and provides
the most basic form of information. It tells the researcher how frequently each response occurs.
Need for Tabulation
• It conserves space and reduces explanatory and descriptive statements to a minimum
• It facilitates the process of comparison
• It facilitates the summation of items and the detection of errors and omissions
• It provides a basis for various statistical computations
2. Percentages:
Whether the data are tabulated by computer or by hand, it is useful to have percentages and
cumulative percentages. A table containing percentages along with the frequency distribution is
easier to interpret. Percentages are useful for comparing trends over time or among categories.
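The sketch below illustrates a simple one-way tabulation with percentages and cumulative percentages, using only the Python standard library and made-up responses.

```python
# Sketch: a one-way tabulation with frequencies, percentages, and
# cumulative percentages (hypothetical responses).
from collections import Counter

responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "yes"]
counts = Counter(responses)
total = sum(counts.values())

cumulative = 0.0
print(f"{'Response':<12}{'Frequency':>10}{'Percent':>10}{'Cum. %':>10}")
for category, freq in counts.most_common():
    percent = 100 * freq / total
    cumulative += percent
    print(f"{category:<12}{freq:>10}{percent:>10.1f}{cumulative:>10.1f}")
```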
3. Measures of Central Tendency
Describing the central tendency of the distribution with the mean, median, or mode is another
basic form of descriptive analysis. These measures are most useful when the purpose is to identify
typical values of a variable or the most common characteristics of a group. Measures of central
tendency are also known as statistical averages; the mean, median, and mode are the most popular
averages.

• The most commonly used measure of central tendency is the mean. To compute the mean, you
add up all the numbers and divide by how many numbers there are. It is not the halfway point,
but a kind of center that balances high numbers against low numbers. For this reason, it is most
often reported along with some simple measure of dispersion, such as the range, which is
expressed as the lowest and highest number.

• The median is the number that falls in the middle of a range of numbers. It is not the average;
it is the halfway point. There are always just as many numbers above the median as below it. In
cases where there is an even set of numbers, you average the two middle numbers. The median
is best suited for data that are ordinal, or ranked. It is also useful when you have extremely low
or high scores.

• The mode is the most frequently occurring number in a list of numbers. It is the closest thing to
what people mean when they say something is average or typical. The mode does not even have
to be a number: it will be a category when the data are nominal or qualitative. The mode is
useful when you have a highly skewed set of numbers, mostly low or mostly high. You can
also have two modes (a bimodal distribution) when one group of scores is mostly low and the
other group is mostly high, with few in the middle.
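A brief Python sketch of the three measures, using the standard library's statistics module on illustrative scores; note how the extreme value pulls the mean but not the median or mode.

```python
# Sketch: computing the three measures of central tendency.
import statistics

scores = [12, 15, 15, 18, 20, 22, 95]        # note the extreme value 95

print("mean:  ", statistics.mean(scores))    # pulled upward by 95
print("median:", statistics.median(scores))  # robust to the extreme score
print("mode:  ", statistics.mode(scores))    # most frequent value (15)
```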

4. Measures of Dispersion:
A measure of dispersion indicates how the values of an item are scattered around the true value of
the average. The average value alone fails to give any idea about the dispersion of the values of an
item or a variable around it. After identifying the typical value of a variable, the researcher can
therefore measure how far the values of the item are scattered around the mean; in short, a
measure of dispersion measures the variation in the values of an item.
Important measures of dispersion are:


• Range: measures the difference between the maximum and the minimum value of the
observed variable.
• Mean deviation: the average absolute deviation of the observations from the mean,
Σ|Xi − X̄| / n.
• Variance: the mean squared deviation from the mean; it measures the sample variability.
• Standard deviation: the square root of the variance.
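The sketch below computes the four measures on a small illustrative sample, using population (divide-by-n) formulas consistent with the mean-deviation expression above.

```python
# Sketch: the four measures of dispersion listed above, on illustrative data.
import statistics

values = [4, 7, 9, 10, 15]
mean = statistics.mean(values)                       # 9

data_range = max(values) - min(values)               # 15 - 4 = 11
mean_deviation = sum(abs(x - mean) for x in values) / len(values)
variance = statistics.pvariance(values)              # mean squared deviation
std_dev = statistics.pstdev(values)                  # square root of the variance

print(data_range, mean_deviation, variance, std_dev)
```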

5. Measure of Asymmetry (Skewness):


Measure of asymmetry (skewness): when the distribution of items happens to be perfectly
symmetrical, we have a normal curve and the corresponding distribution is the normal
distribution. Such a curve is perfectly bell shaped, in which case Mean = Median = Mode.
Skewness is thus a measure of asymmetry and shows the manner in which the items are clustered
around the average. In a symmetrical (normal) distribution the items show a perfect balance on
either side of the mode, but in a skewed distribution the balance is tilted to one side. The amount
by which the balance is tipped to one side measures the skewness. Knowledge about the shape of
the distribution is crucial to the use of statistical measures in research analysis, since most methods
make specific assumptions about the nature of the distribution. Skewness describes the asymmetry
of a distribution; a skewed distribution therefore has one tail longer than the other.
• A positively skewed distribution has a longer tail to the right

• A negatively skewed distribution has a longer tail to the left

• A distribution with no skew (e.g. a normal distribution) is symmetrical
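As a rough illustration, the sketch below compares mean, median, and mode on a right-skewed sample and computes a simple moment-based skewness coefficient (one common definition; the chapter does not prescribe a particular formula).

```python
# Sketch: judging skewness by comparing mean, median, and mode, plus a
# third-standardized-moment skewness coefficient (population form).
import statistics

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]   # long tail to the right

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
std = statistics.pstdev(data)

# Positive value => longer right tail; negative => longer left tail.
skewness = sum((x - mean) ** 3 for x in data) / (len(data) * std ** 3)

print(mean, median, mode)     # mean pulled above the median by the long right tail
print(round(skewness, 3))     # positive, i.e. positively skewed
```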

6.2.2. Inferential Analysis


Most researchers wish to go beyond the simple tabulation of frequency distributions and the
calculation of averages and/or measures of dispersion. They frequently seek to determine the
relationships between variables and to test their statistical significance.
When the population consists of more than one variable, it is possible to measure the relationship
between them. Is there any association or correlation between two or more variables? If yes, to
what degree? These questions are answered by the use of correlation techniques.

1. Correlation
The most commonly used relational statistic is correlation and it's a measure of the strength of
some relationship between two variables, not causality. Interpretation of a correlation coefficient
does not even allow the slightest hint of causality. The most a researcher can say is that the
variables share something in common; that is, are related in some way. The more two things have
something in common, the more strongly they are related. There can also be negative relations, but
the important quality of correlation coefficients is not their sign, but their absolute value. A
correlation of -.58 is stronger than a correlation of .43, even though with the former the
relationship is negative. The following table lists the interpretations for various correlation
coefficients:
.8 to 1.0 Very strong
.6 to .8 Strong
.4 to .6 Moderate
.2 to .4 Weak
.0 to .2 Very weak

Pearson's correlation coefficient, or small r, represents the degree of linear association between any
two variables. Unlike regression, correlation does not care which variable is the independent one
and which is the dependent one; therefore, you cannot infer causality. Correlations are also
dimension-free, but they require a good deal of variability or randomness in your outcome
measures. A correlation coefficient always ranges from negative one (-1) to one (1). A negative
correlation coefficient such as -0.65 indicates a fairly strong inverse relationship: when one
variable is high, the other tends to be low, and it is up to you, the researcher, to interpret which is
which. A positive correlation coefficient such as 0.65 indicates a fairly strong direct relationship:
when one variable is high, the other also tends to be high. Researchers usually report the names of
the variables in such sentences, rather than just saying "one variable". A correlation coefficient at
zero, or close to zero, indicates no linear relationship.

The most frequently used correlation coefficient in data analysis is the Pearson product moment
correlation. It is symbolized by the small letter r, and is fairly easy to compute from raw scores
using the following formula:
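The formula itself is not reproduced in this copy; the standard raw-score form is r = [nΣxy − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]). The Python sketch below applies that form to illustrative data.

```python
# Sketch: Pearson product-moment correlation from raw scores, using the
# raw-score formula given above (illustrative x and y values).
from math import sqrt

x = [2, 4, 5, 6, 8]
y = [3, 7, 9, 10, 14]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 3))   # close to 1: a very strong positive linear relationship
```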

II. Is there any cause-and-effect (causal) relationship between two variables, or between one
variable on one side and two or more variables on the other side?
This question can be answered by the use of regression analysis. In regression analysis the
researcher tries to estimate or predict the average value of one variable on the basis of the values
of the other variable(s).

2. Regression

Regression is the closest thing to estimating causality in data analysis, because it predicts how well
the numbers "fit" a projected straight line. The most common form is linear regression, which uses
the least squares method to find the equation of the line that best fits the data, representing what is
called the regression of y on x. Instead of finding a single perfect number, one is interested in
finding the best line, such that there is one and only one line (represented by an equation) that best
fits the data, regardless of how scattered the data points are. The slope of the line provides
information about the predicted direction of the relationship, and the estimated coefficient (or beta
weight) of the independent variable x indicates the strength of its effect on the dependent
variable y.

Yi = B0 + B1Xi
where:
Yi = outcome score for the ith unit (dependent variable)
B0 = coefficient for the intercept
B1 = coefficient for the slope
Xi = value of the independent variable for the ith unit
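A minimal Python sketch of fitting this line by ordinary least squares on illustrative data; the closed-form estimates used here are the standard ones for simple linear regression.

```python
# Sketch: ordinary least squares estimates of B0 (intercept) and B1 (slope)
# for the simple linear model Yi = B0 + B1*Xi, using illustrative data.
x = [1, 2, 3, 4, 5]            # independent variable
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # dependent variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares: B1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²,  B0 = ȳ - B1*x̄
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x

print(f"estimated line: Y = {b0:.2f} + {b1:.2f}X")
# The sign of b1 gives the predicted direction; its size, the strength of the effect.
```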

Compiled by: Wondwosen T. & Yalwe G. (Marketing Research)
