
1) Stages of data preparation

2) Data Analysis
3) Descriptive statistics

1) Nishan Navaratne
• Converting information from a questionnaire so that it can be transferred to a data warehouse is referred to as data preparation
• This process usually follows a four step approach, beginning with data validation, followed by editing and coding, data entry and data tabulation
• Error detection begins in the first phase and continues throughout the process
• The purpose of data preparation is to take data in its raw form and convert it to establish meaning and create value for the user

[Figure: DATA PREPARATION (Validation → Editing & Coding → Data Entry → Data Tabulation, with ERROR DETECTION running alongside) → Data Analysis (Descriptive Analysis, Uni & Bivariate Analysis, Multivariate Analysis) → Interpretation]
• Data validation is the process of determining, to the extent possible, whether a survey's interviews or observations were conducted correctly and are free of fraud or bias
• In many data collection approaches it is not always convenient to monitor the data collection process closely. To facilitate accurate data collection, each respondent's name, address and phone number may be recorded
• While this information is not used for analysis, it does enable the validation process to be completed

Curbstoning: a term used in the marketing research industry for the falsification of collected data, for example an interviewer filling in the questionnaire personally
• The process of data validation covers five areas: 1) Fraud 2) Screening 3) Procedure 4) Completeness 5) Courtesy
• FRAUD: To determine whether
  • the person was actually interviewed
  • the interviewer contacted the respondent simply to get a name and address and then proceeded to fabricate responses
  • the interviewer used a friend to obtain the necessary information
• SCREENING: To ensure that the data were collected from respondents who meet the prescribed criteria, such as household income level, recent purchase of a specific product or brand, or even gender or age. For example, the interview procedure may require that only female heads of households with an annual household income of Rs 25000 or more be interviewed. In this case a validation callback would verify each of these factors
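The screening check in the example above could be sketched as follows. This is a minimal illustration only: the record fields (`respondent_id`, `gender`, `is_head_of_household`, `annual_household_income_rs`) and the sample callback data are hypothetical, and only the Rs 25000 threshold comes from the example.

```python
# Threshold taken from the example: annual household income of Rs 25000 or more.
MIN_INCOME_RS = 25000


def passes_screening(record):
    """Return True if a callback record meets all three screening criteria."""
    return (
        record["gender"] == "female"
        and record["is_head_of_household"]
        and record["annual_household_income_rs"] >= MIN_INCOME_RS
    )


# Hypothetical answers gathered during validation callbacks.
callbacks = [
    {"respondent_id": "R001", "gender": "female",
     "is_head_of_household": True, "annual_household_income_rs": 30000},
    {"respondent_id": "R002", "gender": "female",
     "is_head_of_household": True, "annual_household_income_rs": 18000},
    {"respondent_id": "R003", "gender": "male",
     "is_head_of_household": True, "annual_household_income_rs": 40000},
]

# Interviews that should not have been conducted under the screening rules.
failed = [r["respondent_id"] for r in callbacks if not passes_screening(r)]
print(failed)
```

Interviews flagged this way would then be reviewed by the researcher rather than dropped automatically.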
• PROCEDURE: In marketing research, it is critical that the data be collected according to a specific procedure. For example, many customer exit interviews must occur in a designated place as the respondent leaves a certain retail establishment. Here a validation callback may be necessary to ensure that the interview took place in the proper setting, not in some social gathering area like a party or a park
• COMPLETENESS: In order to speed through the data collection process, an interviewer may ask the respondent only a few of the requisite questions and then make up answers to the remaining questions
• To determine whether the interview is valid, the researcher could recontact a sample of respondents and ask about questions from different parts of the interview form
• Data editing is the process whereby raw data is checked for mistakes made by either the interviewer or the respondent
• By scanning each completed interview, the researcher can check the following areas of concern:
  • Asking the proper questions
  • Accurate recording of answers
  • Correct screening questions
  • Responses to open ended questions
• Coding is the grouping of, and assignment of values to, the various responses from the survey instrument
• Codes are typically numbers from 0 to 9, because numbers are quick and easy to input and computers work better with numbers than with alphanumeric values
• Coding can be tedious if certain issues are not addressed prior to collecting the data
• For example, a well planned and constructed questionnaire can reduce the time spent on coding and increase the accuracy of the process, if coding is incorporated into the design of the questionnaire
• In questionnaires that do not use such simple coded responses, the researcher will establish a master code on which the assigned numeric values are shown
• Researchers typically use a four step process to develop codes for responses:
1. Generate a list of as many potential responses as possible and assign values to the generated responses
2. Consolidate the responses: responses having the same meaning are grouped into one
3. Assign a numerical value as a code
4. Assign a coded value to each response
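The four steps above can be sketched in a few lines of Python. This is only an illustration: the categories, the numeric codes and the sample answers are all hypothetical, and a real master code would be built from the actual survey responses.

```python
# Hypothetical master code: consolidated response category -> numeric code (step 3).
MASTER_CODE = {
    "price": 1,
    "quality": 2,
    "convenience": 3,
    "other": 9,
}

# Hypothetical consolidation table (step 2): raw answers with the same
# meaning are grouped under one category.
CONSOLIDATION = {
    "cheap": "price",
    "low price": "price",
    "good quality": "quality",
    "lasts long": "quality",
    "close to home": "convenience",
}


def code_response(raw_answer):
    """Step 4: return the numeric code for one raw open ended answer."""
    category = CONSOLIDATION.get(raw_answer.strip().lower(), "other")
    return MASTER_CODE[category]


# Hypothetical raw answers to an open ended question.
answers = ["Cheap", "lasts long", "Close to home", "friendly staff"]
codes = [code_response(a) for a in answers]
print(codes)
```

Answers that match no consolidated category fall into the "other" code here; in practice the researcher might instead review them and extend the master code.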
• Data entry consists of those tasks involved with the direct input of the coded data into a specified software package that ultimately allows the research analyst to manipulate and transform the raw data into useful information
• It follows validation, editing and coding
• It is the procedure used to enter the data into the computer for subsequent data analysis
• One critical task of data entry personnel is to ensure that the data entered is correct and error free
• The first step in error detection is to determine whether the software used for data entry and tabulation allows the researcher to perform "error edit routines", which identify the wrong type of data. Example: say that for a particular field on a given data record, only the codes 1 or 2 should appear. An error edit routine can display an error message on the data output if any number other than 1 or 2 has been entered
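An error edit routine of the kind described above might look like the sketch below. The field name `gender` and the sample records are hypothetical; only the rule that the field may contain the codes 1 or 2 comes from the example.

```python
# Hypothetical rule table: field name -> set of codes allowed in that field.
ALLOWED_CODES = {"gender": {1, 2}}


def error_edit(records, allowed=ALLOWED_CODES):
    """Return (record number, field, value) for every out-of-range code."""
    errors = []
    for record_number, record in enumerate(records, start=1):
        for field, valid_codes in allowed.items():
            value = record.get(field)
            if value not in valid_codes:
                errors.append((record_number, field, value))
    return errors


# Hypothetical entered data: record 2 has a wrong code, record 4 is missing one.
records = [{"gender": 1}, {"gender": 3}, {"gender": 2}, {"gender": None}]
errors = error_edit(records)
for record_number, field, value in errors:
    print(f"Record {record_number}: invalid code {value!r} in field {field!r}")
```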

• Another approach to error detection is for the researcher to review a printed representation of the entered data
• The final approach to error detection is to produce a data/column list for the entered data. A quick view of this data/column list can indicate to the analyst whether inappropriate codes were entered into the data fields
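One way to approximate a data/column list is to count how often each code appears in a field, so that inappropriate codes stand out at a glance. The sketch below assumes this interpretation; the field name and the records are hypothetical.

```python
from collections import Counter


def column_list(records, field):
    """Count how many times each code was entered in one data field."""
    return Counter(record.get(field) for record in records)


# Hypothetical entered data for a field that should only hold the codes 1 or 2.
records = [{"gender": 1}, {"gender": 2}, {"gender": 2}, {"gender": 7}]
counts = column_list(records, "gender")
for code, count in sorted(counts.items()):
    print(f"code {code}: {count}")
```

In this listing the single occurrence of code 7 is immediately visible as an entry that needs checking.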
• Once the data have been collected and prepared for analysis, there are some basic statistical analysis procedures that the marketing researcher (MR) will want to perform
• An obvious need for these statistics comes from the fact that almost all data sets are disaggregated
• Graphics should be used whenever practical, so that the information user can quickly grasp the essence of the information developed in the research project
• Charts can also be an effective visual aid to enhance the communication process and add clarity and impact to research reports, e.g. bar charts, line charts and pie (round) charts

• Data must be accurately scored and systematically organized to facilitate data analysis through descriptive analysis, univariate and bivariate analysis, and multivariate analysis
• Descriptive statistics: permit the researcher to describe many pieces of data with a few indices
• Statistics: indices calculated by the researcher for a sample drawn from a population
• Parameters: indices calculated by the researcher for an entire population
Types of descriptive statistics:
1) Graphs
2) Measures of central tendency
3) Measures of variability

Graphs:
• Representations of data enabling the researcher to see what the distribution of scores looks like, e.g. bar graph, line graph and pie (round) chart
Measures of central tendency: indices enabling the researcher to determine the typical or average score of a group of scores.

They are:

a) Mean
• The arithmetic average of the sample
• All values of a distribution of responses are summed and divided by the number of valid responses
b) Median
• The middle value of a rank ordered distribution
• Half of the responses are above and half are below the median value
c) Mode
• The most common value in the set of responses to a question, i.e. the response most often given to a question
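As a small worked example, the three measures can be computed with Python's standard `statistics` module. The responses below are hypothetical ratings on a 1 to 5 scale, not data from any survey.

```python
import statistics

# Hypothetical responses to one rating question (1 to 5 scale).
responses = [3, 5, 4, 5, 2, 5, 4]

# Mean: sum of all values (28) divided by the number of valid responses (7).
mean_value = statistics.mean(responses)

# Median: middle value of the rank ordered list 2, 3, 4, 4, 5, 5, 5.
median_value = statistics.median(responses)

# Mode: the response given most often (5 appears three times).
mode_value = statistics.mode(responses)

print(mean_value, median_value, mode_value)
```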
Measures of variability: indices enabling the researcher to indicate how spread out a group of scores is.

They are:

a) Range
• The difference between the highest and lowest score in a distribution
b) Quartile deviation
• Half the difference between the third and first quartiles of a distribution
c) Variance
• A summary statistic indicating the degree of variability among participants for a given variable
• The average squared deviation about the mean of a distribution of values
d) Standard deviation
• The square root of the variance, providing an index of variability in the distribution of scores
• It describes, roughly, the typical distance of the distribution values from the mean
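The range, variance and standard deviation can be illustrated with the standard `statistics` module. The scores are hypothetical. The sketch uses the population versions (`pvariance`, `pstdev`) because they match the "average squared deviation" definition above; for a sample drawn from a population, `statistics.variance` and `statistics.stdev`, which divide by n - 1, are the usual alternative.

```python
import statistics

# Hypothetical scores; their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Variance as the average squared deviation about the mean:
# squared deviations 9, 1, 1, 1, 0, 0, 4, 16 sum to 32, and 32 / 8 = 4.
variance = statistics.pvariance(scores)

# Standard deviation: the square root of the variance.
std_dev = statistics.pstdev(scores)

print(score_range, variance, std_dev)
```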
