
PROCESSING DATA

Steps in data processing


Sources of Raw Data
• Interviews
• Questionnaires
• Observation
• Focus groups
• Experiments
• Secondary data (sometimes not raw)
Preparing Data for Analysis
If you intend to undertake quantitative analysis, we recommend you consider:
• The type of data (level of numerical measurement)
• The format in which your data will be input to the analytical software
• The impact of data coding on subsequent analysis
• The need to weight cases
• The method you intend to use to check data for errors
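To make "weighting cases" concrete, here is a minimal sketch in which each case contributes its weight to a category instead of counting once. The coded responses and the weights are hypothetical; in a real study the weights would come from your sampling design.

```python
from collections import defaultdict


def weighted_frequencies(values, weights):
    """Add up the weight of each case per category instead of counting each case once."""
    if len(values) != len(weights):
        raise ValueError("each case needs exactly one weight")
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    return dict(totals)


# Hypothetical coded marital status (1 = currently married, 4 = divorced, 5 = never married)
statuses = [1, 1, 5, 5, 5, 4]
# Hypothetical weights, e.g. to give an under-sampled group more influence
weights = [2.0, 2.0, 0.5, 0.5, 0.5, 1.0]

print(weighted_frequencies(statuses, weights))  # {1: 4.0, 5: 1.5, 4: 1.0}
```

Unweighted, "never married" (5) would be the largest group with three cases. After weighting, "currently married" (1) dominates. This is why the decision to weight has to be made before analysis.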
Questions of Importance
• The following questions should be asked:
  • How do you find answers to your research questions?
  • How do you prove or disprove your hypothesis, if you had one?
  • How do you make sense of the information collected?
  • How should the information be analysed to achieve the objectives of your study?
• Irrespective of the method of data collection, the information collected is called raw data or simply data.
• The first step in processing your data is to ensure that the data are clean, i.e. free from inconsistencies and incompleteness.
• This process of cleaning is called editing.
Editing Data
• Editing consists of scrutinising the completed research instrument to identify and minimise, as far as possible, errors, incompleteness, misclassification and gaps in the information obtained from the respondents.
• Sometimes even the best investigators can:
  • Forget to ask a question
  • Forget to record a response
  • Wrongly classify a response
  • Write only half a response
  • Write illegibly
• In the case of a questionnaire, similar problems can crop up.
• To a great extent, these problems can be reduced simply by:
  • Checking the contents for completeness
  • Checking the responses for internal consistency
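The two checks above can be sketched for data that have already been typed in. This is a minimal illustration, assuming each completed questionnaire is stored as a dictionary. The field names and the consistency rule (a respondent under 22 reporting a PhD is flagged) are hypothetical, chosen only to show the idea. Flagged records are reported for checking, not changed automatically.

```python
REQUIRED_FIELDS = ("age", "marital_status", "education")


def check_record(record):
    """Return a list of problems found in one completed questionnaire."""
    problems = []
    # Completeness: every required question should have an answer.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing answer: {field}")
    # Internal consistency (hypothetical rule): a very young respondent
    # reporting a PhD is flagged so the editor can look at the form again.
    age = record.get("age")
    if isinstance(age, int) and age < 22 and record.get("education") == "PhD":
        problems.append("inconsistent: age vs education")
    return problems


records = [
    {"id": 1, "age": 27, "marital_status": "Never married", "education": "Diploma"},
    {"id": 2, "age": 21, "marital_status": "", "education": "PhD"},
]
for r in records:
    print(r["id"], check_record(r))
# 1 []
# 2 ['missing answer: marital_status', 'inconsistent: age vs education']
```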
• There are several ways of minimising such problems:
  1. By inference: certain questions in a research instrument may be related to one another, and it might be possible to find out the answer to one question from the answer to another.
     • Be careful not to introduce new errors into the data.
  2. By recall: if the data are collected by means of interviews, it might sometimes be possible for the interviewer to recall a respondent’s answers.
     • Again, you must be extremely careful.
  3. By going back to the respondent: if the data have been collected by means of interviews, or the questionnaire contains some identifying information, it is possible to visit or phone a respondent to confirm or ascertain an answer.
     • This is, of course, expensive and time-consuming.
Ways of Editing Data
• There are two ways of editing data:
  A. Examine the answers to one question or variable at a time.
  B. Examine the answers to all questions at the same time, that is, examine all the responses given by one respondent.
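When the data are held as a table of respondents, the two ways of editing correspond to reading the table by column (way A) or by row (way B). A minimal sketch follows, assuming each respondent is a dictionary and that `None` marks a missing answer. The data are hypothetical.

```python
# Hypothetical completed questionnaires: one dict per respondent, None = no answer.
responses = [
    {"id": 1, "age": 27, "marital_status": 1},
    {"id": 2, "age": None, "marital_status": 5},
    {"id": 3, "age": 44, "marital_status": None},
]


def edit_by_variable(rows, variable):
    """Way A: scan one question across all respondents; return ids with no answer."""
    return [row["id"] for row in rows if row.get(variable) is None]


def edit_by_respondent(row):
    """Way B: scan all answers given by one respondent; return unanswered questions."""
    return [name for name, value in row.items() if value is None]


print(edit_by_variable(responses, "age"))          # [2]
print([edit_by_respondent(r) for r in responses])  # [[], ['age'], ['marital_status']]
```

Way A makes it easy to spot a question that many respondents skipped. Way B shows whether a particular respondent's answers hang together.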
Coding Data
• Having cleaned the data, the next step is to code it.
• The method of coding is largely determined by two considerations:
  • The way a variable has been measured in your research instrument
  • The way you want to communicate the findings about a variable to your readers
• For coding, the first level of distinction is whether a set of data is qualitative or quantitative in nature.
• For qualitative data, a further distinction is whether the information is descriptive in nature or is generated through discrete qualitative categories.
• The way you proceed with the coding depends upon the measurement scale used to measure a variable and whether a question is open-ended or closed-ended.
• The type of statistical procedures that can be applied to a set of information depends upon the measurement scale on which a variable was measured in the research instrument (e.g. mean, mode, etc.).
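The link between measurement scale and statistical procedure can be sketched with Python's standard `statistics` module. The choice of mode for nominal, median for ordinal and mean for interval/ratio data is the conventional pairing. The sample values are hypothetical.

```python
import statistics


def summarise(values, scale):
    """Pick a summary statistic that suits the variable's measurement scale."""
    if scale == "nominal":
        # Categories have no order, so only the most frequent code is meaningful.
        return statistics.mode(values)
    if scale == "ordinal":
        # Order matters but distances do not; median_low returns an actual category code.
        return statistics.median_low(values)
    if scale in ("interval", "ratio"):
        # Equal distances between values make the arithmetic mean meaningful.
        return statistics.mean(values)
    raise ValueError(f"unknown measurement scale: {scale}")


marital_codes = [1, 5, 5, 2, 5]     # nominal: code numbers are only labels
age_group_codes = [1, 2, 2, 3, 5]   # ordinal: age bands are ordered
ages = [27, 31, 44, 22]             # ratio: actual age in years

print(summarise(marital_codes, "nominal"))    # 5
print(summarise(age_group_codes, "ordinal"))  # 2
print(summarise(ages, "ratio"))               # 31
```

Averaging the marital status codes would produce a number with no meaning. The education codes in the code book (Secondary school = 5) also do not follow the order of education levels, so they would have to be treated as nominal unless recoded.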
• The process of converting information into numerical values is called coding.
• Coding of data involves four steps:
  1. Developing a code book
  2. Pre-testing the code book
  3. Coding the data
  4. Verifying the coded data
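One common way to verify coded data, not necessarily the one intended here, is to have two people code the same questionnaires independently and compare the results. A minimal sketch follows, assuming each coding is a dictionary of respondent id to question codes. The ids and codes are hypothetical.

```python
def compare_codings(first, second):
    """List (respondent, question, first code, second code) wherever two codings differ."""
    mismatches = []
    for respondent_id, codes in first.items():
        other = second.get(respondent_id, {})
        for question, code in codes.items():
            if other.get(question) != code:
                mismatches.append((respondent_id, question, code, other.get(question)))
    return mismatches


# Hypothetical codings of the same two questionnaires by two coders
coder_a = {101: {"1a": 2, "1b": 1, "2": 3}, 102: {"1a": 4, "1b": 5, "2": 2}}
coder_b = {101: {"1a": 2, "1b": 1, "2": 3}, 102: {"1a": 4, "1b": 3, "2": 2}}

print(compare_codings(coder_a, coder_b))  # [(102, '1b', 5, 3)]
```

Each mismatch is then resolved by going back to the raw questionnaire. The sketch only checks questions that the first coder coded.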
Developing a Code book
• A code book provides a set of rules for assigning numerical values to answers obtained from respondents.
• To develop a code book that prepares data for computer analysis, it is important to know a little about how computers work and about the programmes being used.
Examples of questions
1. Please indicate:
   A) Your current age ----------------
   B) Your marital status:
      Currently married ---------
      Living in a de facto relationship ---------
      Separated ---------
      Divorced ---------
      Never married ---------
2. Specify your level of education and area of study:
   a. Level of education
      Diploma
      Bachelors degree
      Masters degree
      PhD
      Secondary school
   b. Area of study (e.g. accounting) ----------------
CODE BOOK
1(a) Age
   Age group                            Code
   20-24                                1
   25-29                                2
   30-34                                3
   35-39                                4
   40-44                                5
   45-49                                6
   No response                          9
1(b) Marital status
   Response                             Code
   Currently married                    1
   Living in a de facto relationship    2
   Separated                            3
   Divorced                             4
   Never married                        5
   No response                          9
2 Education level
   Response                             Code
   Diploma                              1
   Bachelors degree                     2
   Masters                              3
   PhD                                  4
   Secondary school                     5
   No response                          9
2 Area of study
   Response                             Code
   Accounting                           1
   Business studies                     2
   Commerce                             3
   Economics                            4
   History                              5
   No response                          15
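For computer analysis, the code book can be written as lookup tables that turn each raw answer into its code. The sketch below is a minimal illustration covering questions 1(a), 1(b) and 2 (education level). It assumes the last age band is 45-49, uses the questionnaire's wording as keys and omits area of study for brevity. The dictionary names and the `code_respondent` helper are hypothetical. An answer the code book does not cover raises an error rather than being silently coded.

```python
NO_RESPONSE = 9

# Age bands: (lowest age, highest age, code); the 45-49 band is an assumption.
AGE_BANDS = [(20, 24, 1), (25, 29, 2), (30, 34, 3), (35, 39, 4), (40, 44, 5), (45, 49, 6)]

MARITAL_STATUS = {
    "Currently married": 1,
    "Living in a de facto relationship": 2,
    "Separated": 3,
    "Divorced": 4,
    "Never married": 5,
}

EDUCATION = {
    "Diploma": 1,
    "Bachelors degree": 2,
    "Masters degree": 3,
    "PhD": 4,
    "Secondary school": 5,
}


def code_age(age):
    """Return the code for an age in years, or NO_RESPONSE if it was left blank."""
    if age is None:
        return NO_RESPONSE
    for low, high, code in AGE_BANDS:
        if low <= age <= high:
            return code
    raise ValueError(f"age {age} is not covered by the code book")


def code_category(answer, table):
    """Look up a categorical answer; a KeyError means the code book has no code for it."""
    if answer is None:
        return NO_RESPONSE
    return table[answer]


def code_respondent(raw):
    """Apply the code book to one respondent's raw answers."""
    return {
        "1a": code_age(raw.get("age")),
        "1b": code_category(raw.get("marital_status"), MARITAL_STATUS),
        "2": code_category(raw.get("education"), EDUCATION),
    }


print(code_respondent({"age": 31, "marital_status": "Never married", "education": "PhD"}))
# {'1a': 3, '1b': 5, '2': 4}
print(code_respondent({"age": 47, "marital_status": None, "education": "Diploma"}))
# {'1a': 6, '1b': 9, '2': 1}
```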
Pre-testing the code book
• Once a code book is designed, it is important to pre-test it for any problems before you code your data.
• A pre-test involves selecting a few questionnaires/interview schedules and actually coding the responses to ascertain any problems in coding.
• It is possible that you may not have provided for some responses and therefore will be unable to code them.
• Change your code book, if you need to, in the light of the pre-test.
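A pre-test can be partly automated by running a few questionnaires' answers through the code book and listing every answer it cannot code. In this minimal sketch the pre-test sample is hypothetical. It contains "Widowed", a plausible answer that the marital status codes above do not provide for.

```python
MARITAL_STATUS = {
    "Currently married": 1,
    "Living in a de facto relationship": 2,
    "Separated": 3,
    "Divorced": 4,
    "Never married": 5,
}


def pretest(answers, code_book):
    """Return the distinct answers in the pre-test sample that have no code (blanks ignored)."""
    return sorted({a for a in answers if a is not None and a not in code_book})


# Hypothetical answers from a handful of pre-test questionnaires
sample = ["Currently married", "Never married", "Widowed", None, "Separated", "Widowed"]

print(pretest(sample, MARITAL_STATUS))  # ['Widowed']
```

Any answer reported here would call for a new code (and, in this case, perhaps a new response option) before the full data set is coded.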
Coding the data
• Once your code book is finalised, the next step is to code the raw data.
• There are two ways of doing this:
  • Coding on the questionnaire/interview schedule itself, if space for coding was provided at the time of constructing the research instrument
  • Coding on separate code sheets that are available for
Developing a frame of analysis
• A frame of analysis should specify:
  • Which variables you are planning to analyse
  • How they should be analysed
  • Which cross-tabulations you need to work out
  • Which variables you need to combine to construct your major concepts or to develop indices
  • Which variables are to be subjected to which statistical procedures
Frequency distribution
• A frequency distribution groups respondents into the sub-categories into which a variable has been divided.
• Frequency distributions can be produced for variables such as:
  • Age
  • Marital status
  • Education, etc.
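A frequency distribution of a coded variable can be produced with `collections.Counter`. This minimal sketch uses the marital status codes from the code book and a hypothetical set of ten coded responses. It reports each sub-category's count and percentage.

```python
from collections import Counter

MARITAL_LABELS = {
    1: "Currently married",
    2: "Living in a de facto relationship",
    3: "Separated",
    4: "Divorced",
    5: "Never married",
    9: "No response",
}


def frequency_table(codes, labels):
    """Return (label, count, percentage) for each code that occurs, in code order."""
    counts = Counter(codes)
    total = len(codes)
    return [
        (labels.get(code, str(code)), counts[code], round(100 * counts[code] / total, 1))
        for code in sorted(counts)
    ]


coded = [1, 5, 5, 1, 2, 5, 9, 1, 5, 4]  # hypothetical coded responses
for label, n, pct in frequency_table(coded, MARITAL_LABELS):
    print(f"{label:<35}{n:>3}{pct:>7.1f}%")
```

Categories with no respondents (here, "Separated") do not appear. If the table should show every sub-category, iterate over the code book's codes instead of `sorted(counts)`.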
Cross Tabulation
• Cross-tabulation analyses two variables, usually an independent and a dependent variable, or an attribute and a dependent variable, to determine whether there is a relationship between them.
• The sub-categories of both variables are cross-tabulated to ascertain whether a relationship exists between them.
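Cross-tabulating two coded variables means counting respondents in every combination of their sub-categories. In this minimal sketch age group is treated as the independent variable and marital status as the dependent variable. The pairing and the ten coded respondents are hypothetical, for illustration only.

```python
from collections import Counter


def cross_tab(row_values, col_values):
    """Count respondents in every combination of sub-categories of two variables."""
    if len(row_values) != len(col_values):
        raise ValueError("both variables must have one value per respondent")
    counts = Counter(zip(row_values, col_values))
    row_cats = sorted(set(row_values))
    col_cats = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_cats] for r in row_cats]
    return row_cats, col_cats, table


# Hypothetical coded data for ten respondents
age_group = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]  # independent variable (age band codes)
marital = [5, 1, 5, 1, 5, 1, 1, 1, 5, 5]    # dependent variable (1 = married, 5 = never married)

rows, cols, table = cross_tab(age_group, marital)
print("     " + "".join(f"{c:>5}" for c in cols))
for r, line in zip(rows, table):
    print(f"{r:>5}" + "".join(f"{n:>5}" for n in line))
```

In this made-up sample, never-married respondents are concentrated in the youngest age band. A cross-tabulation of this kind is the starting point for testing whether such a relationship is more than chance.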
Analysing data
• Coded data can be analysed manually or with the help of a computer.
• However, manual analysis is only useful for calculating frequencies and for simple cross-tabulations.
• Nowadays, data can be coded and analysed using statistical packages on a computer.


Computer packages
• Excel
• SPSS
• Microfit
• LIMDEP
• GAMS
• RATS
• EViews, etc.
The role of statistics
• Statistics have a role only when you have collected the required information, adhering to the requirements of each operational step of the research process.
