
PhD In Nursing

RESEARCH METHODS

Module No: 7 DATA PREPARATION

Subtopic:

7.1 Data Preparation, Editing, Coding

Faculty Name:
Date:
Subject Code:
School of Nursing
Learning Objectives
At the end of this module, students will be able to:
1. Explain the meaning of data preparation
2. Explain the concept, types, and purpose of editing
3. List the issues in coding and after coding
4. Understand data classification
5. Describe tabulation and its usefulness

List of Contents
 Meaning of Data preparation
 Editing
 Types of editing
 Coding
 Issues in coding and after coding
 Data classification
 Types of classification
 Tabulation

Introduction

After collecting data, the researcher converts the raw data into
meaningful statements; this includes data processing, data analysis,
and data interpretation and presentation.

Data reduction or processing mainly involves the various
manipulations necessary for preparing the data for analysis. The
process of manipulation may be manual or electronic. It involves
editing, categorizing open-ended questions, coding,
computerization, and the preparation of tables and diagrams.
Data Preparation: Introduction

 Once the data begin to flow, a researcher’s attention turns to
data analysis.

 Data preparation includes editing, coding, and data entry.

 It is the activity that ensures the accuracy of the data and
their conversion from raw form to reduced and classified
forms that are more appropriate for analysis.
Data Preparation: Introduction

 Preparing a descriptive statistical summary is another
preliminary step leading to an understanding of the
collected data.

 It is during this step that data entry errors may be
revealed and corrected.
Preparation of data for analysis

 Data preparation is a very important step in the research process,
as all the information collected in any research must undergo
certain processes to ensure the data are ready for analysis.
 Unprepared data will lead to spurious results.
 Data preparation involves checking or logging the data in;
checking the data for accuracy; entering the data into the
computer; transforming the data; and developing and
documenting a database structure that integrates the various
measures.
Preparation of data - basics

 Data is the collection of different pieces of information or
facts. These pieces of information are called variables in
research.
 A variable is an identifiable piece of data containing one or
more values.
 Those values can take the form of a number or text (which
could be converted into numbers).
Data preparation - Purposes

 The purpose is to guarantee that data are:
 Accurate;
 Consistent with the intent of the question and other
information in the survey;
 Uniformly entered;
 Complete; and
 Arranged to simplify coding and tabulation.
Data Preparation Process
1. Prepare a preliminary plan of data analysis
2. Check the questionnaire
3. Edit
4. Code
5. Transcribe
6. Clean the data
7. Statistically adjust the data
8. Select a data analysis strategy

(Preliminary plan of the stages of data analysis)
Questionnaire Checking
A questionnaire returned from the field may be unacceptable for
several reasons.
 Parts of the questionnaire may be incomplete.
 The pattern of responses may indicate that the respondent did
not understand or follow the instructions.
 The responses show little variance.
 One or more pages are missing.
 The questionnaire is received after the preestablished cutoff
date.
 The questionnaire is answered by someone who does not
qualify for participation.
Treatment of Unsatisfactory Responses in questionnaire

Treatment of unsatisfactory responses:
 Return to the field
 Assign missing values
 Substitute a neutral value
 Casewise deletion
 Pairwise deletion
 Discard unsatisfactory respondents
Data Processing Operations: Data Editing

 Data Editing: The first step in analysis is to edit the raw data.
Editing detects errors and omissions, corrects them when
possible, and certifies that maximum data quality standards are
achieved.

 The editor’s purpose is to guarantee that data are: accurate;
consistent with the intent of the question and other
information in the survey; uniformly entered; complete; and
arranged to simplify coding and tabulation.
Data Processing Operations: Data Editing-contd

 Types of Editing:
The two types of editing are:
FIELD EDITING
CENTRAL EDITING

Field Editing: In large projects, the field editing review is the
responsibility of the field supervisor.
Central Editing: For a small study, the use of a single editor
produces maximum consistency. In large studies, editing tasks
should be allocated so that each editor deals with one entire
section.
Data Processing Operations: Data Editing-contd

 Data editing is “the inspection and correction of the
data received from each element of the sample.”

 It is the process of checking and adjusting responses in the
completed questionnaires for omissions, legibility, and
consistency, and readying them for coding and storage.
Data Preparation: Editing

 The customary first step in analysis is to edit the raw data.

 Editing detects errors and omissions, corrects them when


possible, and certifies that maximum data quality standards
are achieved.
Data Preparation: Editing-contd
Types of Editing
1. Field Editing
Preliminary editing by a field supervisor on the same day as the
interview to catch technical omissions, check legibility of
handwriting, and clarify responses that are logically or conceptually
inconsistent.
 When entry gaps are present from interviews, a callback should be
made rather than guessing what the respondent “probably would
have said”.
 Self-interviewing has no place in quality research.
 Validating the field research is the control function of the
supervisor.
 It means he or she will re-interview some percentage of the
respondents to make sure they have participated.
 Many research firms will re-contact about 10 per cent of the
respondents in this process of data validation.
Data Preparation: Editing-contd

Types of Editing-contd
2. Central Editing
For a small study, the use of a single editor produces
maximum consistency. In large studies, editing tasks should be
allocated so that each editor deals with one entire section.

 When replies are inappropriate or missing, the editor can


sometimes detect the proper answer by reviewing the
other information in the data set.
 It may be better to contact the respondent for correct
information, if time and budget allow.
 Another alternative is for the editor to strike out the
answer if it is inappropriate. Here an editing entry of
“no answer” is called for.
Data Preparation: Editing-contd
Types of Editing-contd
2. Central Editing-contd
 Another problem that editing can detect concerns faking
an interview that never took place.
 This “armchair interviewing” is difficult to spot, but the
editor is in the best position to do so.
 One approach is to check responses to open-ended
questions. These are most difficult to fake. Distinctive
response patterns in other questions will often emerge if
data falsification is occurring.

To uncover this, the editor must analyze the set of


instruments used by each interviewer.
Data Preparation: Editing-contd

Purpose of Editing
1. For consistency between and among responses
2. For completeness in responses– to reduce effects of
item non-response
3. To better utilize questions answered out of order
4. To facilitate the coding process
Data Preparation: Editing-contd

Editing for Completeness


 Item Nonresponse
 The technical term for an unanswered question on an
otherwise complete questionnaire resulting in missing
data.
 Plug Value
 An answer that an editor “plugs in” to replace blanks or
missing values so as to permit data analysis; choice of
value is based on a predetermined decision rule.
 Impute
 To fill in a missing data point through the use of a
statistical algorithm that provides a best guess for the
missing response based on available information.
Data Preparation: Editing-contd
 Pitfalls of Editing
 Allowing subjectivity to enter into the editing process.
 Data editors should be intelligent, experienced, and
objective.
 Failing to have a systematic procedure for assessing the
questionnaires developed by the research analyst
 An editor should have clearly defined decision rules to
follow.
 Pretesting Edit
 Editing during the pretest stage can prove very valuable for
improving questionnaire format, identifying poor instructions
or inappropriate question wording.
Data Preparation: coding

CODING

 The process of identifying and classifying each answer with a


numerical score or other character symbol

 The numerical score or symbol is called a code, and serves


as a rule for interpreting, classifying, and recording data

 Identifying responses with codes is necessary if data is to


be processed by computer
Data Preparation: Coding-contd

Data Coding:
Data Coding means assigning a code to each possible
response of each question. Usually a code is a number or
other symbol, so that the responses can be grouped into a
limited number of categories. In coding, categories are the
partitions of a data set for a given variable. Both closed and
free-response questions must be coded.
Data Preparation: Coding-contd

Facilitating the Coding Process

 Data Clean-up
 Checking written responses for any stray marks

 Editing And Tabulating “Don’t Know” Answers


 Legitimate don’t know (no opinion)
 Reluctant don’t know (refusal to answer)
 Confused don’t know (does not understand)
Coding - Continued

Coded data is often stored electronically in the form of a data


matrix - a rectangular arrangement of the data into rows
(representing cases) and columns (representing variables)
 
The data matrix is organized into fields, records, and files:
 Field: A collection of characters that represents a single type
of data
 Record: A collection of related fields, i.e., fields related to the
same case (or respondent)
 File: A collection of related records, i.e. records related to the
same sample
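The field/record/file hierarchy can be sketched in plain Python (the variable names and values here are hypothetical, purely for illustration): each record is one respondent, each field is one variable, and the file collects all records from the same sample.

```python
# One record: a collection of related fields for a single respondent.
record_1 = {"id": "001", "gender": 1, "age": 34}
record_2 = {"id": "002", "gender": 0, "age": 29}

# The file: all records from the same sample.
data_file = [record_1, record_2]

# Rows represent cases and columns represent variables, so pulling one
# field across all records yields one column of the data matrix.
ages = [rec["age"] for rec in data_file]
```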
Coding-contd
Guidelines for coding unstructured questions:
 Category codes should be mutually exclusive and collectively
exhaustive.
 Only a few (10% or less) of the responses should fall into the
“other” category.
 Category codes should be assigned for critical issues even if no
one has mentioned them.
 Data should be coded to retain as much detail as possible.
Coding of Data: Rules

 Coding Conventions
 Assign an ID number to each case.

 Use numeric, not alphabetic, codes for response
categories in general.

 Develop procedures for systematically verifying coding
and data entry.
Codebook

A codebook contains coding instructions and the necessary


information about variables in the data set. A codebook
generally contains the following information:
 column number
 record number
 variable number
 variable name
 question number
 instructions for coding
A Codebook Excerpt

Column   Variable  Variable Name              Question  Coding Instructions
Number   Number                               Number
1-3      1         Respondent ID                        001 to 890; add leading zeros as necessary
4        2         Record Number                        1 (same for all respondents)
5-6      3         Project Code                         31 (same for all respondents)
7-8      4         Interview Code                       As coded on the questionnaire
9-14     5         Date Code                            As coded on the questionnaire
15-20    6         Time Code                            As coded on the questionnaire
21-22    7         Validation Code                      As coded on the questionnaire
23-24    -         Blank                                Leave these columns blank
25       8         Who shops                  I         Male head = 1; Female head = 2; Other = 3; punch the number circled
26       9         Familiarity with store 1   IIa       For question II, parts a through j: punch the number circled; Not so familiar = 1 ... Very familiar = 6; Missing values = 9
27       10        Familiarity with store 2   IIb       (as above)
28       11        Familiarity with store 3   IIc       (as above)
...
35       18        Familiarity with store 10  IIj       (as above)
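A codebook entry like those above can be applied mechanically. The sketch below is hypothetical (the dictionary and function names are ours, not part of any codebook standard): it codes the "Who shops" question, falling back to a missing-value code for blank or unrecognized answers.

```python
# Hypothetical sketch: one codebook entry stored as a dict and used to
# translate raw questionnaire answers into numeric codes.
who_shops_codes = {"Male head": 1, "Female head": 2, "Other": 3}

def code_response(raw_answer, codes, missing_code=9):
    """Return the numeric code for an answer; use the missing-value
    code when the answer is blank or not found in the codebook."""
    return codes.get(raw_answer, missing_code)

coded = code_response("Female head", who_shops_codes)   # 2
missing = code_response("", who_shops_codes)            # 9 (missing)
```

Keeping the coding instructions in one place, as a codebook does, means every coder (human or program) applies the same rule.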
Key Issues in Coding

1. Pre-Coding Fixed-Alternative Questions (FAQs)


-Writing codes for FAQs on the questionnaire before the
data collection
2. Coding Open-Ended Questions - A 3-stage process:
(a) Perform a test tabulation, (b) Devise a coding scheme,
(c) Code all responses
Two Rules For Code Construction are:
a) Coding categories should be exhaustive
b) Coding categories should be mutually exclusive and
independent
Issues in Coding - Continued

3. Maintaining a Code Book - A book that identifies each


variable in a study, the variable’s description, code name,
and position in the data matrix
4. Production Coding - The physical activity of transferring
the data from the questionnaire or data collection form [to
the computer] after the data has been collected. Sometimes
done through a coding sheet – ruled paper drawn to mimic
the data matrix
5. Combining Editing and Coding
Coding Variables
Capture data in its most continuous form possible.

Age: 35 years (get the actual value)
vs.
Check one: __ <25
           __ 25-35
           __ 36-45
           __ >45

Dichotomous Variables

Do not do this:
1 = male
2 = female

Do this!
1 = male
0 = female

Why? With 1/0 coding you can use the add function: the sum of
the variable directly counts the cases coded 1.
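A minimal sketch of why 1/0 coding is convenient, using a hypothetical sample of five respondents: the sum of the variable is the count of males, and the mean is the proportion of males.

```python
# 1 = male, 0 = female (hypothetical data)
gender = [1, 0, 0, 1, 0]

n_males = sum(gender)                    # count of males
prop_males = sum(gender) / len(gender)   # proportion of males
```

With 1/2 coding, neither the sum nor the mean has any such direct interpretation.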

Dummy Coding

Ethnicity
1 = Black; 2 = White; 3 = Hispanic

N-1 or 3-1 = 2 variables


Black: 1 = Black; 0 = White and Hispanic
White: 1 = White; 0 = Black and Hispanic
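The dummy-coding scheme above can be sketched directly (the data are hypothetical): a three-category variable becomes N-1 = 2 indicator variables, with Hispanic as the reference category, identified by 0 on both dummies.

```python
ethnicity = ["Black", "White", "Hispanic", "White"]  # hypothetical cases

# Black dummy: 1 = Black; 0 = White and Hispanic
black = [1 if e == "Black" else 0 for e in ethnicity]
# White dummy: 1 = White; 0 = Black and Hispanic
white = [1 if e == "White" else 0 for e in ethnicity]
```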

Missing Data

 SPSS assigns a dot “.” to missing data.

 SPSS often gives you a choice of pairwise or listwise
deletion for missing values.

 Mean substitution: give the variable the average score for
the group (e.g., mean age); this adds no variation to the
data set.

Missing Data

Pairwise: only the particular correlation involving the missing
value is dropped, and the case is retained for other analyses;
the best choice to conserve power.

Listwise: removes entire cases (rows) that have any missing
value; required in repeated-measures designs.

Missing Data:

 In survey studies, missing data typically occur when
participants accidentally skip, refuse to answer, or do not
know the answer to an item on the questionnaire.
 In longitudinal studies, missing data may result from
participants dropping out of the study, or being absent for one
or more data collection periods. Missing data also occur due to
researcher error, corrupted data files, and changes in the
research or instrument design after data were collected from
some participants, such as when variables are dropped or
added.

Treatment of Missing Data:

The strategy for handling missing data consists of a two-step
process: the researcher first explores the pattern of missing data
to determine the mechanism for missingness (the probability
that a value is missing rather than observed), and then selects a
missing-data technique. The three basic techniques that can be
used to salvage data sets with missing values are:
 Listwise deletion
 Pairwise deletion
 Replacement of missing values with estimated scores
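Two of these techniques can be sketched on a small hypothetical data set, with `None` standing in for a missing value (the numbers are invented for illustration):

```python
ages = [25, None, 40, 35, None, 30]  # hypothetical, None = missing

# Listwise deletion: drop every case with a missing value.
listwise = [a for a in ages if a is not None]

# Mean substitution (one simple form of replacement with an
# estimated score): fill missing values with the observed mean.
# Note this adds no variation to the data set.
mean_age = sum(listwise) / len(listwise)
imputed = [a if a is not None else mean_age for a in ages]
```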

AFTER CODING …..

1. Data Entry - The transfer of codes from questionnaires (or


coding sheets) to a computer. Often accomplished in one of
three ways:
a) On-line direct data entry – e.g. as for CATI systems
b) Optical scanning – for highly structured questionnaires
c) Keyboarding – data entry via a computer keyboard; often
requires verification
After Coding - Continued

2. Error Checking

Verifying the accuracy of data entry and checking for some
kinds of obvious errors made during data entry.
Often accomplished through frequency analysis.
After Coding - Continued

3. Data Transformation – Converting some of the data from


the format in which they were entered to a format most
suitable for particular statistical analysis.
Often accomplished through re-coding, to:

 reverse-score negative (or positive) statements into


positive (or negative) statements;
 collapse the number of categories of a variable
Importance of Coding:

 The categorization of data sacrifices some data detail but is
necessary for efficient analysis. Most software programs work
more efficiently in the numeric mode.
 Instead of entering the word male or female in response to a
question that asks for the identification of one’s gender, we
would use numeric codes, e.g., 0 for male and 1 for female.
 Numeric coding simplifies the researcher’s task in converting a
nominal variable, like gender, to a “dummy variable”.
Data Transcription

Raw data are transcribed through one of several methods:
CATI/CAPI, keypunching via a CRT terminal, mark-sense forms,
optical scanning, or computerized sensory analysis.
Keypunched data are verified to correct keypunching errors, and
the transcribed data are stored in computer memory, on magnetic
disks, or on tapes.
Data classification/distribution
 Sarantakos (1998: 343) defines distribution of data as a
form of classification of the scores obtained for the various
categories of a particular variable. There are four types of
distributions:

1. Frequency distribution
2. Percentage distribution
3. Cumulative distribution
4. Statistical distributions

 Data Classification:
 Data classification is the categorization of raw data into
homogeneous groups having common characteristics, for its
most effective and efficient use. Classification can be of two
types:
 Classification according to attributes
 Classification according to class intervals

Data classification/distribution
 Frequency distribution:
In social science research, frequency distribution is very common. It
presents the frequency of occurrence of certain categories. This
distribution appears in two forms:
Ungrouped: Here, the scores are not collapsed into categories. For
example, in a distribution of the ages of the students of a BJ (MC)
class, each age value (e.g., 18, 19, 20, and so on) will be presented
separately in the distribution.
Grouped: Here, the scores are collapsed into categories, so that two or
three scores are presented together as a group. For example, in the
above age distribution, groups like 18-20, 21-22, etc., can be formed.
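Both forms can be sketched with the standard library (the ages and the group boundaries below are hypothetical, echoing the class example above):

```python
from collections import Counter

ages = [18, 19, 18, 20, 21, 22, 19, 18]  # hypothetical student ages

# Ungrouped frequency distribution: each age value counted separately.
ungrouped = Counter(ages)

# Grouped frequency distribution: collapse scores into categories
# such as 18-20 and 21-22.
def age_group(age):
    return "18-20" if age <= 20 else "21-22"

grouped = Counter(age_group(a) for a in ages)
```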

Data classification/distribution
 Percentage distribution:

It is also possible to give frequencies not in absolute numbers
but in percentages. For instance, instead of saying that 200
respondents out of a total of 2,000 had a monthly income of
less than Rs. 500, we can say that 10% of the respondents have
a monthly income of less than Rs. 500.
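The conversion is a one-line computation, shown here with the slide's own numbers:

```python
# 200 of 2,000 respondents had a monthly income of less than Rs. 500.
frequency = 200
total = 2000
percentage = 100 * frequency / total  # per cent of respondents
```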

Data classification/distribution

 Cumulative distribution:

It tells how often the value of the random variable is less than
or equal to a particular reference value.
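A minimal sketch of a cumulative distribution, using the hypothetical grouped frequencies from the age example: each cumulative entry counts how many cases fall at or below the corresponding value.

```python
values = [18, 19, 20, 21]   # hypothetical distinct age values
freqs = [3, 2, 1, 2]        # frequency of each value

cumulative = []
running = 0
for f in freqs:
    running += f
    cumulative.append(running)
# cumulative[i] = number of cases with value <= values[i]
```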

Data classification/distribution
 Statistical data distribution:

In this type of data distribution, some measure of average is
found for a sample of respondents. Several kinds of averages
are available (mean, median, mode), and the researcher must
decide which is most suitable to the purpose. Once the average
has been calculated, the question arises: how representative a
figure is it, i.e., how closely are the answers bunched around it?
Are most of them very close to it, or is there a wide range of
variation?

Tabulation of data

After editing, which ensures that the information on the
schedule is accurate and categorized in a suitable form, the
data are put together in some kind of table and may also
undergo some other forms of statistical analysis.

Tables can be prepared manually and/or by computer. For a
small study of 100 to 200 persons, there may be little point in
tabulating by computer, since this necessitates putting the data
on punched cards. But for a survey analysis involving a large
number of respondents and requiring cross-tabulation of more
than two variables, hand tabulation will be inappropriate and
time-consuming.

Tabulation of Data-contd

 Tabulation of Data The process of placing classified data into


tabular form is known as tabulation. A table is a symmetric
arrangement of statistical data in rows and columns. Rows are
horizontal arrangements whereas columns are vertical
arrangements. is the process of summarizing raw data and
displaying the same in compact form (i.e., in the form of
statistical table) for further analysis 12

Tabulation of data-contd
 Usefulness of tables:

Tables are useful to the researchers and the readers in three


ways:

1. They present an overall view of findings in a simpler way.


2. They identify trends.
3. They display relationships in a comparable way between
parts of the findings.
By convention, the dependent variable is presented in the rows
and the independent variable in the columns.
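A hand cross-tabulation following that convention can be sketched with the standard library. The data here are hypothetical (gender as the independent variable in the columns, an exercise-habit question as the dependent variable in the rows):

```python
from collections import Counter

# Hypothetical cases: (gender, exercises regularly?)
cases = [("male", "yes"), ("male", "no"), ("female", "no"),
         ("female", "no"), ("male", "yes")]

# Count each (row, column) cell: dependent variable first (row),
# independent variable second (column).
cells = Counter((exercises, gender) for gender, exercises in cases)

yes_male = cells[("yes", "male")]
no_female = cells[("no", "female")]
```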

Importance of Tabulation:

 When mass data have been assembled, it becomes necessary
for the researcher to arrange them in some kind of concise,
logical order. Tabulation is essential because:
 It conserves space and reduces explanatory and descriptive
statements to a minimum;
 It facilitates the process of comparison;
 It facilitates the summation of items and the detection of
errors and omissions;
 It provides a basis for various statistical computations.

Data Reduction
 Summarization:
 Condensing the raw data into a few meaningful computations.
 Conceptualization:
 Visualization of what these measures represent.
 Communication:
 Translation of statistical analysis results into a form that is
understandable and, more important, useful to the marketing
manager.
 Interpolation:
 Assessment of how well the data generalize to the population.
CLEANING THE DATA
 The goal of the data cleaning process is to preserve meaningful
data while removing elements that may impede our ability to run
the analyses.
 It is a two-step process: detection and then correction of errors
in a data set.
 Detection involves:
 Looking at minimum and maximum values (the range) in
descriptive statistics;
 Looking for 0s and 999s (or 9999, etc.; such codes normally use
the maximum permissible values) using descriptives or
graphs/histograms;
 Checking the plausibility of a value in terms of its range (from
descriptives) or a z-score beyond about -4.00;
 Looking at means, medians, and standard deviations.
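The detection checks above can be sketched with the standard library. The data, the 999 missing-value code, and the plausible age range below are all hypothetical assumptions for illustration:

```python
from statistics import mean, stdev

ages = [25, 30, 28, 999, 27, 26]  # hypothetical; 999 = missing code

# Range check: flag sentinel codes and values outside a plausible
# range (0-120 years is an assumed limit, not a universal rule).
suspect = [a for a in ages if a == 999 or not (0 < a < 120)]

# z-scores on the remaining values; a very large absolute z-score
# flags a value worth inspecting.
clean = [a for a in ages if a not in suspect]
m, s = mean(clean), stdev(clean)
z_scores = [(a - m) / s for a in clean]
```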
CLEANING THE DATA

 CORRECTION of Data Errors and Coping with Errors


 After identification of errors, missing values, and true (extreme or
normal) values, the researcher must decide what to do with
problematic observations. The options are limited to correcting,
deleting, or leaving unchanged.
 When there are a minimal number of errors, the values are
generally recoded to "missing".
 Impossible values are never left unchanged, but should be corrected
if a correct value can be found, otherwise they should be deleted.
Common errors during data preparation

 The errors that can occur during data preparation are usually
linked with the procedures adopted for instrument design,
coding procedures, and the data collection and data entry
methods. For example, the kinds of errors that occur when free
response data are manually coded and transcribed into
computer readable form differ from the kinds of errors that are
likely to occur when data are entered directly into computers
from machine-readable answer sheets
Data entry and how to avoid errors in data preparation
will be discussed in the next module.

Summary
 In this module we have discussed data preparation; editing
and its types; coding, the data matrix, and the procedures that
follow coding.
The types of data classification were also discussed in detail. In the
next module we will learn about data entry, validity, and
reliability.

References
 Carol Leslie Macnee (2008), Understanding Nursing Research:
Using Research in Evidence-Based Practice, Lippincott Williams
& Wilkins, ISBN 0781775582, 9780781775588
 Denise Polit, et al. (2013), Nursing Research: Principles and
Methods, revised edition, Philadelphia: Lippincott
 http://www.vbtutor.net/research/research_chp7.htm#sthash.ZtzD
oA7r.dpuf
 http://adamowen.hubpages.com/hub/Understanding-The-
Different-Types-of-Research-Data
 http://www15.uta.fi/FAST/FIN/RESEARCH/sources.html
 http://www.medicotips.com/2012/01/datatypes-of-data-and-
sources-of-data.html
References
 Open Source Links
www.mbaofficial.com/mba.../explain-data-presentation-
and-processing/
 Audio-Videos

http://www.powershow.com/view1/10cbcb-
ZDc1Z/Data_Preparation_and_powerpoint_ppt_presentatio
n

Thanks

Next Topic>>
Data entry, Validity
of data

