Survey Quality
Chapter 7
Data Processing: Errors
and Their Control
Topics
1. Overview of Data Processing Steps
2. Nature of Data Processing Error
3. Data Capture Errors
4. Post–Data Capture Editing
5. Coding
6. File Preparation
7. Applications of Continuous Quality Improvement:
The Case of Coding
8. Integration Activities
Table 2. Five Major Sources of Nonsampling Error and Their Potential Causes
Data processing is a set of activities aimed at
converting the survey data from their raw state
as the output of data collection to a
cleaned and corrected state that can be used
in analysis, presentation, and dissemination.
During data processing, the data may be changed
by a number of operations intended
to improve their accuracy.
The data may be checked, compared, corrected, keyed or
scanned, coded, tabulated, and so on, until the survey
manager is satisfied that the results are “fit for use.”
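The checking and correcting steps listed above can be sketched as automated edit rules applied to each record. This is a minimal illustration, not a procedure from the chapter; the field names ("age", "employed") and the rules themselves are hypothetical.

```python
# Minimal sketch of automated record checking with two common edit types:
# a range check and a consistency check. Field names are illustrative.

def check_record(record):
    """Return a list of edit-failure messages for one survey record."""
    failures = []
    # Range check: reported age must be plausible.
    if not (0 <= record.get("age", -1) <= 120):
        failures.append("age out of range")
    # Consistency check: children under 15 should not be coded as employed.
    if record.get("age", 0) < 15 and record.get("employed") == "yes":
        failures.append("age/employment inconsistency")
    return failures

records = [
    {"age": 34, "employed": "yes"},   # passes both checks
    {"age": 7,  "employed": "yes"},   # fails the consistency check
    {"age": 150, "employed": "no"},   # fails the range check
]
flagged = [r for r in records if check_record(r)]
```

In practice such rules number in the hundreds and are run repeatedly until the flagged cases are resolved or accepted.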
The sequence of data processing steps ranges
from the simple (e.g., data keying) to the
complex, involving:
➢ editing,
➢ imputation,
➢ weighting,
➢ and so on.
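Two of the complex steps named above, imputation and weighting, can be sketched in miniature. This is a simplified illustration under assumed values (mean imputation for one missing income, and invented inverse-probability design weights), not the chapter's own method.

```python
# Sketch of two later processing steps: mean imputation of a missing
# value, then a weighted total. All numbers here are illustrative.

def impute_mean(values):
    """Replace missing (None) values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def weighted_total(values, weights):
    """Estimate a population total from sample values and design weights."""
    return sum(v * w for v, w in zip(values, weights))

incomes = [30000, None, 50000]     # one item nonresponse
weights = [100, 100, 200]          # hypothetical design weights

completed = impute_mean(incomes)   # missing value replaced by 40000
total = weighted_total(completed, weights)
```

Production systems use far more sophisticated imputation and weighting methods, but the pipeline shape, complete the data and then weight it up to the population, is the same.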
Data processing operations can be expensive
and time consuming.
Technology has also allowed greater integration of
data processing with other survey processes.
Some data processing steps can be accomplished during
the data collection phase, thereby reducing costs and
total production time while improving data accuracy.
Data processing operations may be quite
prone to human error when performed
manually.
By reducing reliance on manual labor, automation
reduces the types of errors in the data caused by
manual processing, but it may also introduce
other types of errors that are specific to the
technology used.
The literature on data processing error
and its control is quite small relative to that on
measurement error (especially
respondent errors and questionnaire effects)
and nonresponse.
Data processing operations have traditionally accounted for a very large
portion of the total survey budget. In some surveys, editing alone
consumes up to 40% of the entire survey budget (U.S. Federal Committee on Statistical
Methodology, 1990).
The small size of this literature is unfortunate, since some
processing steps, such as coding, can be very error-prone,
particularly the coding of complex concepts.