Chapter 1

The Problem and Its Background

1.1 Introduction
Missing data in sample surveys are an inevitable problem for survey statisticians. Missing
data occur for various reasons, such as when the respondent has moved to another
location, is temporarily unavailable, refuses to participate in the survey, or is unable to
answer specific items in the survey. This failure to obtain the desired responses from the
units selected in the sample is called nonresponse. There are several types of
nonresponse: (a) unit nonresponse refers to the failure to collect any data from a sample
unit; (b) item nonresponse refers to the failure to collect valid responses to one or more
items from a responding sample unit; and (c) partial nonresponse occurs when there is a
failure to collect responses for large sets of items from a responding unit.
In surveys with more than one round of data collection, the problem of
nonresponse becomes more complicated. In most surveys of this type, it is likely
that a unit responds in the first round of the survey but fails to respond in the
succeeding rounds. This failure to respond in the succeeding rounds is also known as
partial nonresponse.
The effect of nonresponse must not be ignored, since it leads to biased estimates, and a
large bias makes the estimates inaccurate. Bias due to nonresponse is believed to be a
function of the nonresponse rate and the difference between responding and nonresponding
units: the larger the nonresponse rate, or the wider the difference between respondents
and nonrespondents, the larger the bias.
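The relationship described above can be illustrated with a short sketch. It assumes the usual deterministic view in which every unit is either a respondent or a nonrespondent, so that the bias of the respondent mean equals the nonresponse rate times the difference between the two group means. The function name and the figures are hypothetical and are not taken from the FIES data.

```python
def nonresponse_bias(nonresponse_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean as an estimate of the mean of all units.

    Assumes each unit is either a respondent or a nonrespondent, so the overall
    mean is (1 - rate) * mean_respondents + rate * mean_nonrespondents.
    """
    return nonresponse_rate * (mean_respondents - mean_nonrespondents)


# Hypothetical figures: the same 20-unit gap between groups at three nonresponse rates.
for rate in (0.05, 0.20, 0.40):
    print(rate, nonresponse_bias(rate, 100.0, 80.0))
```

With the gap between the groups held fixed, the bias grows in proportion to the nonresponse rate; with the rate held fixed, it grows in proportion to the gap.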
In practice, there are three ways of handling missing data: researchers can discard the
cases with missing values, apply weighting adjustments, or use imputation techniques.
Weighting adjustment is based on matching nonrespondents to respondents in terms of the
data available on nonrespondents, and then increasing the weights of the matched
respondents to account for the missing values. Hence, a weight proportionate to the amount
of nonresponse is assigned to each responding unit. This is often applied to unit nonresponse.
Imputation, on the other hand, is usually used in the case of item and partial
nonresponse. In imputation, a missing value is
replaced by a reasonable substitute for the missing information. Once nonresponse has
been dealt with, whether by weighting adjustments or by imputation, researchers can
proceed with their data analysis.
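As a rough illustration of the weighting adjustment described above, the sketch below uses a simple weighting-class scheme: units are grouped by an auxiliary variable known for both respondents and nonrespondents, and respondent weights within each class are inflated so that they also represent the nonrespondents of that class. The record layout (`cls`, `weight`, `responded`) and the function name are assumptions made for this example; this is not a procedure documented for the FIES.

```python
from collections import defaultdict


def adjust_weights(units):
    """Return the responding units with weights inflated within each class.

    Each unit is a dict with hypothetical keys 'cls' (weighting class),
    'weight' (base sampling weight) and 'responded' (True or False).
    """
    total = defaultdict(float)      # sum of base weights of all sampled units per class
    responding = defaultdict(float)  # sum of base weights of respondents per class
    for u in units:
        total[u["cls"]] += u["weight"]
        if u["responded"]:
            responding[u["cls"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            # Inflation factor: inverse of the weighted response rate in the class.
            factor = total[u["cls"]] / responding[u["cls"]]
            adjusted.append({**u, "weight": u["weight"] * factor})
    return adjusted


sample = (
    [{"cls": "A", "weight": 10.0, "responded": r} for r in (True, True, True, False)]
    + [{"cls": "B", "weight": 5.0, "responded": r} for r in (True, False)]
)
for unit in adjust_weights(sample):
    print(unit["cls"], unit["weight"])
```

A class with no respondents at all would simply drop out under this scheme, so in practice such classes would need to be collapsed with a neighboring class first.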
The Family Income and Expenditure Survey (FIES) is an example of a survey with
more than one round of data collection. FIES is a nationwide survey of households
conducted every three years, with two visits per survey period, by the National Statistics
Office (NSO) in order to provide information on the country’s income distribution,
spending patterns and poverty incidence. Like any other survey, FIES encounters the
problem of missing data, particularly nonresponse during the second visit.
Given the many uses of this survey, it is important to have
precise estimates of these indicators.

With the FIES 1997 as the data set for this study, this paper focuses on
dealing with partial nonresponse through the use of imputation techniques. Specifically,
it applies the imputation techniques used in “Compensating for Missing Data” by Kalton,
a study of the 1978 Research Panel Survey for the Income Survey Development Program
(ISDP). The purpose of this paper is to examine the effects of imputed
values in producing estimates for the missing data at various nonresponse rates and
to determine which imputation technique is appropriate for the FIES data.
1.2 Statement of the Problem
This paper attempts to answer the following questions:
1. Will the imputation methods generate less biased estimates than the estimates
generated by ignoring nonresponse in this study?
2. Which imputation method(s) yield the best and
worst estimates for each nonresponse rate?
3. How do varying nonresponse rates affect the results for each imputation method?
1.3 Significance of the Study
Nonresponse is a common problem in conducting surveys. The presence of nonresponse
in surveys creates incomplete data, which could pose serious problems
during data analysis. First, the use of imputation techniques makes it possible to
compensate for the missing data by substituting reasonable values rather than deleting or
ignoring the observations. This helps reduce the nonresponse bias in the survey estimates.
Second, since most statistical packages require complete data before conducting
any procedure for data analysis, the use of imputation techniques can ensure
consistency of results across analyses, something that an incomplete data set cannot fully
provide. In addition, weighting adjustments require complex procedures to
generate estimates and are therefore more time-consuming than imputation.
Third, countries in the developed world such as the United States, Canada, the UK and
the Netherlands already employ imputation techniques in their respective national
statistical offices. In a country such as the Philippines, where data collection is very
difficult, especially in some regions like the National Capital Region (NCR), imputation
can ease the problems of data collection and nonresponse. This can even put us on par
with our counterparts in the developed world in terms of statistical research.
Lastly, in the case of FIES, whose primary objective is to provide information about the
country’s income distribution, spending patterns and poverty incidence, it is important
to ensure the precision of the survey estimates. Given the great impact of this
survey on the country, imputation techniques give statisticians a
method for handling nonresponse, which could lead to more meaningful generalizations
about our country’s income distribution and spending patterns. A data set
with less bias and more consistent results can thus help our policymakers
and economists provide better solutions for improving the lives of Filipinos.
1.4 Scope and Limitations
Throughout this paper, only the 1997 Family Income and Expenditure Survey (FIES) will
be used to tackle the problem of nonresponse and to examine the impact of the different
imputation methods applied to the data set. The paper covers only the partial
nonresponse occurring in the National Capital Region (NCR), since NCR is noted as the region
with the highest nonresponse rate. The variables to be imputed in this study
are Total Income (TOTIN2) and Total Expenditure (TOTEX2) in the second
visit of the FIES data.
The researchers will use only the first-visit data of FIES 1997 to impute
the partial nonresponse present in the second visit. This paper also assumes that
the first-visit data are complete and that the missing data are Missing Completely At
Random (MCAR). Data on Y are MCAR if the probability of missing data on Y is unrelated
to the value of Y itself and to the values of the other variables in the analysis. If the
data fail to satisfy the MCAR assumption, the imputation techniques may not work well for
this problem. More importantly, this paper will not tackle total nonresponse
and noncoverage, nor the procedures used to address these problems, such as
population weighting adjustments and raking ratio adjustments.
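To make the MCAR assumption concrete, the sketch below shows one way nonresponse at a chosen rate could be simulated so that it is MCAR by construction: the units set to missing are drawn uniformly at random, without reference to their values or to any other variable. This is an illustrative assumption only. The source does not describe its deletion procedure at this point, and the function name and the figures are hypothetical.

```python
import random


def delete_mcar(values, rate, rng):
    """Return a copy of values with round(rate * n) entries set to None.

    The entries are chosen uniformly at random, so whether a value goes
    missing does not depend on the value itself (MCAR by construction).
    """
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


# Hypothetical second-visit totals for ten households, 30% nonresponse.
second_visit = [52.0, 47.5, 61.2, 38.9, 70.1, 55.4, 49.8, 66.3, 43.0, 58.7]
print(delete_mcar(second_visit, 0.30, random.Random(1997)))
```

Passing a seeded `random.Random` instance keeps the simulated nonresponse pattern reproducible across runs.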

As for the imputation techniques, only four methods will be applied in this
paper, namely: Overall Mean Imputation (OMI), Hot Deck Imputation (HD), Deterministic
Regression Imputation (DR) and Stochastic Regression Imputation (SR). No other
imputation methods will be used in this paper.
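The four methods named above can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: `x` stands for a complete first-visit variable, `y` for the second-visit variable with `None` marking nonresponse, the hot deck draws donors at random from all respondents without imputation classes, the regression uses a single predictor, and the stochastic version adds a residual drawn at random from the respondents' residuals. All function names are hypothetical.

```python
import random
from statistics import mean


def overall_mean_impute(y):
    """OMI: replace every missing value with the mean of the observed values."""
    m = mean([v for v in y if v is not None])
    return [m if v is None else v for v in y]


def hot_deck_impute(y, rng):
    """HD: replace each missing value with the value of a randomly drawn respondent."""
    donors = [v for v in y if v is not None]
    return [rng.choice(donors) if v is None else v for v in y]


def _fit_line(x, y):
    """Least-squares intercept and slope of y on x, using complete pairs only."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = mean([a for a, _ in pairs])
    my = mean([b for _, b in pairs])
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope, pairs


def deterministic_regression_impute(x, y):
    """DR: replace each missing value with its predicted value from the fitted line."""
    b0, b1, _ = _fit_line(x, y)
    return [b0 + b1 * a if b is None else b for a, b in zip(x, y)]


def stochastic_regression_impute(x, y, rng):
    """SR: predicted value plus a residual drawn at random from the respondents."""
    b0, b1, pairs = _fit_line(x, y)
    residuals = [b - (b0 + b1 * a) for a, b in pairs]
    return [
        b0 + b1 * a + rng.choice(residuals) if b is None else b
        for a, b in zip(x, y)
    ]


# Hypothetical first-visit and second-visit totals for six households.
first_visit = [40.0, 55.0, 62.0, 48.0, 71.0, 59.0]
second_visit = [43.0, None, 66.0, 50.0, None, 61.0]
rng = random.Random(0)
print(overall_mean_impute(second_visit))
print(hot_deck_impute(second_visit, rng))
print(deterministic_regression_impute(first_visit, second_visit))
print(stochastic_regression_impute(first_visit, second_visit, rng))
```

OMI and DR give every nonrespondent with the same predictor value the same imputed value, which tends to understate the spread of the variable. HD and SR add a random component, which preserves more of the spread at the cost of some extra variance.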
The evaluation of the efficiency and appropriateness of the four imputation
methods is limited to the following: (a) the bias of the imputed data; (b) an
assessment of the distribution of the imputed versus the actual data; and (c) the criteria
set in the report Compensating for Missing Data (Kalton, 1983), namely the mean
deviation, mean absolute deviation and root mean square deviation.
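The three criteria in (c) can be computed as in the sketch below. It assumes that each deviation is the imputed value minus the actual value and that the averages are taken over the imputed cases only. The function name and the figures are hypothetical.

```python
import math


def deviation_criteria(imputed, actual):
    """Mean deviation, mean absolute deviation and root mean square deviation.

    Both lists hold values for the imputed cases only, in the same order.
    """
    devs = [i - a for i, a in zip(imputed, actual)]
    m = len(devs)
    return {
        "mean_deviation": sum(devs) / m,
        "mean_absolute_deviation": sum(abs(d) for d in devs) / m,
        "root_mean_square_deviation": math.sqrt(sum(d * d for d in devs) / m),
    }


# Hypothetical imputed and actual values for three nonrespondents.
print(deviation_criteria([10.0, 12.0, 8.0], [9.0, 14.0, 8.0]))
```

The mean deviation shows whether a method over- or under-imputes on average, since positive and negative errors cancel. The other two measure how close the individual imputed values are to the actual ones, with the root mean square deviation penalizing large errors more heavily.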
