Missing data in sample surveys is inevitable for survey statisticians. The problem
of missing data occurs for various reasons such as when the respondent moved to another
answer specific items in the survey. This failure to complete the desired responses by the
units selected in the sample is called nonresponse. There are several types of
nonresponse: (a) unit nonresponse refers to the failure to collect any data from a sample
unit; (b) item nonresponse refers to the failure to collect valid responses to one or more
items from a responding sample unit; (c) partial nonresponse occurs when there is a
failure to collect responses for large sets of items for a responding unit.
In surveys where there is more than one round of data collection, the problem of
nonresponse becomes more complicated. In most surveys of this type, it is possible
that a unit responds to the first round of the survey but fails to respond in the
succeeding rounds. This failure to respond in the succeeding rounds is also known as
partial nonresponse.
The effect of nonresponse must not be ignored since it leads to biased estimates which if
nonresponse rates and the difference between respondents and nonresponding units. The
larger the nonresponse rate or the wider the difference between the responding and
In practice, there are three ways of handling missing data. Researchers could discard the
account for the missing values. Hence, a weight proportionate to the amount of
nonresponse is assigned to a responding unit. This is often applied for unit nonresponse.
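The weighting adjustment described above can be sketched as follows. This is only a minimal illustration, assuming a single weighting class in which every respondent receives the same inflation factor; the function name and the sample weights are hypothetical and are not taken from the FIES.

```python
def nonresponse_adjusted_weights(base_weights, responded):
    """Inflate respondents' weights so that they also represent nonrespondents.

    Assumption: one weighting class, so a single factor is applied to all
    respondents. Real surveys usually compute the factor within classes.
    """
    total = sum(base_weights)
    respondent_total = sum(w for w, r in zip(base_weights, responded) if r)
    if respondent_total == 0:
        raise ValueError("no respondents available to carry the weight")
    # The factor grows with the share of weight lost to nonresponse.
    factor = total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(base_weights, responded)]


# Hypothetical sample of four units with equal base weights; the third did not respond.
weights = nonresponse_adjusted_weights([10.0, 10.0, 10.0, 10.0],
                                       [True, True, False, True])
```

The adjusted weights of the respondents sum to the original total weight of the sample, which is the property the adjustment is meant to preserve.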
On the other hand, imputation is also used by statisticians to account for nonresponse,
usually in the case of item and partial nonresponse. In imputation, a missing value is
replaced by a reasonable substitute for the missing information. Once nonresponse has
been dealt with, whether by weighting adjustments or imputation, then researchers can
The Family Income and Expenditure Survey (FIES) is an example of a survey which has
more than one round of data collection. FIES is a nationwide survey of households
conducted every three years with two visits per survey period by the National Statistics
spending patterns and poverty incidence. Like any other survey, FIES encounters the
problem of missing data, particularly the problem of nonresponse during the second visit.
Given the various contributions that this survey can provide, it is then important to have
dealing with partial nonresponse through the use of imputation techniques. Specifically,
this paper applies the imputation techniques from “Compensating for Missing Data” by
Kalton, a study of the 1978 Research Panel Survey for the Income Survey Development
Program (ISDP), to examine the effects of imputed values in coming up with estimates
for the missing data at various nonresponse rates and
1. Will the imputation methods generate less biased estimates than the estimates
2. Which of the imputation methods yielded the best and
3. How do varying nonresponse rates affect the results for each imputation method?
Nonresponse in surveys creates incomplete data, which could pose serious problems
during data analysis. First, imputation techniques make it possible to compensate
for the missing data by substituting reasonable values rather than deleting or
ignoring the observations. This helps reduce the nonresponse bias in the survey
estimates.
Second, since most statistical packages require complete data before conducting
any procedure for data analysis, the use of imputation techniques can ensure
consistency of results across analyses, something that an incomplete data set cannot fully
made to generate estimates, hence becoming more time-consuming as compared to
estimation.
Third, many countries in the developed world, such as the United States, Canada, the UK
and the Netherlands, already employ imputation techniques in their respective national
statistical offices. In a country such as the Philippines, where data collection is very
difficult, especially for some regions like the National Capital Region (NCR), imputation
will be able to ease the problem of data collection and nonresponse. This can even make us at par
Lastly, in the case of FIES, whose primary objective is to provide information about the
country’s income distribution, spending patterns and poverty incidence, it is important
to ensure the precision of the estimates in the survey. Given the great impact of this
about our country’s income distribution and spending patterns. Hence, a data set
with less bias and more consistent results can help our policymakers
and economists provide better solutions for improving the lives of Filipinos.
Throughout this paper, only the Family Income and Expenditure Survey (FIES) 1997 will
be used to tackle the problem of nonresponse and to examine the impact of the different
imputation methods applied to the dataset. The paper will cover only the partial
nonresponse occurring in the National Capital Region (NCR), since NCR is noted as the region
with the highest nonresponse rate. Also, the variables that will be imputed for this study
are Total Income (TOTIN2) and Total Expenditure (TOTEX2) in the second
The researchers will focus only on using the FIES 1997 first-visit data to impute
the partial nonresponse present in the second visit. This paper also assumes that
the first-visit data are complete and that the data are a Missing Completely At Random
(MCAR) case. Data are MCAR if the probability of missing data on Y is unrelated to
the value of Y itself and to the values of the other variables in the analysis. If the
data fail to satisfy the MCAR assumption, the imputation techniques will not work for
this problem. More importantly, this paper will not tackle the procedure for solving total
nonresponse and noncoverage as well as the procedures to address these problems such as
paper namely: Overall Mean Imputation (OMI), Hot Deck Imputation (HD), Deterministic
Regression Imputation (DR) and Stochastic Regression Imputation (SR). All other im-
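To make the four methods named above concrete, the following is a rough sketch in Python with NumPy. It is not the procedure used in this paper: the function names and the toy first-visit and second-visit values are hypothetical, the hot deck draws donors at random from all respondents without imputation classes, and the stochastic regression adds a residual resampled from the observed residuals. All of these are assumptions made only for illustration.

```python
import numpy as np


def overall_mean_imputation(y, missing):
    """OMI: replace every missing value with the mean of the respondents."""
    out = y.astype(float).copy()
    out[missing] = out[~missing].mean()
    return out


def hot_deck_imputation(y, missing, rng):
    """HD: replace each missing value with the value of a random donor.

    Assumption: donors are drawn from all respondents (no imputation classes).
    """
    out = y.astype(float).copy()
    donors = out[~missing]
    out[missing] = rng.choice(donors, size=missing.sum(), replace=True)
    return out


def regression_imputation(y, x, missing, rng=None):
    """DR when rng is None, SR otherwise.

    Fits y = intercept + slope * x on the respondents. For SR, a residual
    resampled from the observed residuals is added (an assumed variant).
    """
    out = y.astype(float).copy()
    slope, intercept = np.polyfit(x[~missing], out[~missing], deg=1)
    predicted = intercept + slope * x[missing]
    if rng is not None:
        residuals = out[~missing] - (intercept + slope * x[~missing])
        predicted = predicted + rng.choice(residuals, size=missing.sum(), replace=True)
    out[missing] = predicted
    return out


# Hypothetical toy data: first-visit values are complete, second-visit values have gaps.
rng = np.random.default_rng(0)
first_visit = np.array([100.0, 150.0, 200.0, 250.0, 300.0, 350.0])
second_visit = np.array([110.0, 160.0, np.nan, 270.0, np.nan, 380.0])
missing = np.isnan(second_visit)

omi = overall_mean_imputation(second_visit, missing)
hd = hot_deck_imputation(second_visit, missing, rng)
dr = regression_imputation(second_visit, first_visit, missing)
sr = regression_imputation(second_visit, first_visit, missing, rng)
```

In this sketch the observed second-visit values are left untouched and only the missing entries are filled in. The regression variants use the first-visit value as the single predictor, which mirrors the idea of using the first visit to impute the second.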
The evaluation of the efficiency and appropriateness of the four imputation
methods will be limited to the following: (a) bias of the imputed data; (b)
assessment of the distribution of the imputed vs. actual data; and (c) the criteria set in the
report entitled Compensating for Missing Data (Kalton, 1983) namely the mean devi-