
Chapter 1

The Problem and Its Background


1.1 Introduction

Missing data in sample surveys is inevitable for survey statisticians. The problem of missing data occurs for various reasons, such as when the respondent has moved to another location, is temporarily unavailable, refuses to participate in the survey, or is unable to answer specific items in the survey. This failure of the units selected in the sample to complete the desired responses is called nonresponse. There are several types of nonresponse: (a) unit nonresponse refers to the failure to collect any data from a sample unit; (b) item nonresponse refers to the failure to collect valid responses to one or more items from a responding sample unit; and (c) partial nonresponse occurs when there is a failure to collect responses for large sets of items from a responding unit.

In surveys with more than one round of data collection, the problem of nonresponse becomes more complicated. In most surveys of this type, a unit may respond to the first round of the survey but fail to respond in the succeeding rounds. This failure to respond in the succeeding rounds is also known as partial nonresponse.

The effect of nonresponse must not be ignored, since it leads to biased estimates which, if the bias is large, result in inaccurate conclusions. Bias due to nonresponse is generally regarded as a function of the nonresponse rate and of the difference between respondents and nonrespondents: the larger the nonresponse rate, or the wider the difference between responding and nonresponding units, the larger the bias.
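The relationship described above can be made concrete with the usual deterministic decomposition of nonresponse bias, in which the population is split into a responding and a nonresponding group. Writing W_m for the nonresponse rate, the bias of the respondent mean is W_m x (Ybar_r - Ybar_m), the product of the nonresponse rate and the respondent/nonrespondent difference. The sketch below illustrates that simple model only; it is not a procedure from this study, and the function name and income figures are hypothetical.

```python
from statistics import mean


def nonresponse_bias(resp_mean: float, nonresp_mean: float, nonresp_rate: float) -> float:
    """Bias of the respondent mean under the deterministic two-group model.

    bias = W_m * (Ybar_r - Ybar_m), where W_m is the nonresponse rate,
    Ybar_r the respondents' mean and Ybar_m the nonrespondents' mean.
    """
    if not 0.0 <= nonresp_rate <= 1.0:
        raise ValueError("nonresponse rate must lie in [0, 1]")
    return nonresp_rate * (resp_mean - nonresp_mean)


# Hypothetical incomes: three respondents and one nonrespondent.
respondents = [100.0, 120.0, 140.0]
nonrespondents = [150.0]
rate = len(nonrespondents) / (len(respondents) + len(nonrespondents))  # 0.25

bias = nonresponse_bias(mean(respondents), mean(nonrespondents), rate)
# Direct check: respondent mean minus the mean of all four units.
direct = mean(respondents) - mean(respondents + nonrespondents)
print(bias, direct)  # -7.5 -7.5
```

With the same 30-unit gap between the groups, raising the nonresponse rate to 0.5 doubles the bias to -15.0, which matches the statement that a larger nonresponse rate or a wider difference leads to a larger bias.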

In practice, there are three ways of handling missing data: researchers can discard the missing values, apply weighting adjustments, or use imputation techniques. Weighting adjustments are based on matching nonrespondents to respondents in terms of the data available on the nonrespondents, and increasing the weights of the matched respondents to account for the missing values. Hence, each responding unit receives an increased weight that reflects the amount of nonresponse among the units it is matched with. This approach is most often applied to unit nonresponse.
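A common form of the weighting adjustment just described is the weighting-class adjustment. The sketch below assumes that each sampled unit carries a base weight and a class label; the labels, weights and function name are hypothetical, and in practice the classes would be formed from variables known for respondents and nonrespondents alike. It is shown only to illustrate the idea, since this study itself relies on imputation rather than weighting.

```python
from collections import defaultdict


def weighting_class_adjust(units):
    """Weighting-class nonresponse adjustment (illustrative sketch).

    `units` is a list of (cls, weight, responded) tuples. Within each class,
    respondents' weights are scaled by
        (total weight of all sampled units) / (total weight of respondents),
    and nonrespondents get weight 0, so each class total is preserved.
    """
    total = defaultdict(float)
    resp = defaultdict(float)
    for cls, w, responded in units:
        total[cls] += w
        if responded:
            resp[cls] += w
    for cls in total:
        if resp[cls] == 0.0:
            raise ValueError(f"class {cls!r} has no respondents to carry its weight")
    return [w * total[cls] / resp[cls] if responded else 0.0
            for cls, w, responded in units]


# Hypothetical sample: one nonrespondent in class "A", none in class "B".
sample = [
    ("A", 10.0, True), ("A", 10.0, True), ("A", 10.0, False),
    ("B", 20.0, True), ("B", 20.0, True),
]
print(weighting_class_adjust(sample))  # [15.0, 15.0, 0.0, 20.0, 20.0]
```

The two class "A" respondents absorb the weight of the missing unit (factor 30/20 = 1.5), while class "B", with full response, is left unchanged.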

Imputation, on the other hand, is generally used by statisticians to compensate for item and partial nonresponse. In imputation, each missing value is replaced by a reasonable substitute. Once nonresponse has been dealt with, whether by weighting adjustments or by imputation, researchers can proceed with their data analysis.

The Family Income and Expenditure Survey (FIES) is an example of a survey with more than one round of data collection. The FIES is a nationwide survey of households conducted every three years by the National Statistics Office (NSO), with two visits per survey period, to provide information on the country’s income distribution, spending patterns and poverty incidence. Like any other survey, the FIES encounters the problem of missing data, particularly nonresponse during the second visit. Given the important indicators this survey provides, it is essential that its estimates be precise.


Using the FIES 1997 as its data set, this paper focuses on dealing with partial nonresponse through imputation techniques. Specifically, by applying the imputation techniques of Kalton’s study “Compensating for Missing Data,” which was based on the 1978 Research Panel Survey for the Income Survey Development Program (ISDP), this paper aims to examine the effects of imputed values on the estimates for the missing data at various nonresponse rates, and to determine which imputation technique is most appropriate for the FIES data.

1.2 Statement of the Problem

This paper will attempt to answer the following questions:

1. Will the imputation methods generate less biased estimates than those generated by ignoring nonresponse in this study?
2. Which imputation method(s) yield the best and worst estimates at each nonresponse rate?
3. How do varying nonresponse rates affect the results of each imputation method?

1.3 Significance of the Study

Nonresponse is a common problem in conducting surveys. It results in incomplete data, which can pose serious problems during data analysis. First, imputation techniques make it possible to compensate for the missing data by substituting reasonable values rather than deleting or ignoring the affected observations. This in turn helps reduce the nonresponse bias in the survey estimates.

Second, since most statistical packages require complete data before any procedure for data analysis can be carried out, imputation can ensure consistency of results across analyses, something an incomplete data set cannot fully provide. In addition, imputation avoids the complex procedures that weighting adjustments require to generate estimates, which make weighting adjustments more time consuming than imputation.

Third, many countries in the developed world, such as the United States, Canada, the United Kingdom and the Netherlands, already employ imputation techniques in their national statistical offices. In a country such as the Philippines, where data collection is very difficult, especially in regions like the National Capital Region (NCR), imputation can ease the problems of data collection and nonresponse. It can also help bring the country on par with its counterparts in the developed world in terms of statistical research.

Lastly, since the primary objective of the FIES is to provide information about the country’s income distribution, spending patterns and poverty incidence, it is important to ensure the precision of the survey’s estimates. Given the great impact of this survey on the country, imputation gives statisticians a method for handling nonresponse, which could lead to more meaningful generalizations about our country’s income distribution and spending patterns. A data set with less bias and more consistent results can, in turn, help policymakers and economists design better solutions for improving the lives of Filipinos.

1.4 Scope and Limitations

Throughout this paper, only the Family Income and Expenditure Survey (FIES) 1997 will be used to tackle the problem of nonresponse and to examine the impact of the different imputation methods applied to the data set. The paper will cover only the partial nonresponse occurring in the National Capital Region (NCR), since NCR is noted as the region with the highest nonresponse rate. The variables to be imputed in this study are Total Income (TOTIN2) and Total Expenditure (TOTEX2) from the second visit of the FIES.

The researchers will use only the FIES 1997 first-visit data to impute the partial nonresponse present in the second visit. This paper also assumes that the first-visit data are complete and that the missing second-visit data are Missing Completely At Random (MCAR). Data are MCAR if the probability of missing data on Y is unrelated both to the value of Y itself and to the values of any other variables in the analysis. If the data fail to satisfy the MCAR assumption, the imputation techniques will not be appropriate for this problem. Finally, this paper will not tackle total nonresponse and noncoverage, nor the procedures used to address these problems, such as population weighting adjustments and raking ratio adjustments.
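To make the MCAR condition concrete, the sketch below produces missingness that is MCAR by construction: the units set to missing form a simple random sample, selected without looking at Y or at any other variable. The function names and the idea of deleting values at a fixed rate are illustrative assumptions; this chapter does not describe how the study obtains its different nonresponse rates.

```python
import random
from typing import List, Optional


def mcar_mask(n: int, rate: float, seed: Optional[int] = None) -> List[bool]:
    """Flag round(n * rate) of n units as missing, chosen by simple random sampling.

    The flags depend only on the random number generator, never on Y or on
    any other variable, which is what makes the resulting missingness MCAR.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    rng = random.Random(seed)
    missing = set(rng.sample(range(n), round(n * rate)))
    return [i in missing for i in range(n)]


def apply_mask(values, mask):
    """Replace flagged values with None to mimic second-visit nonresponse."""
    return [None if m else v for v, m in zip(values, mask)]


# Hypothetical use: hide 30% of 10 second-visit values.
mask = mcar_mask(10, 0.3, seed=42)
print(sum(mask))  # 3
```

By contrast, a rule such as "drop the households with the highest incomes" would make missingness depend on Y itself, violating MCAR.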


As for the imputation techniques, only four imputation methods will be applied in this paper, namely: Overall Mean Imputation (OMI), Hot Deck Imputation (HD), Deterministic Regression Imputation (DR) and Stochastic Regression Imputation (SR). No other imputation methods will be used.
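The four methods can be sketched as follows for a single second-visit variable y (for example, TOTIN2) with missing values stored as None, using the corresponding first-visit variable x as the auxiliary variable. This is a minimal sketch under stated assumptions, not the study's exact procedure: the hot deck draws donors at random from all respondents (no imputation classes), both regression methods fit a simple least-squares line of y on x over the respondents, and the stochastic version adds a residual drawn at random from the respondents' residuals. Kalton's variants may differ in these details, and all function names are hypothetical.

```python
import random
from statistics import mean


def _observed(x, y):
    """(x, y) pairs for units whose y was reported."""
    return [(xi, yi) for xi, yi in zip(x, y) if yi is not None]


def _fit_line(pairs):
    """Least-squares intercept and slope of y on x over the given pairs."""
    xbar = mean(p[0] for p in pairs)
    ybar = mean(p[1] for p in pairs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in pairs)
    if sxx == 0:
        raise ValueError("need at least two respondents with distinct x values")
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in pairs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def overall_mean_imputation(x, y):
    """OMI: every missing y receives the mean of the reported y values."""
    ybar = mean(yi for _, yi in _observed(x, y))
    return [ybar if yi is None else yi for yi in y]


def hot_deck_imputation(x, y, rng):
    """HD (random, no classes): each missing y copies a randomly chosen donor's y."""
    donors = [yi for _, yi in _observed(x, y)]
    return [rng.choice(donors) if yi is None else yi for yi in y]


def deterministic_regression_imputation(x, y):
    """DR: each missing y receives its predicted value a + b*x."""
    a, b = _fit_line(_observed(x, y))
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


def stochastic_regression_imputation(x, y, rng):
    """SR: the DR prediction plus a residual drawn at random from the respondents."""
    pairs = _observed(x, y)
    a, b = _fit_line(pairs)
    residuals = [yi - (a + b * xi) for xi, yi in pairs]
    return [a + b * xi + rng.choice(residuals) if yi is None else yi
            for xi, yi in zip(x, y)]


# Hypothetical data: first-visit x is complete, two second-visit y values are missing.
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [12.0, 22.0, None, 42.0, None]
print(deterministic_regression_imputation(x, y))
```

Here the respondents lie on the line y = x + 2, so DR fills the gaps with values very close to 32 and 52. Because OMI and DR give every missing case a "best guess" with no random component, they tend to understate the spread of the data, whereas HD and SR add randomness to preserve it; this is one reason the distribution of imputed versus actual data is worth checking.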

The evaluation of the efficiency and appropriateness of the four imputation methods will be limited to the following: (a) the bias of the imputed data; (b) an assessment of the distribution of the imputed vs. actual data; and (c) the criteria set in the report Compensating for Missing Data (Kalton, 1983), namely the mean deviation, mean absolute deviation and root mean square deviation.
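Assuming the usual definitions, with deviations taken as imputed minus actual values over the m imputed cases (which presumes the actual values are known, as when values are withheld for evaluation), the three criteria can be computed as below. This is a sketch; the exact definitions in Kalton's report should be checked before use, and the function name is hypothetical. The mean deviation shows the direction of any systematic error, while the mean absolute deviation and root mean square deviation measure the typical size of the errors, the latter penalizing large errors more heavily.

```python
import math


def deviation_criteria(imputed, actual):
    """Mean deviation (MD), mean absolute deviation (MAD) and root mean square
    deviation (RMSD), with deviations = imputed - actual over the imputed cases."""
    if len(imputed) != len(actual) or not imputed:
        raise ValueError("need two non-empty sequences of equal length")
    dev = [i - a for i, a in zip(imputed, actual)]
    m = len(dev)
    return {
        "MD": sum(dev) / m,
        "MAD": sum(abs(d) for d in dev) / m,
        "RMSD": math.sqrt(sum(d * d for d in dev) / m),
    }


# Hypothetical check: two imputed values against the actual (withheld) values.
print(deviation_criteria([32.0, 52.0], [30.0, 56.0]))
# MD = -1.0, MAD = 3.0, RMSD = sqrt(10) ≈ 3.162
```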
