Conceptual Framework

Nonresponse Bias

In most surveys, missing data threaten the validity of the analysis. Missing data can be discarded, ignored, or replaced through some procedure. When they are discarded or ignored in generating estimates, nonresponse bias becomes a problem. This section examines the bias that arises when the data are processed with the nonresponse observations excluded (Kalton, 1983).

For simplicity, suppose a simple random sample (SRS) of size n is drawn from a population of size N to measure a variable y that contains missing data. The types and patterns of nonresponse, which are discussed later, need not be defined for this section. The population is further assumed to be divided into two groups, respondents and nonrespondents. In reality, dividing the population into two fixed groups is an oversimplification: for some units at least, chance plays a part in whether they respond or not. The simplified model is appealing, however, because its tractability leads to some informative results (Kalton, 1983).

Let $R$ be the number of respondents and $M$ be the number of nonrespondents ($M$ for missing) in the population, with $N = R + M$; the corresponding sample quantities are $r$ and $m$, with $r + m = n$. Let $\bar{R} = R/N$ and $\bar{M} = M/N$ be the proportions of respondents and nonrespondents in the population, and let $\bar{r} = r/n$ and $\bar{m} = m/n$ be the response and nonresponse rates in the sample. The population total and mean are given by $Y = Y_r + Y_m = R\bar{Y}_r + M\bar{Y}_m$ and $\bar{Y} = \bar{R}\bar{Y}_r + \bar{M}\bar{Y}_m$, where $Y_r$ and $\bar{Y}_r$ are the total and mean for respondents and $Y_m$ and $\bar{Y}_m$ are the same quantities for the nonrespondents. The corresponding sample quantities are $y = y_r + y_m = r\bar{y}_r + m\bar{y}_m$ and $\bar{y} = \bar{r}\bar{y}_r + \bar{m}\bar{y}_m$. (Kalton, 1983)

If no compensation is made for nonresponse, the respondent sample mean $\bar{y}_r$ is used to estimate $\bar{Y}$. Its bias is given by $B(\bar{y}_r) = E(\bar{y}_r) - \bar{Y}$. The expectation of $\bar{y}_r$ can be obtained in two stages, first conditional on fixed $r$ and then over different values of $r$, i.e. $E(\bar{y}_r) = E_1 E_2(\bar{y}_r)$, where $E_2$ is the conditional expectation for fixed $r$ and $E_1$ is the expectation over different values of $r$. Thus,

$$E(\bar{y}_r) = E_1\left[\sum_{i=1}^{r} E_2(y_{ri})/r\right] = E_1(\bar{Y}_r) = \bar{Y}_r.$$

Hence the bias of $\bar{y}_r$ is given by

$$B(\bar{y}_r) = \bar{Y}_r - \bar{Y} = \bar{M}(\bar{Y}_r - \bar{Y}_m).$$

The equation shows that $\bar{y}_r$ is approximately unbiased for $\bar{Y}$ if either the proportion of nonrespondents $\bar{M}$ is small or the mean for the nonrespondents, $\bar{Y}_m$, is close to that for respondents, $\bar{Y}_r$. Since the survey analyst usually has no direct empirical evidence on the magnitude of $(\bar{Y}_r - \bar{Y}_m)$, the only situation in which the analyst can be confident that the bias is small is when the nonresponse rate is low. However, in practice, even with moderate $\bar{M}$ many survey results escape sizable biases because $(\bar{Y}_r - \bar{Y}_m)$ is fortunately often not large. (Kalton, 1983)
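
As a small numerical check, the sketch below computes the bias of the respondent mean in two ways for a tiny population: directly as $\bar{Y}_r - \bar{Y}$, and through the identity $\bar{M}(\bar{Y}_r - \bar{Y}_m)$. The function name `nonresponse_bias` and the ten data values are hypothetical and chosen only for illustration; they do not come from Kalton (1983) or from this study.

```python
from statistics import mean


def nonresponse_bias(respondent_values, nonrespondent_values):
    """Return the bias of the respondent mean computed two ways.

    direct       = Ybar_r - Ybar
    via_identity = Mbar * (Ybar_r - Ybar_m)
    """
    R = len(respondent_values)       # number of respondents
    M = len(nonrespondent_values)    # number of nonrespondents
    N = R + M                        # population size
    ybar_r = mean(respondent_values)
    ybar_m = mean(nonrespondent_values)
    ybar = (R * ybar_r + M * ybar_m) / N   # overall population mean
    m_bar = M / N                          # proportion of nonrespondents
    return ybar_r - ybar, m_bar * (ybar_r - ybar_m)


# Hypothetical population (assumption): 8 respondents, 2 nonrespondents.
respondents = [10, 12, 11, 9, 10, 12, 11, 13]   # mean 11
nonrespondents = [5, 7]                         # mean 6

direct, via_identity = nonresponse_bias(respondents, nonrespondents)
print(direct, via_identity)   # both equal 1.0 for this population
```

Here $\bar{M} = 0.2$ and $\bar{Y}_r - \bar{Y}_m = 5$, so the respondent mean overstates the population mean of 10 by 1. Halving either the nonresponse proportion or the gap between the two group means would halve the bias.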

Many procedures can be applied to reduce the nonresponse bias caused by missing data, and one of these is imputation. In this study, imputation procedures are applied to eliminate nonresponse and to reduce the bias in the estimates. Imputation is briefly defined as the substitution of values for the nonresponse observations. The imputation procedures are discussed later.

Nonresponse and Its Patterns

In surveys, nonresponse observations follow a definite pattern. In this study, the terms missing data and nonresponse are used interchangeably. Nonresponse data can follow one of three patterns. The missing data for a variable Y are “Missing Completely at Random” (MCAR) if the probability of having a missing value for Y is unrelated to the value of Y itself or to any other variable in the data set. Data that are MCAR reflect the highest degree of randomness and show no underlying reasons for missing observations that could lead to biased research findings. With MCAR, the occurrence of missing data is unrelated to the other variables in the data set or to other systematic factors; missing data are randomly distributed across all cases.

The second pattern is Missing At Random (MAR). The missing data for a variable Y are considered MAR if the probability of missing data on Y is unrelated to the value of Y after controlling for other variables in the analysis. MAR data show some randomness in the pattern of data omission. The likelihood of a case having incomplete information on a variable can be explained by other variables in the data set, but the presence or absence of missing values on a variable is not related to the participants’ true status on that variable.

The difference between MCAR and MAR lies in how the missingness in Y relates to the other variables in the data set. Under MCAR, missingness is independent of both the values of Y and every other variable. Under MAR, missingness in Y may depend on the other variables, which can therefore explain the incomplete information on Y, but it does not depend on Y itself once those variables are taken into account.

The last pattern, and arguably the worst of the three, arises when the probability of missing data on Y is related to the value of Y even after the other variables are controlled for in the analysis. Such a case is termed NonIgnorable Nonresponse (NIN). NIN missing data have systematic, nonrandom factors underlying the occurrence of the missing values that are not apparent or otherwise measured. NIN missing data are the most problematic because they limit the generalizability of research findings and may produce biased parameter estimates, such as means, standard deviations, correlation coefficients or regression coefficients.

These patterns serve as an important assumption in imputation. For an imputation procedure to work and yield statistically acceptable estimates, the pattern of nonresponse must satisfy either the MCAR or the MAR assumption. For this study, the researchers created nonresponse that follows the MCAR assumption.
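
To make the three patterns concrete, the following sketch generates artificial nonresponse under each mechanism and compares the respondent mean of Y with the full-data mean. Everything in it is an illustrative assumption rather than the procedure used in this study: the covariate `x`, the missingness probabilities (0.3, 0.5 and 0.1), the sample size and the seed are arbitrary, and the class-adjusted mean, which weights respondent means within the two classes x > 0 and x <= 0 by class size, is used here only to show what "controlling for other variables" can look like.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]    # fully observed covariate
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]     # study variable, related to x

    missing = {
        # MCAR: every unit has the same chance of being missing.
        "MCAR": [rng.random() < 0.3 for _ in range(n)],
        # MAR: missingness depends only on the observed covariate x.
        "MAR": [rng.random() < (0.5 if xi > 0 else 0.1) for xi in x],
        # NIN: missingness depends on the value of y itself.
        "NIN": [rng.random() < (0.5 if yi > 0 else 0.1) for yi in y],
    }

    def respondent_mean(miss):
        # Mean of y over units whose value was not deleted.
        return mean(yi for yi, m in zip(y, miss) if not m)

    def class_adjusted_mean(miss):
        # Respondent means within classes of x, weighted by class share.
        total = 0.0
        for in_class in (lambda xi: xi > 0, lambda xi: xi <= 0):
            members = [i for i in range(n) if in_class(x[i])]
            observed = [y[i] for i in members if not miss[i]]
            total += len(members) / n * mean(observed)
        return total

    results = {"full": mean(y)}
    for name, miss in missing.items():
        results[name] = respondent_mean(miss)
        results[name + "_adjusted"] = class_adjusted_mean(miss)
    return results


results = simulate()
for name, value in results.items():
    print(f"{name:>13}: {value: .3f}")
```

Under these assumptions the MCAR respondent mean should stay close to the full-data mean. The unadjusted MAR mean drifts away from it but is pulled back once the classes of x are taken into account. The NIN mean remains biased even after the adjustment, because the missingness depends on Y itself.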

Nonresponse and Its Types

Another important consideration in imputation is the type of nonresponse. While the patterns of nonresponse concern the relationship between the variable with missing data and the other variables, the types of nonresponse concern the way in which observations come to be missing. Kalton (1983) stressed the importance of differentiating the types of nonresponse: noncoverage, total (unit) nonresponse, item nonresponse and partial nonresponse.

Noncoverage (NC) denotes the failure to include some units of the survey population in the sampling frame. As a consequence, units that are excluded from the frame have no chance of appearing in the sample. NC is not usually regarded as a type of nonresponse; however, Kalton (1983) loosely classifies it as one for convenience. NC occurs in surveys where the sampling frame fails to cover some units or the listing of units is incomplete.

Unit (or total) nonresponse (UN) occurs when no information is collected from a sampled unit. It has many causes, namely, failure to contact the respondent (not at home, moved, or unit not found), refusal to provide information, inability of the unit to cooperate (for example, because of illness or a language barrier), or lost questionnaires.

Item nonresponse (IN) arises when the information collected from a unit is incomplete because answers to some of the questions are missing. It has many causes: the informant lacks the information needed to answer the question; the informant fails to make the effort required to establish the information by retrieving it from memory or by consulting records; the informant refuses to answer because the questions are sensitive or embarrassing, or are considered irrelevant to his perception of the survey’s objectives; the interviewer fails to record an answer (perhaps by skipping a question); or the response is subsequently rejected at an edit check on the grounds that it is inconsistent with other responses (which may include an inconsistency arising from a coding or punching error in the transfer of the response to the computer data file).
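
The distinction between unit and item nonresponse can be read directly off a sampled unit's record, as the minimal sketch below illustrates. The function `classify_record`, the variable names and the three records are hypothetical, and missing answers are assumed to be coded as `None`. Noncoverage cannot be detected this way, since uncovered units never appear in the sample file, and the sketch does not attempt to separate partial nonresponse from item nonresponse.

```python
def classify_record(answers):
    """Classify one sampled unit's record; None marks a missing answer."""
    n_missing = sum(value is None for value in answers.values())
    if n_missing == 0:
        return "complete"
    if n_missing == len(answers):
        return "unit nonresponse"    # no information collected at all
    return "item nonresponse"        # some, but not all, answers missing


# Hypothetical records (assumption): three sampled units, three questions.
records = [
    {"age": 34, "income": 52000, "household_size": 4},
    {"age": 51, "income": None, "household_size": 2},
    {"age": None, "income": None, "household_size": None},
]

labels = [classify_record(record) for record in records]
print(labels)   # ['complete', 'item nonresponse', 'unit nonresponse']
```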

