Draft Only: Subject to Revision

# Formation of Imputation Classes

Most imputation methods require the formation of imputation classes before any imputation takes place. In this study, three imputation methods were applied within imputation classes: Hot Deck Imputation, Deterministic Regression, and Stochastic Regression. Imputation classes play a major role in the bias and reliability of the estimates produced by different imputation methods. Their main goals are to (1) minimize the variance within each class while maximizing the variance between classes and (2) reduce the bias of the estimates.

As defined earlier in the methodology, imputation classes are stratification classes that divide the data into groups. This division produces homogeneous groups if proper techniques are applied in creating and selecting the imputation classes and their variables. Several considerations must be addressed before the classes are formed in order to arrive at a good set of imputation classes. It is very important to use variables that are highly correlated with the variables to be simulated and that minimize the variance within groups. The variables used to form imputation classes are called matching or control variables. The groups of observations that come from the matching variables and from the variables to be simulated are called donors and recipients, respectively.

Imputation classes can be an effective tool for identifying the best imputation method, whether or not classes are introduced. Methods that use good imputation classes have an advantage over methods that use none. However, this still depends on how the imputation classes are selected. In this study, the first-visit data set was used to form the imputation classes. The second-visit data could not be used for testing because its observations are assumed to be missing. All potential matching variables must be evaluated in order to obtain the best imputation classes for the three methods.
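The first goal of imputation classes (small variance within classes, large variance between classes) can be illustrated with a sum-of-squares decomposition. The following is a minimal sketch, not the procedure used in this study: the function name `variance_decomposition` and the toy income values and class labels are hypothetical and serve only to show how two candidate classifications could be compared.

```python
from collections import defaultdict
from statistics import fmean


def variance_decomposition(values, classes):
    """Split the total sum of squares into within-class and between-class parts."""
    groups = defaultdict(list)
    for value, label in zip(values, classes):
        groups[label].append(value)

    grand_mean = fmean(values)
    within = 0.0
    between = 0.0
    for members in groups.values():
        class_mean = fmean(members)
        # Within: squared deviations of each observation from its class mean.
        within += sum((v - class_mean) ** 2 for v in members)
        # Between: squared deviation of the class mean from the grand mean,
        # weighted by the class size.
        between += len(members) * (class_mean - grand_mean) ** 2
    return within, between


# Hypothetical toy data (not from the study).
income = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]
good_classes = ["A", "A", "A", "B", "B", "B"]  # groups similar incomes together
poor_classes = ["A", "B", "A", "B", "A", "B"]  # mixes low and high incomes

within_good, between_good = variance_decomposition(income, good_classes)
within_poor, between_poor = variance_decomposition(income, poor_classes)
print("good classes:", within_good, between_good)
print("poor classes:", within_poor, between_poor)
```

In both cases the two parts add up to the same total sum of squares, so a classification that lowers the within-class part necessarily raises the between-class part. A matching variable that behaves like `good_classes` yields more homogeneous groups of donors and recipients.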
Categorical variables that are economically related to the variables to be simulated were chosen as potential matching variables. Three were chosen: Province (PROV1), Education Status (ES1), and Total Employed Household Members (TOTEM1). It is important to set a definite number of categories for each matching variable to avoid certain dangers. "If the number of imputation groups decreases, heterogeneity within the groups increases and the estimates become increasingly burdened with aggregation bias. On the other hand, as the number of imputation groups [increases], this negatively affects the precision of the estimates, thus inflating the estimates' variances."

All potential variables were categorical because they need no further categorization, unlike continuous variables, whose categorization can cause a loss of information. In addition, continuous variables are in practice more prone to nonresponse than categorical variables. Using continuous variables that contain nonresponse to form imputation classes would only increase the bias of the estimates. However, because ES1 and TOTEM1 had too many categories, they were recategorized to reduce the number of categories.

The matching variables employed in these methods were selected using tests for independence and measures of association for nominal data. The chi-square test was applied to determine whether each potential matching variable is significantly associated with the nonresponse variable. If a potential matching variable was significant, the measures of association were then computed. This was done to find the matching variable that would best divide the data into imputation classes with the characteristics discussed earlier. Three measures of association were used in this study: the phi coefficient, Cramer's V, and the contingency coefficient (reported as the Contingency Test). To obtain results more quickly, statistical packages such as SPSS and Statistica were used for this part. The results from the SPSS cross-tabulation function are shown below.

Table #.1 Chi-Square Test for Independence
CHI-SQUARE TEST: INCOME VARIABLE

| VARIABLES | STAT | P-VALUE | DF |
|---|---|---|---|
| PROVINCE | 151.78 | < 0.0001 | 9 |
| CODES1 | 613.859 | < 0.0001 | 6 |
| CODEP1 | 358.436 | < 0.0001 | 9 |

CHI-SQUARE TEST: EXPENDITURE VARIABLE

| VARIABLES | STAT | P-VALUE | DF |
|---|---|---|---|
| PROVINCE | 137.83 | < 0.0001 | 9 |
| CODES1 | 687.342 | < 0.0001 | 6 |
| CODEP1 | 193.132 | < 0.0001 | 9 |
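For readers who want to see what the chi-square test for independence computes, the following is a minimal sketch of the Pearson statistic and its degrees of freedom for a table of counts. It is not the SPSS procedure used in the study: the function name `chi_square_independence` and the 2 x 2 counts are hypothetical, and no continuity correction is applied. The p-value would then come from the chi-square distribution with the returned degrees of freedom (for example via `scipy.stats.chi2.sf`, which is not used here so that the sketch stays dependency-free).

```python
def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts (not from the study): rows are categories of a matching
# variable, columns are categories of a nonresponse variable.
counts = [
    [30, 10],
    [20, 40],
]
stat, df = chi_square_independence(counts)
print(f"chi-square = {stat:.3f}, df = {df}")
```

A large statistic relative to its degrees of freedom, as in Table #.1, indicates that the matching variable and the nonresponse variable are unlikely to be independent.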

Results in Table #.1 showed that all matching variables were significantly associated with their respective partial nonresponse variables.

Table #.2. Measures of Association
INCOME VARIABLE

| VARIABLES | Phi-Coefficient | Cramer's V | Contingency Test |
|---|---|---|---|
| PROVINCE | 0.192 | 0.111 | 0.188 |
| CODES1 | 0.386 | 0.273 | 0.36 |
| CODEP1 | 0.295 | 0.17 | 0.283 |

EXPENDITURE VARIABLE

| VARIABLES | Phi-Coefficient | Cramer's V | Contingency Test |
|---|---|---|---|
| PROVINCE | 0.183 | 0.105 | 0.18 |
| CODES1 | 0.408 | 0.288 | 0.378 |
| CODEP1 | 0.216 | 0.125 | 0.211 |
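The three measures of association are all functions of the chi-square statistic. The sketch below uses the standard textbook definitions, which is an assumption about what the statistical packages report: phi = sqrt(chi2 / n), Cramer's V = sqrt(chi2 / (n * min(r - 1, c - 1))), and the contingency coefficient C = sqrt(chi2 / (chi2 + n)). The function names and the sample size used in the toy call are hypothetical, since the sample size is not stated in this section. Because C can be written as phi / sqrt(1 + phi^2), the reported values can be checked for internal consistency without knowing the sample size.

```python
import math


def association_measures(chi2, n, n_rows, n_cols):
    """Return (phi, Cramer's V, contingency coefficient) from a chi-square statistic."""
    phi = math.sqrt(chi2 / n)
    k = min(n_rows - 1, n_cols - 1)
    cramers_v = math.sqrt(chi2 / (n * k))
    contingency = math.sqrt(chi2 / (chi2 + n))
    return phi, cramers_v, contingency


def contingency_from_phi(phi):
    """Contingency coefficient implied by phi: C = phi / sqrt(1 + phi^2)."""
    return phi / math.sqrt(1.0 + phi ** 2)


# Toy call with hypothetical inputs (chi2 = 100, n = 400, a 3 x 4 table).
phi, v, c = association_measures(100.0, 400, 3, 4)
print(f"phi = {phi:.3f}, V = {v:.3f}, C = {c:.3f}")

# (phi, contingency) pairs as reported in Table #.2.
reported = {
    ("income", "PROVINCE"): (0.192, 0.188),
    ("income", "CODES1"): (0.386, 0.36),
    ("income", "CODEP1"): (0.295, 0.283),
    ("expenditure", "PROVINCE"): (0.183, 0.18),
    ("expenditure", "CODES1"): (0.408, 0.378),
    ("expenditure", "CODEP1"): (0.216, 0.211),
}
for key, (phi_reported, c_reported) in reported.items():
    implied = contingency_from_phi(phi_reported)
    print(key, f"implied C = {implied:.3f}, reported C = {c_reported}")
```

Under these definitions, the contingency values in Table #.2 agree with the values implied by the reported phi coefficients to within rounding.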

Results in Table #.2 showed that the newly categorized Education Status (CODES1) had the highest measures of association with its respective partial nonresponse variables among the matching variables. However, none of the matching variables has a strong association with the nonresponse variables. This was not considered a serious concern, primarily because in real, complex data, variables often have only a weak association, or sometimes none at all. The minimum association required in this study is twenty percent (0.20) for all the tests. Only CODES1 reached at least twenty percent on every measure of association with the partial nonresponse variables.
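The selection rule can be written out as a short screening step. This is a minimal sketch that assumes the twenty percent minimum must be met by every measure for both the income and the expenditure variable, which is one reading of the rule stated above. The values are copied from Table #.2, while the names `THRESHOLD`, `measures`, and `failing_measures` are hypothetical.

```python
THRESHOLD = 0.20  # minimum association required in the study

MEASURE_NAMES = ("Phi-Coefficient", "Cramer's V", "Contingency Test")

# Values from Table #.2, in the order of MEASURE_NAMES.
measures = {
    "PROVINCE": {"income": (0.192, 0.111, 0.188), "expenditure": (0.183, 0.105, 0.18)},
    "CODES1": {"income": (0.386, 0.273, 0.36), "expenditure": (0.408, 0.288, 0.378)},
    "CODEP1": {"income": (0.295, 0.17, 0.283), "expenditure": (0.216, 0.125, 0.211)},
}


def failing_measures(by_target):
    """List (target variable, measure name) pairs that fall below the threshold."""
    return [
        (target, name)
        for target, values in by_target.items()
        for name, value in zip(MEASURE_NAMES, values)
        if value < THRESHOLD
    ]


# Keep only the matching variables with no failing measure.
selected = [name for name, by_target in measures.items() if not failing_measures(by_target)]
print("selected:", selected)
for name, by_target in measures.items():
    print(name, "fails on:", failing_measures(by_target))
```

With the tabulated values, CODES1 is the only variable that passes. PROVINCE falls below the threshold on every measure, and CODEP1 falls below it only on Cramer's V.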