
Chapter 5 Results and Discussion

5.1 Descriptive Statistics of Second Visit Data Variables

The table below shows the descriptive statistics of the second visit variables of interest (VIs), TOTEX2 and TOTIN2. These were computed to give a brief idea of how much a household spends and earns in a given period, to measure the differences between the two variables, and to compare the results with later tests. The same statistics will also be used to assess the imputation classes (ICs), that is, how well the observations are grouped.

Table: Descriptive Statistics Using the Complete Data Set

The average total spending of a household in the National Capital Region (NCR) is about Php 102,389.80, while the average total earnings amount to Php 134,119.40, a difference of more than thirty thousand pesos. TOTIN2 observations are larger and more dispersed than TOTEX2 observations, as shown by TOTIN2's larger mean and larger standard deviation, respectively. The dispersion is also visible from the minimum and maximum of the two variables: the range (maximum minus minimum) of TOTIN2 exceeds four million pesos, about one million pesos more than the range of TOTEX2, another sign of the extreme variability of the observations.
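The summary measures discussed above (mean, standard deviation, minimum, maximum and range) can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `describe` and the household values are hypothetical and are not the NCR data, and it assumes the sample standard deviation (n - 1 denominator) was used.

```python
import statistics

def describe(values):
    """Return the summary measures reported for TOTEX2 and TOTIN2."""
    lo, hi = min(values), max(values)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": lo,
        "max": hi,
        "range": hi - lo,                # spread between the extremes
    }

# Hypothetical household totals in pesos, for illustration only.
totex2 = [60000, 85000, 102000, 150000, 240000]
totin2 = [70000, 95000, 130000, 210000, 560000]
ex, inc = describe(totex2), describe(totin2)

# Gap between mean earnings and mean spending, and the two ranges.
print(inc["mean"] - ex["mean"], inc["range"], ex["range"])
```

The range is kept alongside the standard deviation because, as the text notes, a single extreme household can widen it considerably.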


Formation of Imputation Classes

The table below shows the results of the chi-square test, which was done to determine whether the candidate matching variables (MVs) are associated with the VIs. As stated in the methodology, the MVs must be highly correlated with the variables of interest. The first visit variables of interest, TOTIN1 and TOTEX1, were grouped into four categories (CODIN1 and CODEX1) in order to satisfy the assumptions of the association tests. TOTIN1 and TOTEX1 were used in the association tests instead of the second visit VIs because the second visit VIs already contain missing data.
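The text does not state how TOTIN1 and TOTEX1 were cut into four categories. The sketch below assumes, purely for illustration, that the cut points are the sample quartiles, and adds a check of the smallest expected cell count, since a common rule of thumb for the chi-square test is that expected counts should not be too small (often at least 5). The function names are hypothetical.

```python
import bisect
import statistics

def quartile_categories(values):
    """Assign each value to category 1-4 using the sample quartiles as cut points."""
    cuts = statistics.quantiles(values, n=4)  # [Q1, median, Q3]
    # bisect_right counts the cut points at or below v, so a value equal
    # to a cut point falls into the higher category
    categories = [bisect.bisect_right(cuts, v) + 1 for v in values]
    return categories, cuts

def min_expected_count(table):
    """Smallest expected cell count under independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)

cats, cuts = quartile_categories([1, 2, 3, 4, 5, 6, 7, 8])
print(cats, cuts)
print(min_expected_count([[10, 20], [20, 10]]))
```

Quartile cut points give categories of roughly equal size, which tends to keep expected counts away from zero; the study may have used other cut points.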

The candidate matching variables tested were the provincial area code (PROV), the recoded educational status (CODES1) and the recoded number of employed household members (CODEP1). PROV has four categories, CODES1 has three, and CODEP1 has four. Because the original variables had numerous categories (more than 60 and 7 categories for CODES1 and CODEP1, respectively), they were recoded into a smaller number of groups.

The final categories for CODES1 are as follows:
IC1 = At most a high school graduate
IC2 = At most a college graduate, or another course that was not specified
IC3 = Taking Masters or Doctoral degrees
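The recoding of the detailed education codes into the three CODES1 classes can be sketched as a simple lookup. The original 60+ survey codes are not listed here, so the education labels below are hypothetical stand-ins; only the three target classes come from the text.

```python
# Hypothetical detailed education labels; the survey's actual codes are not shown here.
IC1_LEVELS = {
    "no grade completed", "elementary undergraduate", "elementary graduate",
    "high school undergraduate", "high school graduate",
}
IC3_LEVELS = {"masters", "doctorate"}

def recode_education(level: str) -> str:
    """Collapse a detailed education label into one of the three CODES1 classes."""
    level = level.strip().lower()
    if level in IC1_LEVELS:
        return "IC1"  # at most a high school graduate
    if level in IC3_LEVELS:
        return "IC3"  # taking Masters or Doctoral degrees
    return "IC2"      # at most a college graduate, or an unspecified course

print([recode_education(x) for x in ["High School Graduate", "College Graduate", "Doctorate"]])
```

Sending every unlisted label to IC2 mirrors the text's description of IC2 as also covering courses that were not specified.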

Table: Tests of Association for Matching Variable: The Chi-Square Test of Independence

Note: Values below the χ2 statistics are the p-values.

The chi-square test of independence between the candidates and the variables of interest showed that PROV, CODES1 and CODEP1 are all associated with CODIN1 and CODEX1; the p-values for all the candidates were highly significant. The subsequent measures of association determine which of the three candidates is chosen as the MV of the study.
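For reference, the Pearson chi-square statistic compares each observed cell count with the count expected under independence (row total times column total divided by n). The sketch below computes the statistic and its degrees of freedom for a table of counts such as CODES1 by CODIN1; the function name and the example table are hypothetical, and the p-value step is left to a chi-square distribution function.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table of counts, for illustration only.
stat, df = chi_square_independence([[10, 20], [20, 10]])
print(stat, df)
```

If SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the p-value, and `scipy.stats.chi2_contingency` runs the whole test; note that the latter applies Yates' continuity correction to tables with one degree of freedom by default.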

The table below shows the other measures of association, namely the phi coefficient, Cramér's V and the contingency coefficient. These were computed to assess the degree of association between the candidates and CODIN1 and CODEX1.

Table: Tests of Association for Matching Variable: Degree of Association

The table above displays the degree of association between the candidates and the variables of interest. All the measures indicated weak association, which is common in real, complex data, where associations between variables tend to be small or even absent. Only CODES1 reached at least 0.20 (20%) on every measure, the minimum required for dividing the data into imputation classes. CODES1 is therefore the matching variable for this study.
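All three degree-of-association measures can be derived from the chi-square statistic, the sample size n and the table dimensions. The sketch below uses the standard formulas and applies the 0.20 cut-off mentioned in the text to every measure; the function names and example inputs are hypothetical.

```python
import math

def association_measures(chi2, n, n_rows, n_cols):
    """Phi coefficient, Cramer's V and Pearson's contingency coefficient from a chi-square statistic."""
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    contingency = math.sqrt(chi2 / (chi2 + n))
    return {"phi": phi, "cramers_v": cramers_v, "contingency": contingency}

THRESHOLD = 0.20  # minimum degree of association used in the text to select the MV

def passes_threshold(measures, threshold=THRESHOLD):
    """True when every measure reaches the threshold."""
    return all(value >= threshold for value in measures.values())

# Hypothetical inputs: chi-square of 20/3 from a 2 x 2 table with n = 60.
m = association_measures(20 / 3, 60, 2, 2)
print(m, passes_threshold(m))
```

The phi coefficient is usually interpreted for 2 x 2 tables; for larger tables it can exceed 1, which is why Cramér's V divides by the smaller of (rows - 1) and (columns - 1).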

To describe the CODES1 imputation classes in detail, descriptive statistics were computed for each imputation class; Table 5 shows the results. These statistics indicate whether the selected MV reduces the variability of the observations: the standard deviation of each imputation class is compared with the overall standard deviation of the corresponding variable of interest.
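The comparison described above (each class's standard deviation against the overall one) can be sketched as follows. The function names and the example values are hypothetical; the example simply mimics the pattern reported for IC3, a class holding extreme values.

```python
import statistics
from collections import defaultdict

def class_summaries(values, classes):
    """Size, mean and standard deviation of a variable within each imputation class."""
    groups = defaultdict(list)
    for value, ic in zip(values, classes):
        groups[ic].append(value)
    summaries = {}
    for ic, members in sorted(groups.items()):
        summaries[ic] = {
            "n": len(members),
            "mean": statistics.mean(members),
            # stdev needs at least two observations; report 0.0 for a singleton class
            "sd": statistics.stdev(members) if len(members) > 1 else 0.0,
        }
    return summaries

def classes_below_overall_sd(values, classes):
    """Imputation classes whose standard deviation is smaller than the overall one."""
    overall_sd = statistics.stdev(values)
    return [ic for ic, s in class_summaries(values, classes).items() if s["sd"] < overall_sd]

# Hypothetical values, for illustration only.
values = [10, 12, 11, 50, 55, 60, 5, 400]
classes = ["IC1", "IC1", "IC1", "IC2", "IC2", "IC2", "IC3", "IC3"]
print(class_summaries(values, classes))
print(classes_below_overall_sd(values, classes))
```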

Table 5: Descriptive Statistics of the Data Grouped into Imputation Classes

The table above shows that, for both VIs, IC1, the class with the largest number of observations, has a smaller spread than the other two ICs. IC2 and IC3 have large standard deviations, but these are offset by the low standard deviation of IC1, which holds the largest share of the data. The large mean and standard deviation of IC3 may be explained by that class containing most of the extreme values.
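One way to see why the small spread of IC1 can offset the large spreads of IC2 and IC3 is the pooled within-class variance, a standard summary offered here only as an illustration (it is not reported in the study's tables). With H = 3 imputation classes, class sizes n_h and class variances s_h^2, each class variance is weighted by its degrees of freedom:

```latex
s_w^2 = \frac{\sum_{h=1}^{H} (n_h - 1)\, s_h^2}{\sum_{h=1}^{H} (n_h - 1)}, \qquad H = 3
```

Because n_1 is by far the largest class size, the weight on s_1^2 dominates, so s_w^2 is pulled toward the small IC1 variance even when s_2^2 and s_3^2 are large.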

5.2.1 Mean of the Simulated Data by Nonresponse Rate for Each Variable of Interest
The table below shows the means of both VIs under varying rates of nonresponse. It was generated to describe briefly the effect of the nonresponse rate on the population mean when the missing values are ignored. More importantly, these results serve as input for comparing the estimates from the imputed data under each imputation method (IM).

Table: Means of the Retained and Deleted Observations

The means of the observations set to nonresponse and of the retained observations show contrasting results. For both VIs, the mean of the observations set to nonresponse increases as the nonresponse rate increases, whereas the mean of the retained observations decreases. A possible explanation is that large values were set to nonresponse, which raised the means of the deleted sets across the varying nonresponse rates. Comparing the means across nonresponse rates within each VI, there is little difference between the population mean ignoring the missing data and the population mean of the actual data. However, consistent with the pattern above, the deviation between the means of the actual and retained data slowly increases as the number of missing values grows.
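The retained-versus-deleted comparison can be sketched as a small simulation. The deletion mechanism used in the study is not described in this section, and the results above suggest large values were deleted more often, so the simple random deletion below (a missing-completely-at-random assumption), the lognormal example data, the rates and the function name are all hypothetical.

```python
import random
import statistics

def retained_and_deleted_means(values, rate, seed=0):
    """Flag round(rate * n) randomly chosen observations as nonrespondents
    (simple random deletion, an MCAR assumption) and return the means of
    the retained and deleted observations."""
    k = round(rate * len(values))
    if not 0 < k < len(values):
        raise ValueError("rate must leave at least one retained and one deleted observation")
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    deleted_idx = set(rng.sample(range(len(values)), k))
    retained = [v for i, v in enumerate(values) if i not in deleted_idx]
    deleted = [v for i, v in enumerate(values) if i in deleted_idx]
    return statistics.mean(retained), statistics.mean(deleted)

# Hypothetical skewed household totals and nonresponse rates, for illustration only.
rng = random.Random(1)
data = [rng.lognormvariate(11.5, 0.6) for _ in range(1000)]
actual_mean = statistics.mean(data)
for rate in (0.10, 0.20, 0.30):
    kept, dropped = retained_and_deleted_means(data, rate)
    # deviation of the mean that ignores missing data from the complete-data mean
    print(f"{rate:.0%}: retained={kept:,.2f} deleted={dropped:,.2f} deviation={kept - actual_mean:,.2f}")
```

Under random deletion the retained mean fluctuates around the complete-data mean; reproducing the rising deleted means reported above would require a mechanism that favours large values.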