
Imputation

Editing does little to improve survey results if no corrective action is taken when items fail the rules set out during the editing process. Once all of the data have been edited and a file is found to have missing data, imputation is usually done as a separate step.

Non-response and invalid data reduce the quality of survey results. Imputation resolves the problems of missing, invalid or incomplete responses identified during editing, as well as any errors introduced during editing itself. At this stage, all of the data are screened for errors, because respondents are not the only ones capable of making mistakes; errors can also occur during coding and editing.

Imputation procedures are designed to fill in the gaps. When errors are detected, invalid, missing or incomplete entries are replaced with appropriate values, and answers are provided for questions with no response. Changes are made to the minimum number of fields needed for the completed record to pass all of the edits. This procedure is best accomplished by those with full access to the microdata and in possession of good auxiliary information.

The imputation procedures are decided upon during the planning and development stages of a survey. Some problems are eliminated earlier through contact with the respondent or by manually studying the questionnaire, but it is generally impossible to resolve all problems this way because of concerns about response burden, cost and timeliness. The imputation procedure is therefore used to handle the remaining edit failures.

Although imputation can improve the quality of the final data, care must be taken to choose an appropriate imputation methodology. Some methods do not preserve the relationships between variables, and some can distort the underlying distributions. There are several approaches to consider when imputing data.

Usually, deductive imputation is the first method used. It can be carried out during the collection, capture, editing or later stages of data processing. Deductive imputation is used when there is only one possible response to the question (e.g., all the values are given but the total or subtotal is missing). Other types of imputation methods include:

Substitution relies on the availability of comparable data. Imputed data can be extracted from the respondent's record from a previous cycle of the survey, or taken from an alternative source file for the same respondent (e.g., administrative files or other survey files). This is often difficult to do because, in many cases, no information is available other than that provided in the current survey.

Estimator uses information from other questions or from other answers (from the current cycle or a previous cycle) and, through mathematical operations, derives a plausible value for the missing or incorrect field.

The simplest of the estimator methods is mean imputation. With this approach, a missing field is filled with the average value from the responding units that share the same set of predetermined characteristics. For example, if a record is missing an individual's total yearly income, we could impute the average income recorded in that individual's province for respondents with the same occupation and the same level of experience. Other, more sophisticated estimator methods are available.
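The mean imputation example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `impute_class_mean`, the field names, the `imputed` flag and the income figures are all hypothetical and made up for the example; they do not come from any actual survey or system.

```python
from collections import defaultdict
from statistics import mean


def impute_class_mean(records, class_keys, target):
    """Fill missing `target` values with the mean of the responding units
    that share the same values for `class_keys` (hypothetical helper)."""
    # Collect the reported values for each combination of characteristics.
    reported = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            reported[tuple(rec[k] for k in class_keys)].append(rec[target])

    completed = []
    for rec in records:
        new = dict(rec)          # copy, so the original record is untouched
        new["imputed"] = False   # simple flag marking imputed values
        cls = tuple(rec[k] for k in class_keys)
        if rec[target] is None and reported.get(cls):
            new[target] = mean(reported[cls])
            new["imputed"] = True
        completed.append(new)
    return completed


# Made-up records: the third one is missing its yearly income.
records = [
    {"province": "ON", "occupation": "nurse", "experience": "5-10", "income": 70000},
    {"province": "ON", "occupation": "nurse", "experience": "5-10", "income": 74000},
    {"province": "ON", "occupation": "nurse", "experience": "5-10", "income": None},
    {"province": "QC", "occupation": "nurse", "experience": "5-10", "income": 65000},
]

completed = impute_class_mean(
    records, ["province", "occupation", "experience"], "income"
)
```

In this sketch the missing income is filled with the average of the two responding units that share the same province, occupation and experience. A record with no responding units in its class is left unchanged, so another method would have to handle it.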

Cold deck makes use of a fixed set of values, which covers all of the data items. These values can be constructed with the use of historical data, subject-matter expertise, etc. A 'perfect' questionnaire is created and used to supply values for complete or partial imputation.

Hot deck uses other records as 'donors' in order to answer the question (or set of questions) that needs imputation. The donor can be randomly selected from a pool of donors with the same set of predetermined characteristics. For example, if a questionnaire has been returned with the yearly income missing, we could define donors as records with the same province, same occupation and same amount of experience as the respondent whose record requires imputation. A list of possible donors matching these criteria is created and one of them is randomly selected. Once a donor is found, the donor response (in this case, the yearly income) replaces the missing or invalid response.

The donor can also be found through a method called nearest neighbour imputation. In this case, criteria must be developed to determine which responding unit is 'most like' the unit with the missing value in terms of the predetermined characteristics. The unit closest to the one with the missing value is then used as the donor.

The method of imputation can vary from survey to survey and, depending on unique or particular circumstances, sometimes even within the same survey. These methods can be applied either manually or with the use of an automated system. When imputation is done manually, the imputed value is determined by calling the respondent or is based on the judgment of a subject-matter specialist. To help facilitate automated imputation, Statistics Canada has written specialized programs to impute data based on the methodological input of experienced statisticians who have analysed the survey and suggested approaches on how best to impute meaningful data.
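The two donor-selection approaches described above, random hot deck and nearest neighbour, can be sketched as follows. This is a minimal illustration and not the method used by any Statistics Canada program: the function names, the field names, the choice of years of experience as the distance measure and all of the values are assumptions made for the example.

```python
import random


def donor_pool(records, recipient, class_keys, target):
    """Responding units that match the recipient on every key in `class_keys`."""
    return [
        r for r in records
        if r is not recipient
        and r[target] is not None
        and all(r[k] == recipient[k] for k in class_keys)
    ]


def random_hot_deck(records, recipient, class_keys, target, rng):
    """Pick a donor at random from the matching pool and return its value."""
    pool = donor_pool(records, recipient, class_keys, target)
    if not pool:
        return None  # no donor available; another method is needed
    return rng.choice(pool)[target]


def nearest_neighbour(records, recipient, class_keys, target, distance_key):
    """Pick the donor whose `distance_key` value is closest to the recipient's."""
    pool = donor_pool(records, recipient, class_keys, target)
    if not pool:
        return None
    donor = min(pool, key=lambda r: abs(r[distance_key] - recipient[distance_key]))
    return donor[target]


# Made-up records; the recipient is missing its yearly income.
recipient = {"province": "ON", "occupation": "teacher", "band": "5-9",
             "years": 8, "income": None}
records = [
    {"province": "ON", "occupation": "teacher", "band": "5-9", "years": 6, "income": 61000},
    {"province": "ON", "occupation": "teacher", "band": "5-9", "years": 9, "income": 68000},
    {"province": "ON", "occupation": "teacher", "band": "10+", "years": 12, "income": 75000},
    {"province": "QC", "occupation": "teacher", "band": "5-9", "years": 7, "income": 58000},
    recipient,
]

rng = random.Random(0)  # seeded so that a run can be repeated
hot_deck_value = random_hot_deck(
    records, recipient, ["province", "occupation", "band"], "income", rng
)
nn_value = nearest_neighbour(
    records, recipient, ["province", "occupation"], "income", "years"
)
```

Here the random hot deck draws from the two Ontario teachers in the same experience band. The nearest neighbour variant drops the band and instead picks the Ontario teacher whose years of experience are closest to the recipient's. In practice the distance criterion would be chosen by the survey's methodologists and could combine several variables.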

Imputation methods can be performed automatically, manually or in combination. Done properly, imputation limits the biases caused by not having a complete and accurate record, leaves an audit trail for evaluation purposes, and ensures that the imputed records are internally consistent. A good imputation procedure is automated, objective and efficient.
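The audit trail and internal consistency requirements can be illustrated with deductive imputation, the case mentioned earlier in which all the component values are given but the total is missing. The sketch below is only one possible arrangement: the function names, the field names, the structure of the audit entries and the amounts are hypothetical.

```python
def passes_sum_edit(record, component_fields, total_field):
    """Edit rule: every field is present and the components add up to the total."""
    values = [record[f] for f in component_fields]
    if record[total_field] is None or any(v is None for v in values):
        return False
    return sum(values) == record[total_field]


def deductive_total(record, component_fields, total_field, audit):
    """Fill a missing total when every component is reported, and log the change."""
    components = [record[f] for f in component_fields]
    if record[total_field] is None and all(c is not None for c in components):
        completed = dict(record)  # leave the original record untouched
        completed[total_field] = sum(components)
        # Audit entry: which record and field changed, how, and to what value.
        audit.append({
            "id": record["id"],
            "field": total_field,
            "method": "deductive",
            "old": None,
            "new": completed[total_field],
        })
        return completed
    return record  # nothing to deduce; left for another method


# Made-up record: the income components are reported but the total is blank.
components = ["wages", "self_employment", "investment"]
record = {"id": 101, "wages": 40000, "self_employment": 5000,
          "investment": 1200, "total_income": None}

audit_trail = []
fixed = deductive_total(record, components, "total_income", audit_trail)
```

After the call, `fixed` passes the sum edit, the original record is unchanged, and `audit_trail` holds one entry describing the imputed field, which is the kind of information an evaluation of the imputation would need.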