2. Technical Details
KNN imputation works as follows:
a. For each data point with missing values, KNN identifies the 'k' data points that are
closest to it. The distance (or similarity) between data points is typically computed using
a metric such as Euclidean distance, evaluated over the features observed in both points.
b. The average (or median) of the 'k' neighbors' values for the missing feature is then
computed and used to impute the missing value for the data point.
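The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used for this dataset: the function names `nan_euclidean` and `knn_impute`, the toy `data` rows, and the rescaling of distances by the fraction of shared features are all assumptions made for the example.

```python
import math

def nan_euclidean(a, b):
    """Euclidean distance over the features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf  # no common features: treat as infinitely far
    squared = sum((x - y) ** 2 for x, y in shared)
    # Assumption: rescale so rows compared on fewer features are not favoured.
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Return a copy of rows with each NaN replaced by the mean of that
    feature over the k nearest rows in which the feature is observed."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Step a: rank the other rows that have feature j observed.
            candidates = [
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and not math.isnan(other[j])
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for d, v in candidates[:k] if math.isfinite(d)]
            # Step b: average the neighbours' values for the missing feature.
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical toy data: the second row is missing its third feature.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, math.nan],
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
print(filled[1])  # third value is roughly 2.9, the mean of 3.0 and 2.8
```

Here the distant fourth row is ignored because only the two closest rows contribute to the imputed value. Replacing the mean with a median in step b would give the median variant mentioned above.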
It's essential to choose an appropriate 'k' value. A small 'k' makes the imputed values noisy
and susceptible to outliers, while a large 'k' can be computationally expensive and might
smooth out the data excessively, pulling imputed values toward the overall average.
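One possible way to compare candidate 'k' values is to hide values that are actually known, re-estimate them from their neighbors, and measure the error. The sketch below does this with a leave-one-out loop on a single predictor feature. The helper names `knn_estimate` and `leave_one_out_rmse`, the candidate values of 'k', and the `observed` points are hypothetical and chosen only for illustration; this is not the selection procedure used in this report.

```python
import math

def knn_estimate(x_query, points, k):
    """Mean of y over the k points whose x is closest to x_query."""
    nearest = sorted(points, key=lambda p: abs(p[0] - x_query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def leave_one_out_rmse(points, k):
    """Hide each observed y in turn, re-estimate it from the remaining
    points, and return the root-mean-square error of those estimates."""
    squared_errors = []
    for i, (x, y) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        squared_errors.append((knn_estimate(x, rest, k) - y) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical complete cases in which y is roughly 2 * x.
observed = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (5, 10.1),
            (6, 12.1), (7, 13.8), (8, 16.1), (9, 18.0), (10, 19.9)]

# Score each candidate k and keep the one with the lowest hold-out error.
errors = {k: leave_one_out_rmse(observed, k) for k in (1, 2, 3, 5, 9)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

With k = 9 every estimate is the average of all remaining points, so on this toy data the error is noticeably higher than with a small 'k', which illustrates the over-smoothing effect described above.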
Post imputation, the dataset was ready for further analysis, such as PCA, without the
challenges posed by missing data.