Data Imputation With KNN
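K-Nearest Neighbors (KNN) imputation fills a missing value by finding the rows in the training set that are most similar to the incomplete row.
The missing entry is replaced with the mean of that feature over the k nearest neighbors. In the example below, k = 2, similarity is measured with the Euclidean distance, and the missing columns are filled one at a time, with each imputed value reused in the later steps.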
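$$E(a, b) = \sqrt{\sum_{i \in D} (x_{ai} - x_{bi})^2}$$
where $D$ is the set of features observed in both rows $a$ and $b$, and $x_{ai}$ is the value of feature $i$ in row $a$.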
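The distance formula can be sketched in Python as below, assuming each row is a tuple ordered (TotalDayMinutes, TotalDayCalls, TotalDayCharge) with None marking a missing value. The function name partial_euclidean is hypothetical, and returning infinity when two rows share no observed feature is an assumption the source does not address.

```python
import math
from typing import Optional, Sequence

def partial_euclidean(a: Sequence[Optional[float]],
                      b: Sequence[Optional[float]]) -> float:
    """Euclidean distance summed over D, the features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        # Assumption: rows with no feature in common are treated as infinitely far apart.
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

# Rows as (TotalDayMinutes, TotalDayCalls, TotalDayCharge); None = missing.
r1 = (100.0, 30.0, None)
r2 = (90.0, 45.0, 40.0)
r3 = (None, 56.0, 80.0)

print(partial_euclidean(r3, r1))            # 26.0 (only TotalDayCalls is shared)
print(round(partial_euclidean(r3, r2), 2))  # 41.48
```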
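TotalDayMinutes

Row 3 is missing TotalDayMinutes, so its distance to each other row is computed over TotalDayCalls and TotalDayCharge. Row 1 has no TotalDayCharge and row 4 has no TotalDayCalls, so those two distances use a single feature.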
$$E(r_3, r_1) = \sqrt{(56 - 30)^2} = 26$$
$$E(r_3, r_2) = \sqrt{(56 - 45)^2 + (80 - 40)^2} \approx 41.48$$
$$E(r_3, r_4) = \sqrt{(80 - 98)^2} = 18$$
Sort the distances in ascending order and keep the two nearest rows (k = 2): r4 (distance 18) and r1 (distance 26).
Their TotalDayMinutes values are 95 and 100.
The imputed value is their mean, (95 + 100) / 2 = 97.5.
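The selection step can be sketched as follows: sort the candidate rows by their distance to r3, keep the k = 2 closest, and average their TotalDayMinutes. The distances are copied from the equations above rather than recomputed, and the variable names are only illustrative.

```python
from statistics import mean

# (row, distance to r3, TotalDayMinutes) for every row that has TotalDayMinutes.
candidates = [("r1", 26.0, 100.0), ("r2", 41.48, 90.0), ("r4", 18.0, 95.0)]

k = 2  # number of neighbors, as in the worked example
nearest = sorted(candidates, key=lambda c: c[1])[:k]  # ascending distance
imputed = mean(value for _, _, value in nearest)

print([name for name, _, _ in nearest])  # ['r4', 'r1']
print(imputed)                           # 97.5
```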
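TotalDayCalls

Row 4 is missing TotalDayCalls, so distances are computed over TotalDayMinutes and TotalDayCharge. Row 3 now uses its imputed TotalDayMinutes of 97.5.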
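$$E(r_4, r_1) = \sqrt{(95 - 100)^2} = 5$$
$$E(r_4, r_2) = \sqrt{(95 - 90)^2 + (98 - 40)^2} \approx 58.22$$
$$E(r_4, r_3) = \sqrt{(95 - 97.5)^2 + (98 - 80)^2} \approx 18.17$$
The two nearest rows are r1 (distance 5) and r3 (distance 18.17), whose TotalDayCalls values are 30 and 56.
The imputed value is (30 + 56) / 2 = 43.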
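A sketch of the TotalDayCalls step, assuming rows stored as dictionaries with None for a missing value (the keys "minutes", "calls", and "charge" are hypothetical shorthand for the column names). It reuses r3's imputed TotalDayMinutes of 97.5, so it only reproduces the example if the TotalDayMinutes step has already been done.

```python
import math

# r4 is missing TotalDayCalls, so it is compared on TotalDayMinutes and
# TotalDayCharge only. None marks a missing value.
# r3's "minutes" is the 97.5 imputed in the previous step.
r4 = {"minutes": 95.0, "charge": 98.0}
others = {
    "r1": {"minutes": 100.0, "charge": None, "calls": 30.0},
    "r2": {"minutes": 90.0, "charge": 40.0, "calls": 45.0},
    "r3": {"minutes": 97.5, "charge": 80.0, "calls": 56.0},
}

def distance(a: dict, b: dict) -> float:
    """Euclidean distance over the features both rows observe."""
    diffs = [a[f] - b[f] for f in ("minutes", "charge")
             if a[f] is not None and b[f] is not None]
    return math.hypot(*diffs)  # sqrt of the sum of squared differences (Python 3.8+)

ranked = sorted(others, key=lambda name: distance(r4, others[name]))
for name in ranked:
    print(name, round(distance(r4, others[name]), 2))

nearest = ranked[:2]
imputed = sum(others[n]["calls"] for n in nearest) / len(nearest)
print(nearest, imputed)  # ['r1', 'r3'] 43.0
```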
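TotalDayCharge

Row 1 is missing TotalDayCharge, so distances are computed over TotalDayMinutes and TotalDayCalls, using the imputed values 97.5 (row 3) and 43 (row 4).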
$$E(r_1, r_2) = \sqrt{(100 - 90)^2 + (30 - 45)^2} \approx 18.03$$
$$E(r_1, r_3) = \sqrt{(100 - 97.5)^2 + (30 - 56)^2} \approx 26.12$$
$$E(r_1, r_4) = \sqrt{(100 - 95)^2 + (30 - 43)^2} \approx 13.93$$
The two nearest rows are r4 and r2, whose TotalDayCharge values are 98 and 40.
The imputed value is (98 + 40) / 2 = 69.
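The TotalDayCharge distances can be recomputed with Python's math.dist (Python 3.8+), which adds every squared difference under the square root. The feature values for r3 and r4 include the imputed 97.5 and 43; the dictionary layout is only an illustration.

```python
import math

# r1 is missing TotalDayCharge, so distances use (TotalDayMinutes, TotalDayCalls).
r1 = (100.0, 30.0)
candidates = {
    "r2": ((90.0, 45.0), 40.0),  # (features, TotalDayCharge)
    "r3": ((97.5, 56.0), 80.0),  # 97.5 was imputed earlier
    "r4": ((95.0, 43.0), 98.0),  # 43 was imputed earlier
}

# math.dist(p, q) = sqrt(sum((p_i - q_i) ** 2)): every squared term is added.
dists = {name: math.dist(r1, feats) for name, (feats, _) in candidates.items()}
for name, d in sorted(dists.items(), key=lambda kv: kv[1]):
    print(name, round(d, 2))  # r4 13.93, then r2 18.03, then r3 26.12

nearest = sorted(dists, key=dists.get)[:2]
imputed = sum(candidates[n][1] for n in nearest) / len(nearest)
print(nearest, imputed)  # ['r4', 'r2'] 69.0
```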
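The completed dataset, with imputed values in row 1 (TotalDayCharge), row 3 (TotalDayMinutes), and row 4 (TotalDayCalls):

No  TotalDayMinutes  TotalDayCalls  TotalDayCharge
1   100.0            30.0           69.0
2   90.0             45.0           40.0
3   97.5             56.0           80.0
4   95.0             43.0           98.0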
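Putting the steps together, here is a minimal sketch of the whole procedure, assuming the column order of the table above and k = 2. Each imputed value is written back before the next column is processed, as in the worked example; the helper names are hypothetical. This sequential scheme is not the same as scikit-learn's KNNImputer, which by default measures distance with the nan_euclidean metric (rescaled for missing coordinates) on the original, unfilled data, so its output can differ from this table.

```python
import math
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]
COLUMNS = ["TotalDayMinutes", "TotalDayCalls", "TotalDayCharge"]

def partial_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over the features observed in both rows.

    Rows sharing no observed feature are treated as infinitely far apart
    (an assumption; the worked example never hits this case).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

def knn_impute_sequential(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill missing cells column by column, writing each imputed value back
    so that later columns can use it."""
    filled = [list(row) for row in rows]  # copy; the input is left untouched
    for col in range(len(COLUMNS)):
        for target, row in enumerate(filled):
            if row[col] is not None:
                continue
            # Candidate neighbors must have this column observed.
            donors = sorted(
                (partial_euclidean(row, other), other[col])
                for i, other in enumerate(filled)
                if i != target and other[col] is not None
            )
            row[col] = mean(value for _, value in donors[:k])
    return filled

rows = [
    [100.0, 30.0, None],
    [90.0, 45.0, 40.0],
    [None, 56.0, 80.0],
    [95.0, None, 98.0],
]
for number, row in enumerate(knn_impute_sequential(rows), start=1):
    print(number, row)
# 1 [100.0, 30.0, 69.0]
# 2 [90.0, 45.0, 40.0]
# 3 [97.5, 56.0, 80.0]
# 4 [95.0, 43.0, 98.0]
```

Because values are written back, the result depends on column order; filling TotalDayCharge first, for example, would change which values the other distances use.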