Package ‘kknn’

December 16, 2013
Title Weighted k-Nearest Neighbors
Version 1.2-3
Date 2013-12-16
Author Klaus Schliep & Klaus Hechenbichler
Description Weighted k-Nearest Neighbors Classification, Regression and Clustering
Maintainer Klaus Schliep <klaus.schliep@gmail.com>
Depends R (>= 2.10)
Imports igraph (>= 0.6), Matrix, stats
License GPL (>= 2)
NeedsCompilation yes
Repository CRAN
Date/Publication 2013-12-16 12:58:53

R topics documented:
kknn-package
contr.dummy
glass
ionosphere
kknn
miete
simulation
specClust
train.kknn
Index

kknn-package            Weighted k-Nearest Neighbors Classification and Clustering

Description

Weighted k-Nearest Neighbors Classification, Regression and spectral Clustering.
The complete list of functions can be displayed with library(help = kknn).

Author(s)

Klaus Schliep
Klaus Hechenbichler

Maintainer: Klaus Schliep <klaus.schliep@gmail.com>

References

Hechenbichler K. and Schliep K.P. (2004) Weighted k-Nearest-Neighbor Techniques and Ordinal Classification, Discussion Paper 399, SFB 386, Ludwig-Maximilians University Munich (http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper399.ps)

Hechenbichler K. (2005) Ensemble-Techniken und ordinale Klassifikation, PhD-thesis

contr.dummy             Contrast Matrices

Description

Returns a matrix of contrasts.

Usage

contr.dummy(n, contrasts = TRUE)
contr.metric(n, contrasts = TRUE)
contr.ordinal(n, contrasts = TRUE)

Arguments

n          A vector containing levels of a factor, or the number of levels.
contrasts  A logical value indicating whether contrasts should be computed.

Details

contr.dummy is standard dummy coding, contr.metric has the same effect as as.numeric (and therefore makes sense only for ordered variables), and contr.ordinal computes contrasts for ordinal variables.
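To see how the three codings differ in practice, the short sketch below (an illustration, not part of the original manual examples) builds each contrast for a three-level factor; the shapes match the Value section below.

library(kknn)

# Three-level factor: compare the three codings side by side.
contr.dummy(3)    # 3 x 3 indicator matrix, one column per level
contr.ordinal(3)  # 3 x 2 matrix of split contrasts for ordered levels
contr.metric(3)   # length-3 vector, equivalent to treating levels as numeric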

Value

A matrix with n rows and n-1 columns for contr.ordinal, a matrix with n rows and n columns for contr.dummy and a vector of length n for contr.metric.

Author(s)

Klaus P. Schliep <klaus.schliep@gmail.com>

References

Hechenbichler K. and Schliep K.P. (2004) Weighted k-Nearest-Neighbor Techniques and Ordinal Classification, Discussion Paper 399, SFB 386, Ludwig-Maximilians University Munich (http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper399.ps)

See Also

contrasts, contr.poly and contr.sdif

Examples

contr.dummy(5)
contr.metric(5)
contr.ordinal(5)

glass                   Glass Identification Database

Description

A data frame with 214 observations, where the problem is to predict the type of glass in terms of their oxide content (i.e. Na, Fe, K, etc). The study of classification of types of glass was motivated by criminological investigation. At the scene of the crime, the glass left can be used as evidence... if it is correctly identified!

Usage

data(glass)

Format

A data frame with 214 observations on the following 11 variables.

Id   Id number.
RI   Refractive index.
Na   Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10).
Mg   Magnesium.
Al   Aluminum.

Si   Silicon.
K    Potassium.
Ca   Calcium.
Ba   Barium.
Fe   Iron.
Type Type of glass (class attribute):
     1 building windows float processed
     2 building windows non float processed
     3 vehicle windows float processed
     4 vehicle windows non float processed (none in this database)
     5 containers
     6 tableware
     7 headlamps

Source

• Creator: B. German, Central Research Establishment, Home Office Forensic Science Service, Aldermaston, Reading, Berkshire RG7 4PN
• Donor: Vina Spiehler, Ph.D., DABFT, Diagnostic Products Corporation

The data have been taken from the UCI Machine Learning Database Repository http://www.ics.uci.edu/~mlearn/MLRepository.html and were converted to R format by <klaus.schliep@gmail.com>.

Examples

data(glass)
str(glass)

ionosphere              Johns Hopkins University Ionosphere Database

Description

This radar data was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. See the paper for more details. The targets were free electrons in the ionosphere. "Good" radar returns are those showing evidence of some type of structure in the ionosphere. "Bad" returns are those that do not; their signals pass through the ionosphere.

Received signals were processed using an autocorrelation function whose arguments are the time of a pulse and the pulse number. There were 17 pulse numbers for the Goose Bay system. Instances in this database are described by 2 attributes per pulse number, corresponding to the complex values returned by the function resulting from the complex electromagnetic signal.

Usage

data(ionosphere)

"rank" and "optimal". Source Vince Sigillito (vgs@aplcen.4)). Window width of an y-kernel. The 35th attribute is either g ("good") or b ("bad") according to the definition summarized above.edu). scale variable to have equal sd.jhu.html and were converted to R format by <klaus. Matrix or data frame of test set cases. Kernel to use. Applied Physics Laboratory. Matrix or data frame of training set cases.2)). "inv". Usage kknn(formula = formula(train). scale=TRUE. ykernel scale contrasts . The first 34 continuous covariables are used for the prediction.uci. "triweight" (or beta(4.action = na. "biweight" (or beta(3. na.3)). "epanechnikov" (or beta(2. "gaussian". k = 7. train. test. contrasts = c( unordered = "contr. In addition even ordinal and continuous variables can be predicted.omit().ics. "triangular". Laurel.schliep@gmail. Space Physics Group. Examples data(ionosphere) kknn Weighted k-Nearest Neighbor Classifier Description Performs k-nearest neighbor classification of a test set using a training set.kknn Format 5 A data frame with 351 observations on the following 35 variables.ordinal")) Arguments formula train test na.dummy".com >. A vector containing the ’unordered’ and ’ordered’ contrasts to use. "cos". distance = 2. This is a binary classification task. Johns Hopkins University.edu/~mlearn/MLRepository. ykernel = NULL. For each row of the test set. kernel = "optimal". Johns Hopkins Road. especially for prediction of ordinal classes. MD 20723 The data have been taken from the UCI Machine Learning Database Repository http://www.action k distance kernel A formula object. A function which indicates what should happen when the data contain ’NA’s. logical.apl. Possible choices are "rectangular" (which is standard unweighted knn). Parameter of Minkowski distance. the k nearest training set vectors (according to Minkowski distance) are found. and the classification is done via the maximum of summed kernel densities. Number of neighbors considered. ordered = "contr.

Details

This nearest neighbor method expands knn in several directions. First, it can be used not only for classification, but also for regression and ordinal classification. Second, it uses kernel functions to weight the neighbors according to their distances. In fact, not only kernel functions but every monotonically decreasing function f(x) for all x > 0 will work fine.

The number of neighbors used for the "optimal" kernel should be [ (2(d+4)/(d+2))^(d/(d+4)) k ], where k is the number that would be used for unweighted knn classification, i.e. kernel = "rectangular". This factor (2(d+4)/(d+2))^(d/(d+4)) is between 1.2 and 2 (see Samworth (2012) for more details).

Value

kknn returns a list-object of class kknn including the components

fitted.values  Vector of predictions.
CL             Matrix of classes of the k nearest neighbors.
W              Matrix of weights of the k nearest neighbors.
D              Matrix of distances of the k nearest neighbors.
prob           Matrix of predicted class probabilities.
response       Type of response variable, one of continuous, nominal or ordinal.
distance       Parameter of Minkowski distance.
call           The matched call.
terms          The 'terms' object used.

Author(s)

Klaus P. Schliep <klaus.schliep@gmail.com>
Klaus Hechenbichler

References

Hechenbichler K. and Schliep K.P. (2004) Weighted k-Nearest-Neighbor Techniques and Ordinal Classification, Discussion Paper 399, SFB 386, Ludwig-Maximilians University Munich (http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper399.ps)

Hechenbichler K. (2005) Ensemble-Techniken und ordinale Klassifikation, PhD-thesis

Samworth, R.J. (2012) Optimal weighted nearest neighbour classifiers. Annals of Statistics, 40, 2733-2763. (available from http://www.statslab.cam.ac.uk/~rjs57/Research.html)

See Also

train.kknn, simulation, knn and knn1
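The bracketed expression in Details is straightforward to evaluate; the helper below (the name optimal_k is ours, not part of the package) computes the inflated neighborhood size for a given covariate dimension d.

# Number of neighbors for the "optimal" kernel, given the k one would
# use for unweighted knn and the dimension d of the covariate space.
optimal_k <- function(k, d) {
  floor((2 * (d + 4) / (d + 2))^(d / (d + 4)) * k)
}

optimal_k(k = 10, d = 4)  # gives 16, i.e. about 1.6 * k for 4 covariates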

"epanechnikov". iris.. "optimal").valid[1:4]. Usage data(miete) . fit. kmax = 15. replace = FALSE.as.valid <.valid).learn.] ionosphere.kknn) table(iris.learn.kknn(class ~ . m)) iris.train1 <..learn <.numeric(iris.iris[-val.learn. ionosphere. ionosphere. The rent standards are used in particular for the determination of the local comparative rent (i. kernel = c("triangular".valid$class) (fit. ionosphere.valid$Species)) pairs(iris. pch = pcol.] fit. col = c("green3".train2 <. "rectangular".] iris. "red") [(iris.kknn(class ~ . "epanechnikov". landlords.character(as.train. equipment. distance = 1..learn <.ionosphere[1:200. "optimal").train2. "rectangular".fitted(iris. size = round(m/3).kknn <. fit) pcol <. For the composition of the rent standards.kknn <. distance = 2)) table(predict(fit. iris.valid$Species != fit)+1]) data(ionosphere) ionosphere. a representative random sample is drawn from all relevant households. ionosphere.e. renting advisory boards and experts.valid$class.valid).miete Examples library(kknn) data(iris) m <. year of construction.ionosphere[-c(1:200)..valid <.valid.kknn) fit <. ionosphere.dim(iris)[1] val <. net rent as a function of household size.] iris. kmax = 15. distance = 1)) table(predict(fit. and the interesting data are determined by interviewers by means of questionnaires. kernel = c("triangular".valid$class) 7 miete Munich Rent Standard Database (1994) Description Many german cities compose so-called rent standards to make a decision making instrument available to tenants. kernel = "triangular") summary(iris.valid$Species.kknn(Species~.valid) table(ionosphere.learn. ionosphere.iris[val. ionosphere.kknn$fit) (fit. The dataset contains the data of 1082 households interviewed for the munich rent standard 1994.). ionosphere. etc. prob = rep(1/m.sample(1:m.train.kknn(class ~ .train1.

Format

A data frame with 1082 observations on the following 18 variables.

nm       Net rent in DM.
wfl      Floor space in sqm.
rooms    Number of rooms in household.
bj       Year of construction.
bjkat    Age category of the building (bj categorized):
         1 : built before 1919
         2 : built between 1919 and 1948
         3 : built between 1949 and 1965
         4 : built between 1966 and 1977
         5 : built between 1978 and 1983
         6 : built after 1983
wflkat   Floor space category (wfl categorized):
         1 : less than 50 sqm
         2 : between 51 sqm and 80 sqm
         3 : at least 81 sqm
nmqm     Net rent per sqm.
bad0     Bathroom in apartment? 1 : no, 0 : yes
zh       Central heating? 1 : yes, 0 : no
ww0      Hot water supply? 1 : no, 0 : yes
badkach  Tiled bathroom? 1 : yes, 0 : no
fenster  Window type: 1 : plain windows, 0 : state-of-the-art windows
kueche   Kitchen type: 1 : well equipped kitchen, 0 : plain kitchen
mvdauer  Lease duration in years.
nmkat    Net rent category (nm categorized):
         1 : less than 500 DM
         2 : between 500 DM and 675 DM
         3 : between 675 DM and 850 DM
         4 : between 850 DM and 1150 DM
         5 : at least 1150 DM

adr      Address type: 1 : bad, 2 : average, 3 : good
wohn     Residential type: 1 : bad, 2 : average, 3 : good

Source

Fahrmeir, L., Kuenstler, R., Pigeot, I. and Tutz, G. (1997): Statistik: der Weg zur Datenanalyse, Springer, Berlin. http://www.stat.uni-muenchen.de/service/datenarchiv

The data were converted to R format by <klaus.schliep@gmail.com>.

Examples

data(miete)
str(miete)

simulation              Crossvalidation procedure to test prediction accuracy

Description

simulation tests the prediction accuracy of regression and/or classification techniques via simulation of different test sets.

Usage

simulation(formula, data, runs = 10, train = TRUE, k = 11, ...)

Arguments

formula  A formula object.
data     Matrix or data frame.
runs     Number of crossvalidation runs.
train    A logical value. If TRUE the training procedure for selecting optimal values of k and kernel is performed.
k        Number or maximal number of neighbors considered.
...      Further arguments passed to or from other methods.

Value

A matrix, depending on the choice of train, containing the mean and variance of the misclassification error, and the absolute and the squared distances.
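As a rough illustration (not from the original manual), the returned matrix can be inspected directly; with the default train = TRUE the optimal k and kernel are re-selected in every run.

library(kknn)
data(miete)

# Five crossvalidation runs; mean and variance of the
# misclassification error are returned.
res <- simulation(wflkat ~ nm + bjkat + zh, data = miete, runs = 5)
res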

Author(s)

Klaus P. Schliep <klaus.schliep@gmail.com>

References

Hechenbichler K. and Schliep K.P. (2004) Weighted k-Nearest-Neighbor Techniques and Ordinal Classification, Discussion Paper 399, SFB 386, Ludwig-Maximilians University Munich (http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper399.ps)

See Also

kknn and train.kknn

Examples

library(kknn)
data(miete)
simulation(nmqm ~ wfl + bjkat + zh, data = miete, runs = 5, kernel = "triangular", k = 15)
simulation(wflkat ~ nm + bjkat + zh, data = miete, runs = 5)
simulation(zh ~ wfl + bjkat + nmqm, data = miete, runs = 5)

specClust               Spectral Clustering

Description

Spectral clustering based on the k-nearest neighbor graph.

Usage

specClust(data, centers = NULL, nn = 7, method = "symmetric", gmax = NULL, ...)
## S3 method for class 'specClust'
plot(x, ...)

Arguments

data     Matrix or data frame.
centers  Number of clusters to estimate; if NULL the number is chosen automatically.
nn       Number of neighbors considered.
method   Normalisation of the Laplacian ("none", "symmetric" or "random-walk").
gmax     Maximal number of connected components.
x        An object of class specClust.
...      Further arguments passed to or from other methods.

Details

specClust allows one to estimate several popular spectral clustering algorithms; for an overview see von Luxburg (2007).

The Laplacian is constructed from the nearest neighbors and there are several kernels available. The eigenvalues and eigenvectors are computed using the binding in igraph to arpack. This should ensure that this algorithm is also feasible for larger datasets, as the distances used have dimension n*m, where n is the number of observations and m the number of nearest neighbors. The Laplacian is sparse and has roughly n*m elements, and only k eigenvectors are computed, where k is the number of centers.

Value

specClust returns a kmeans object or, in case of k being a vector, a list of kmeans objects.

Author(s)

Klaus P. Schliep <klaus.schliep@gmail.com>

References

U. von Luxburg (2007) A tutorial on spectral clustering. Stat Comput, 17, 395-416

Ng, A., Jordan, M., Weiss, Y. (2002) On spectral clustering: analysis and an algorithm. In: Dietterich, T., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems, 14, 849-856. MIT Press, Cambridge

Lihi Zelnik-Manor and P. Perona (2004) Self-Tuning Spectral Clustering. Eighteenth Annual Conference on Neural Information Processing Systems (NIPS)

Shi, J. and Malik, J. (2000) Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (8), 888-905

See Also

kknn, arpack, kmeans

Examples

data(iris)
cl <- specClust(iris[, 1:4], 3, nn = 5)
pcol <- as.character(as.numeric(iris$Species))
pairs(iris[1:4], pch = pcol, col = c("green", "red", "blue")[cl$cluster])
table(iris[, 5], cl$cluster)
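Since the Value section states that a vector of centers yields a list of kmeans objects, one natural use (an illustrative sketch, assuming centers accepts a vector as documented) is to compare several cluster counts in one call.

library(kknn)
data(iris)

# Try 2, 3 and 4 clusters at once; the result is a list of kmeans objects.
cls <- specClust(iris[, 1:4], centers = 2:4, nn = 5)
sapply(cls, function(cl) length(cl$size))  # number of clusters in each fit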

train.kknn              Training kknn

Description

Training of the kknn method via leave-one-out crossvalidation.

Usage

train.kknn(formula, data, kmax = 11, distance = 2, kernel = "optimal",
           ykernel = NULL, scale = TRUE,
           contrasts = c(unordered = "contr.dummy", ordered = "contr.ordinal"), ...)
cv.kknn(formula, data, kcv = 10, ...)

Arguments

formula    A formula object.
data       Matrix or data frame.
kmax       Maximum number of k.
distance   Parameter of Minkowski distance.
kernel     Kernel to use. Possible choices are "rectangular" (which is standard unweighted knn), "triangular", "epanechnikov" (or beta(2,2)), "biweight" (or beta(3,3)), "triweight" (or beta(4,4)), "cos", "inv", "gaussian" and "optimal".
ykernel    Window width of an y-kernel, especially for prediction of ordinal classes.
scale      logical, scale variables to have equal sd.
contrasts  A vector containing the 'unordered' and 'ordered' contrasts to use.
...        Further arguments passed to or from other methods.
kcv        Number of partitions for k-fold cross validation.

Details

train.kknn performs leave-one-out crossvalidation and is computationally very efficient. cv.kknn performs k-fold crossvalidation and is generally slower and does not yet contain the test of different models.

Value

train.kknn returns a list-object of class train.kknn including the components

MISCLASS       Matrix of misclassification errors.
MEAN.ABS       Matrix of mean absolute errors.
MEAN.SQU       Matrix of mean squared errors.
fitted.values  List of predictions for all combinations of kernel and k.

best.parameters  List containing the best parameter value for kernel and k.
response         Type of response variable, one of continuous, nominal or ordinal.
distance         Parameter of Minkowski distance.
call             The matched call.
terms            The 'terms' object used.

Author(s)

Klaus P. Schliep <klaus.schliep@gmail.com>

References

Hechenbichler K. and Schliep K.P. (2004) Weighted k-Nearest-Neighbor Techniques and Ordinal Classification, Discussion Paper 399, SFB 386, Ludwig-Maximilians University Munich (http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper399.ps)

Hechenbichler K. (2005) Ensemble-Techniken und ordinale Klassifikation, PhD-thesis

Samworth, R.J. (2012) Optimal weighted nearest neighbour classifiers. Annals of Statistics, 40, 2733-2763. (available from http://www.statslab.cam.ac.uk/~rjs57/Research.html)

See Also

kknn and simulation

Examples

library(kknn)
## Not run:
data(miete)
(train.con <- train.kknn(nmqm ~ wfl + bjkat + zh, data = miete, kmax = 25,
    kernel = c("rectangular", "triangular", "epanechnikov", "gaussian", "rank", "optimal")))
plot(train.con)
(train.ord <- train.kknn(wflkat ~ nm + bjkat + zh, miete, kmax = 25,
    kernel = c("rectangular", "triangular", "epanechnikov", "gaussian", "rank", "optimal")))
plot(train.ord)
(train.nom <- train.kknn(zh ~ wfl + bjkat + nmqm, miete, kmax = 25,
    kernel = c("rectangular", "triangular", "epanechnikov", "gaussian", "rank", "optimal")))
plot(train.nom)
## End(Not run)

data(glass)
glass <- glass[, -1]
(fit.glass1 <- train.kknn(Type ~ ., glass, kmax = 15,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))
(fit.glass2 <- train.kknn(Type ~ ., glass, kmax = 15,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))
plot(fit.glass1)
plot(fit.glass2)

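After training, the selected kernel and k can be pulled from the best.parameters component listed above. A minimal sketch (assuming the component is a list with elements kernel and k, as its description suggests):

library(kknn)
data(glass)
glass <- glass[, -1]

fit <- train.kknn(Type ~ ., glass, kmax = 15)
fit$best.parameters$kernel  # kernel minimizing the leave-one-out error
fit$best.parameters$k       # corresponding number of neighbors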
Index

∗Topic classif
    contr.dummy
    kknn
    kknn-package
    simulation
    train.kknn
∗Topic cluster
    specClust
∗Topic datasets
    glass
    ionosphere
    miete
∗Topic design
    contr.dummy
∗Topic package
    kknn-package

arpack
contr.dummy
contr.metric (contr.dummy)
contr.ordinal (contr.dummy)
contr.poly
contr.sdif
contrasts
cv.kknn (train.kknn)
glass
ionosphere
kknn
kknn-package
kmeans
knn
knn1
miete
plot.specClust (specClust)
plot.train.kknn (train.kknn)
predict.kknn (kknn)
predict.train.kknn (train.kknn)
print.kknn (kknn)
print.train.kknn (train.kknn)
simulation
specClust
summary.kknn (kknn)
summary.train.kknn (train.kknn)
train.kknn