Professional Documents
Culture Documents
Locality Filtering
PageRank, Recommen
sensitive data SVM
SimRank der systems
hashing streams
Dimensiona Duplicate
Spam Queries on Perceptron,
lity document
Detection streams kNN
reduction detection
Main question:
How to efficiently train
(build a model/find model parameters)?
k=9
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 7
Kernel Regression
Distance metric:
Euclidean
wi
How many neighbors to look at?
All of them (!)
Weighting function:
d(xi, q) = 0
p
q
nigeria
wx=0
Overfitting:
Mediocre generalization:
Finds a “barely” separating
solution