5/26/2016

Kernel method - Wikipedia, the free encyclopedia

Kernel method
From Wikipedia, the free encyclopedia

In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best-known
member is the support vector machine (SVM). The general task of pattern analysis is to find and study
general types of relations (for example clusters, rankings, principal components, correlations,
classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation
have to be explicitly transformed into feature vector representations via a user-specified feature map: in
contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data
points in raw representation.
Kernel methods owe their name to the use of kernel functions, which enable them to operate in a
high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but
rather by simply computing the inner products between the images of all pairs of data in the feature
space. This operation is often computationally cheaper than the explicit computation of the coordinates.
This approach is called the "kernel trick". Kernel functions have been introduced for sequence data,
graphs, text, images, as well as vectors.
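As a small illustration of the kernel trick, the sketch below compares the homogeneous quadratic kernel k(x, z) = (x . z)^2 on 2-D inputs with the inner product of an explicit feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The kernel choice and the names `quadratic_kernel` and `feature_map` are assumptions made for this example only.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(x, z):
    """k(x, z) = (x . z)^2, computed directly in the input space."""
    return dot(x, z) ** 2

def feature_map(x):
    """Explicit feature map for the quadratic kernel on 2-D inputs (illustrative)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
implicit = quadratic_kernel(x, z)               # one dot product, then a square
explicit = dot(feature_map(x), feature_map(z))  # map both points, then a dot product
print(implicit, explicit)                       # the two values agree up to rounding
```

For d-dimensional inputs and degree p, an explicit map of this kind needs one coordinate per degree-p monomial, whereas the kernel needs only a single d-dimensional dot product.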
Algorithms capable of operating with kernels include the kernel perceptron, support vector machines
(SVM), Gaussian processes, principal components analysis (PCA), canonical correlation analysis, ridge
regression, spectral clustering, linear adaptive filters and many others. Any linear model can be turned
into a nonlinear model by applying the kernel trick to the model: replacing its features (predictors) by a
kernel function.
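To make the last point concrete, here is a minimal sketch of a nearest-class-mean classifier written purely in terms of kernel evaluations. It relies on the identity ||phi(x) - mu||^2 = k(x, x) - (2/m) sum_i k(x, x_i) + (1/m^2) sum_i sum_j k(x_i, x_j), where mu is the mean of the mapped points of one class. The classifier, the toy data and all function names are assumptions chosen for illustration. Swapping the linear kernel for a Gaussian (RBF) kernel changes the decision rule from linear to nonlinear without touching the rest of the code.

```python
import math

def linear_kernel(x, z):
    """Plain inner product: the feature space is the input space itself."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sq_dist_to_mean(x, group, kernel):
    """Squared feature-space distance from phi(x) to the mean of phi over `group`."""
    m = len(group)
    cross = sum(kernel(x, g) for g in group) / m
    within = sum(kernel(g, h) for g in group for h in group) / (m * m)
    return kernel(x, x) - 2.0 * cross + within

def nearest_mean_label(x, groups, kernel):
    """Return the label whose feature-space class mean is closest to phi(x)."""
    return min(groups, key=lambda label: sq_dist_to_mean(x, groups[label], kernel))

# Hypothetical data: a tight inner cluster surrounded by four outer points.
groups = {
    "inner": [(0.1, 0.0), (0.3, 0.0), (-0.1, 0.0), (0.1, 0.2), (0.1, -0.2)],
    "outer": [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)],
}
query = (1.9, 0.1)  # lies right next to the outer point (2, 0)
linear_label = nearest_mean_label(query, groups, linear_kernel)
rbf_label = nearest_mean_label(query, groups, rbf_kernel)
print(linear_label, rbf_label)
```

In this toy setup both class means sit near the origin in the input space, so the linear version cannot tell the classes apart in a meaningful way, while the RBF version assigns the query to the class of its close neighbour.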
Most kernel algorithms are based on convex optimization or eigenproblems and are statistically
well-founded. Typically, their statistical properties are analyzed using statistical learning theory (for example,
using Rademacher complexity).

Contents
1 Motivation and informal explanation
2 Mathematics: the kernel trick
3 Applications
4 Popular kernels
5 See also
6 Notes
7 References
8 External links

Motivation and informal explanation
Kernel methods can be thought of as instance-based learners: rather than learning some fixed set of
parameters corresponding to the features of their inputs, they instead "remember" the $i$-th training
example $(\mathbf{x}_i, y_i)$ and learn for it a corresponding weight $w_i$. Prediction for unlabeled inputs, i.e., those
not in the training set, is treated by the application of a similarity function $k$, called a kernel, between
the unlabeled input $\mathbf{x}'$ and each of the training inputs $\mathbf{x}_i$. For instance, a kernelized binary classifier
typically computes a weighted sum of similarities
$$\hat{y} = \operatorname{sgn} \sum_{i=1}^{n} w_i y_i k(\mathbf{x}_i, \mathbf{x}')$$
https://en.wikipedia.org/wiki/Kernel_method

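where
$\hat{y} \in \{-1, +1\}$ is the kernelized binary classifier's predicted label for the unlabeled input $\mathbf{x}'$ whose hidden true label $y$ is of interest;
$k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is the kernel function that measures similarity between any pair of inputs $\mathbf{x}, \mathbf{x}' \in \mathcal{X}$;
the sum ranges over the $n$ labeled examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ in the classifier's training set, with $y_i \in \{-1, +1\}$;
the $w_i \in \mathbb{R}$ are the weights for the training examples, as determined by the learning algorithm;
the sign function $\operatorname{sgn}$ determines whether the predicted classification $\hat{y}$ comes out positive or negative.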
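The prediction rule above can be sketched in a few lines. The example below uses a kernel perceptron as the learning algorithm that sets the weights w_i (here, the number of mistakes made on training example i) and a Gaussian (RBF) kernel as the similarity function. The kernel, its `gamma` parameter, the XOR-style data and the function names are all assumptions for illustration, not a definitive implementation.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def predict(x_new, xs, ys, w, kernel):
    """Return sgn(sum_i w_i * y_i * k(x_i, x_new)); a zero sum is mapped to +1."""
    score = sum(wi * yi * kernel(xi, x_new) for wi, yi, xi in zip(w, ys, xs))
    return 1 if score >= 0 else -1

def train_kernel_perceptron(xs, ys, kernel, epochs=10):
    """Kernel perceptron: w_i counts the mistakes made on training example i."""
    w = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            if predict(xi, xs, ys, w, kernel) != yi:
                w[i] += 1          # remember this example more strongly
                mistakes += 1
        if mistakes == 0:          # every training example is classified correctly
            break
    return w

# XOR-style data: not linearly separable in the input space.
xs = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
ys = [-1, -1, 1, 1]
w = train_kernel_perceptron(xs, ys, rbf_kernel)
preds = [predict(x, xs, ys, w, rbf_kernel) for x in xs]
print(w, preds)
```

Note that the model never stores a weight vector in feature space: it keeps only the training examples and one scalar weight per example.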
Kernel classifiers were described as early as the 1960s, with the invention of the kernel perceptron.[1]
They rose to great prominence with the popularity of the support vector machine (SVM) in the 1990s,
when the SVM was found to be competitive with neural networks on tasks such as handwriting
recognition.

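Mathematics: the kernel trick
The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a
nonlinear function or decision boundary. For all $\mathbf{x}$ and $\mathbf{x}'$ in the input space $\mathcal{X}$, certain functions
$k(\mathbf{x}, \mathbf{x}')$ can be expressed as an inner product in another space $\mathcal{V}$. The function
$k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is
often referred to as a kernel or a kernel function. The word "kernel" is used in mathematics to denote a
weighting function for a weighted sum or integral.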
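Certain problems in machine learning have more structure than an arbitrary weighting function $k$.
The computation is made much simpler if the kernel can be written in the form of a "feature map"
$\varphi \colon \mathcal{X} \to \mathcal{V}$ which satisfies
$$k(\mathbf{x}, \mathbf{x}') = \langle \varphi(\mathbf{x}), \varphi(\mathbf{x}') \rangle_{\mathcal{V}}.$$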
The key restriction is that $\langle \cdot, \cdot \rangle_{\mathcal{V}}$ must be a proper inner product. On the other hand, an explicit
representation for $\varphi$ is not necessary, as long as $\mathcal{V}$ is an inner product space. The alternative follows
from Mercer's theorem: an implicitly defined function $\varphi$ exists whenever the space $\mathcal{X}$ can be equipped
with a suitable measure ensuring the function $k$ satisfies Mercer's condition.
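Mercer's theorem is akin to a generalization of the result from linear algebra that associates an inner
product to any positive-definite matrix. In fact, Mercer's condition can be reduced to this simpler case. If
we choose as our measure the counting measure $\mu(T) = |T|$ for all $T \subset \mathcal{X}$, which counts the
number of points inside the set $T$, then the integral in Mercer's theorem reduces to a summation
$$\sum_{i=1}^{n} \sum_{j=1}^{n} k(\mathbf{x}_i, \mathbf{x}_j) c_i c_j \geq 0.$$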
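If this inequality holds for all finite sequences of points $(\mathbf{x}_1, \dots, \mathbf{x}_n)$ in $\mathcal{X}$ and all choices of $n$
real-valued coefficients $(c_1, \dots, c_n)$ (cf. positive definite kernel), then the function $k$ satisfies Mercer's
condition.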
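The finite-sum form of the condition is easy to probe numerically. The sketch below evaluates the double sum over k(x_i, x_j) c_i c_j for a Gaussian (RBF) kernel on a handful of random points and many random coefficient vectors. The kernel, the `gamma` value, the random data and the function names are assumptions for this example. A random search of this kind can only fail to find a violation; it does not prove the condition.

```python
import math
import random

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def quadratic_form(points, coeffs, kernel):
    """sum_i sum_j k(x_i, x_j) * c_i * c_j for a finite sequence of points."""
    return sum(
        kernel(xi, xj) * ci * cj
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
points = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(6)]

values = []
for _ in range(200):
    coeffs = [rng.uniform(-1, 1) for _ in points]
    values.append(quadratic_form(points, coeffs, rbf_kernel))

smallest = min(values)
print(smallest)  # expected to be non-negative up to floating-point rounding
```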
Some algorithms that depend on arbitrary relationships in the native space $\mathcal{X}$ would, in fact, have a
linear interpretation in a different setting: the range space of $\varphi$. The linear interpretation gives us insight
about the algorithm. Furthermore, there is often no need to compute $\varphi$ directly during computation, as is
the case with support vector machines. Some cite this running time shortcut as the primary benefit.
Researchers also use it to justify the meanings and properties of existing algorithms.
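Theoretically, a Gram matrix $\mathbf{K} \in \mathbb{R}^{n \times n}$ with respect to $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ (sometimes also called a
"kernel matrix"[2]), where $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, must be positive semi-definite (PSD).[3] Empirically, for
machine learning heuristics, choices of a function $k$ that do not satisfy Mercer's condition may still
perform reasonably if $k$ at least approximates the intuitive idea of similarity.[4] Regardless of whether
$k$ is a Mercer kernel, $k$ may still be referred to as a "kernel".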
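As a hedged illustration of a similarity function that fails the PSD requirement, the sketch below builds the Gram matrix of the negative squared Euclidean distance on two points and evaluates the quadratic form c^T K c. The function is larger for closer points, so it matches the intuitive idea of similarity, yet it is not a Mercer kernel. The choice of function, the points and all names are assumptions for this example.

```python
def neg_sq_distance(x, z):
    """Similarity score -||x - z||^2 (illustrative; not a Mercer kernel)."""
    return -sum((a - b) ** 2 for a, b in zip(x, z))

def gram_matrix(points, kernel):
    """Build K with K[i][j] = k(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in points] for xi in points]

def quadratic_form(K, c):
    """Evaluate c^T K c for a square matrix K given as a list of rows."""
    n = len(c)
    return sum(K[i][j] * c[i] * c[j] for i in range(n) for j in range(n))

points = [(0.0, 0.0), (1.0, 0.0)]
K = gram_matrix(points, neg_sq_distance)  # zero diagonal, -1 off the diagonal
value = quadratic_form(K, [1.0, 1.0])     # negative, so K is not PSD
print(value)
```

A single coefficient vector with a negative quadratic form is enough to show that the Gram matrix is not positive semi-definite.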
If the kernel function $k$ is also a covariance function as used in Gaussian processes, then the Gram
matrix $\mathbf{K}$ can also be called a covariance matrix.[5]
Finally, suppose that $\mathbf{K}$ is a square matrix. Then $\mathbf{K}^{\mathsf{T}} \mathbf{K}$ is a positive semi-definite matrix.
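One way to see why a product of this form is positive semi-definite is the short derivation below. It writes $\mathbf{K}$ for a real square matrix and takes the product to be $\mathbf{K}^{\mathsf{T}} \mathbf{K}$ (notation assumed here), with $\mathbf{c} \in \mathbb{R}^{n}$ an arbitrary coefficient vector:

```latex
\[
\mathbf{c}^{\mathsf{T}} \left( \mathbf{K}^{\mathsf{T}} \mathbf{K} \right) \mathbf{c}
  = (\mathbf{K}\mathbf{c})^{\mathsf{T}} (\mathbf{K}\mathbf{c})
  = \lVert \mathbf{K}\mathbf{c} \rVert^{2}
  \geq 0,
\qquad
\left( \mathbf{K}^{\mathsf{T}} \mathbf{K} \right)^{\mathsf{T}}
  = \mathbf{K}^{\mathsf{T}} \mathbf{K}.
\]
```

The quadratic form is a squared norm, so it cannot be negative, and the product is symmetric.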

Applications
Application areas of kernel methods are diverse and include geostatistics,[6] kriging, inverse distance
weighting, 3D reconstruction, bioinformatics, chemoinformatics, information extraction and handwriting
recognition.

Popular kernels
Fisher kernel
Graph kernels
Kernel smoother
Polynomial kernel
RBF kernel
String kernels

See also
Kernel methods for vector output

Notes
1. Aizerman, M. A.; Braverman, Emmanuel M.; Rozoner, L. I. (1964). "Theoretical foundations of the potential
function method in pattern recognition learning". Automation and Remote Control 25: 821–837. Cited in
Guyon, Isabelle; Boser, B.; Vapnik, Vladimir (1993). Automatic capacity tuning of very large VC-dimension
classifiers. Advances in neural information processing systems. CiteSeerX: 10.1.1.17.7215.
2. Hofmann, Thomas; Schölkopf, Bernhard; Smola, Alexander J. (2008). "Kernel Methods in Machine
Learning".
3. Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. The