
Linear Models for Regression

CS534

Adapted from slides of C. Bishop
Prediction Problems

Predict housing price based on
  house size, lot size, location, # of rooms
Predict stock price based on
  price history of the past month
Predict the abundance of a species based on
  environmental conditions

General setup:
Given a set of training examples (x_i, t_i), i = 1, ..., N
Goal: learn a function y(x) to minimize some loss function: L(y, t)
Example: Polynomial Curve Fitting
Linear Basis Function Models (1)

Generally,

  y(x, w) = \sum_{j=0}^{M-1} w_j \phi_j(x) = w^T \phi(x)

where the \phi_j(x) are known as basis functions.
Typically \phi_0(x) = 1, so that w_0 acts as a bias.
In the simplest case, we use linear basis functions: \phi_j(x) = x_j.
This is multiple linear regression.
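The model above can be sketched in a few lines of numpy. This is a minimal illustration, not from the slides; the weights and inputs are made-up numbers, and the identity basis with a constant feature recovers multiple linear regression.

```python
import numpy as np

def phi_linear(x):
    """Map a raw input vector x to features [1, x_1, ..., x_D].

    phi_0(x) = 1 is the constant basis function, so w_0 acts as a bias.
    """
    return np.concatenate(([1.0], x))

def predict(w, x):
    """y(x, w) = w^T phi(x)."""
    return w @ phi_linear(x)

w = np.array([2.0, 0.5, -1.0])   # [bias, weight for x_1, weight for x_2]
x = np.array([4.0, 3.0])
print(predict(w, x))             # 2.0 + 0.5*4 - 1.0*3 = 1.0
```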
Linear Basis Function Models (2)

Polynomial basis functions:

  \phi_j(x) = x^j

These are global; a small change in x affects all basis functions.
Linear Basis Function Models (3)

Gaussian basis functions:

  \phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2 s^2}\right)

These are local; a small change in x only affects nearby basis functions.
\mu_j and s control the location and scale (width).
Linear Basis Function Models (4)

Sigmoidal basis functions:

  \phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)

where

  \sigma(a) = \frac{1}{1 + \exp(-a)}

These are also local; a small change in x only affects nearby basis functions.
\mu_j and s control the location and scale (slope).
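A small sketch of both local basis families defined above; the centres \mu_j = (-1, 0, 1) and width s = 0.5 are illustrative choices, not from the slides. Evaluating at x = 0 shows the locality: only the basis function centred at 0 responds strongly.

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi(x) = exp(-(x - mu)^2 / (2 s^2)): peaks at mu, width set by s."""
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2))

def sigmoid_basis(x, mu, s):
    """phi(x) = sigma((x - mu) / s) with sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

mu = np.array([-1.0, 0.0, 1.0])      # basis-function locations
print(gaussian_basis(0.0, mu, 0.5))  # centre at mu=0 gives 1.0; others decay
print(sigmoid_basis(0.0, mu, 0.5))   # exactly 0.5 at a function's own centre
```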
Maximum Likelihood Estimation of w

Assumption: observations are drawn from a deterministic function with added Gaussian noise:

  t = y(x, w) + \epsilon, where \epsilon \sim \mathcal{N}(0, \sigma^2)

which is the same as saying

  p(t \mid x, w, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} (t - y(x, w))^2\right)

Given a set of observed inputs X = (x_1, ..., x_N) and their corresponding targets \mathbf{t} = (t_1, ..., t_N), if we assume that the parameters take specific values w and \sigma^2, the likelihood of observing the data is:

  p(\mathbf{t} \mid X, w, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^T \phi(x_n), \sigma^2)

(i.i.d. assumption)
Maximum Likelihood Estimation

Taking the logarithm of the likelihood function, we get

  \ln p(\mathbf{t} \mid w, \sigma^2) = \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid w^T \phi(x_n), \sigma^2)
                                     = -\frac{N}{2} \ln(2\pi\sigma^2) - \frac{1}{\sigma^2} E_D(w)

where

  E_D(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2

Note that

  \arg\max_w \ln p(\mathbf{t} \mid w, \sigma^2) = \arg\min_w E_D(w)

E_D(w) is called the least-squares objective.
Maximizing likelihood = least squares.
Maximum Likelihood Estimation

Computing the gradient of E_D(w) and setting it to zero yields

  \sum_{n=1}^{N} (t_n - w^T \phi(x_n)) \phi(x_n)^T = 0

Solving for w, we get

  w_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t} = \Phi^{\dagger} \mathbf{t}

where \Phi^{\dagger} = (\Phi^T \Phi)^{-1} \Phi^T is the Moore-Penrose pseudoinverse, and \Phi is the N x M design matrix with entries \Phi_{nj} = \phi_j(x_n).
Each row of \Phi corresponds to a single example; each column to a basis function.
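The normal-equation solution above can be computed directly. This sketch uses made-up noisy data from a line t = 1 + 2x; `np.linalg.lstsq` returns the same pseudoinverse solution w_ML = \Phi^\dagger t in a numerically stable way, and `np.vander` builds the polynomial design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(x.size)  # noisy line

M = 2                                     # number of basis functions: 1, x
Phi = np.vander(x, M, increasing=True)    # N x M design matrix, Phi[n, j] = x_n^j
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print(w_ml)   # recovers weights close to the true [1.0, 2.0]
```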
Maximum Likelihood and Least Squares (4)

Maximizing with respect to the bias, w_0, alone, we see that

  w_0 = \bar{t} - \sum_{j=1}^{M-1} w_j \bar{\phi}_j

where \bar{t} = \frac{1}{N} \sum_n t_n and \bar{\phi}_j = \frac{1}{N} \sum_n \phi_j(x_n).

We can also maximize with respect to \sigma^2, giving

  \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (t_n - w_{ML}^T \phi(x_n))^2
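The \sigma^2_{ML} formula is just the mean squared residual of the fitted model. A tiny sketch with made-up targets and predictions (the numbers are illustrative only):

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0])   # targets t_n
y = np.array([1.1, 1.9, 3.2, 3.8])   # model predictions w_ML^T phi(x_n)

# ML noise variance = average squared residual over the N examples
sigma2_ml = np.mean((t - y) ** 2)
print(sigma2_ml)                      # (0.01 + 0.01 + 0.04 + 0.04) / 4 = 0.025
```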
System Equation View of Linear Regression

  \Phi w = \mathbf{t}

This is an overconstrained system of equations (N equations, M < N unknowns), so in general there exists no exact solution.
The maximum likelihood / least-squares solution w_{ML} = \Phi^{\dagger} \mathbf{t} is the closest approximate solution.
Geometry of Least Squares

Consider

  \mathbf{y} = \Phi w_{ML}

t is an N-dimensional vector.
S is the M-dimensional subspace spanned by the columns of \Phi.
w_ML minimizes the distance between t and S by finding the projection of t onto S.
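The projection picture above implies that the residual t - \Phi w_{ML} is orthogonal to every column of \Phi. A quick numerical check on random made-up data (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((5, 2))     # N=5 examples, M=2 basis functions
t = rng.standard_normal(5)

w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
residual = t - Phi @ w_ml             # component of t outside the span S

# Orthogonality of the residual to the column space: Phi^T residual = 0
print(Phi.T @ residual)               # numerically zero
```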
Overfitting Issue

What can we do to curb overfitting?
  Use a less complex model
  Use more training examples
  Regularization
Regularized Least Squares (1)

Consider the error function:

  E(w) = E_D(w) + \lambda E_W(w)

Data term + regularization term (penalizes complex models).
With the sum-of-squares error function and a quadratic regularizer, we get

  E(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \frac{\lambda}{2} w^T w

which is minimized by

  w = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T \mathbf{t}

The quadratic term encourages small weight values; \lambda is called the regularization coefficient.
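The closed form above is a small linear solve. This sketch fits a degree-8 polynomial to made-up noisy sinusoidal data (the data, degree, and \lambda values are illustrative) and shows that a larger \lambda shrinks the weight vector:

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Regularized least squares: w = (lam*I + Phi^T Phi)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
Phi = np.vander(x, 9, increasing=True)        # degree-8 polynomial basis

w_small_lam = ridge_fit(Phi, t, lam=1e-8)     # nearly unregularized
w_large_lam = ridge_fit(Phi, t, lam=10.0)     # heavily regularized
print(np.linalg.norm(w_small_lam), np.linalg.norm(w_large_lam))
# the heavily regularized weight vector has a much smaller norm
```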
Regularized Least Squares (2)

With a more general regularizer, we have

  E(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2 + \frac{\lambda}{2} \sum_{j} |w_j|^q

q = 1 gives the lasso; q = 2 gives the quadratic regularizer.
Regularized Least Squares (3)

The lasso tends to generate sparser solutions (the majority of the weights shrink to zero) than a quadratic regularizer.
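Unlike the quadratic case, the lasso has no closed-form solution. The sketch below (not from the slides) uses coordinate descent with soft-thresholding, one standard solver for the L1-penalized objective, on made-up data where only two of ten features matter; it illustrates how the L1 penalty drives most weights exactly to zero:

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; exact zeros appear when |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(Phi, t, lam, n_iter=200):
    """Coordinate descent for (1/2)||t - Phi w||^2 + lam * sum_j |w_j|."""
    N, M = Phi.shape
    w = np.zeros(M)
    col_sq = (Phi ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(M):
            # residual with feature j's current contribution removed
            r_j = t - Phi @ w + Phi[:, j] * w[j]
            w[j] = soft_threshold(Phi[:, j] @ r_j, lam) / col_sq[j]
    return w

rng = np.random.default_rng(3)
Phi = rng.standard_normal((100, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]                 # only two informative features
t = Phi @ w_true + 0.1 * rng.standard_normal(100)

w = lasso_cd(Phi, t, lam=20.0)
print(np.round(w, 2))   # mostly exact zeros; nonzeros near w_true[:2]
```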
The Bias-Variance Decomposition (1)

Consider the expected squared loss,

  E[L] = \int (y(x) - h(x))^2 p(x)\, dx + \iint (h(x) - t)^2 p(x, t)\, dx\, dt

where

  h(x) = E[t \mid x] = \int t\, p(t \mid x)\, dt

The second term of E[L] corresponds to the noise inherent in the random variable t.
What about the first term?
The Bias-Variance Decomposition (2)

Suppose we were given multiple data sets, each of size N. Any particular data set, D, will give a particular function y(x; D). We then have

  (y(x; D) - h(x))^2 = (y(x; D) - E_D[y(x; D)] + E_D[y(x; D)] - h(x))^2
The Bias-Variance Decomposition (3)

Taking the expectation over D yields

  E_D[(y(x; D) - h(x))^2] = (E_D[y(x; D)] - h(x))^2 + E_D[(y(x; D) - E_D[y(x; D)])^2]
The Bias-Variance Decomposition (4)

Thus we can write

  expected loss = (bias)^2 + variance + noise

where

  (bias)^2 = \int (E_D[y(x; D)] - h(x))^2 p(x)\, dx
  variance = \int E_D[(y(x; D) - E_D[y(x; D)])^2] p(x)\, dx
  noise = \iint (h(x) - t)^2 p(x, t)\, dx\, dt
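The decomposition can be estimated empirically by fitting many data sets. This sketch (the function, basis, and \lambda are illustrative choices, not the slides' exact experiment) takes h(x) = sin(2\pi x) as the optimal prediction, fits a regularized polynomial to 100 independent data sets D, and averages over x:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 25)
h = np.sin(2 * np.pi * x)                      # true function h(x)
Phi = np.vander(x, 10, increasing=True)        # degree-9 polynomial basis
lam = 1.0                                      # regularization coefficient

fits = []
for _ in range(100):                           # 100 data sets D of size 25
    t = h + 0.3 * rng.standard_normal(x.size)
    w = np.linalg.solve(lam * np.eye(10) + Phi.T @ Phi, Phi.T @ t)
    fits.append(Phi @ w)                       # y(x; D) on the grid
fits = np.array(fits)

avg_fit = fits.mean(axis=0)                    # estimate of E_D[y(x; D)]
bias_sq = np.mean((avg_fit - h) ** 2)          # (bias)^2 averaged over x
variance = np.mean(fits.var(axis=0))           # variance averaged over x
print(bias_sq, variance)
```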
The Bias-Variance Decomposition (5)-(7)

Example: 25 data sets from the sinusoidal, varying the degree of regularization, \lambda.
[Figures omitted: the individual fits and their average for large, intermediate, and small \lambda.]
The Bias-Variance Tradeoff

From these plots, we note that an over-regularized model (large \lambda) will have a high bias, while an under-regularized model (small \lambda) will have a high variance.