
Locally Weighted Linear Regression

• Problem with linear regression: it tends to underfit the data.
• Ordinary least squares gives the lowest mean-squared error among unbiased estimators, but with an underfit model we aren’t getting the best predictions.
• There are a number of ways to reduce this mean-squared error by adding some bias into our estimator.
• One such technique is locally weighted linear regression (LWLR).
• Locally weighted regression is a very powerful non-parametric model used in statistical learning.
Locally Weighted Linear Regression
Linear Regression
• Given a dataset X, y, we attempt to find a linear model h(x) that minimizes the residual sum of squared errors. The solution is given by the Normal equations:

θ̂ = (XᵀX)⁻¹ Xᵀy

• A linear model can only fit a straight line; however, it can be empowered by polynomial features to get more powerful models. Still, we have to decide and fix the number and types of features ahead of time.
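
A minimal sketch of the Normal-equations solution in NumPy (the synthetic data and the helper name ols_fit are ours, not from the slides):

    import numpy as np

    def ols_fit(X, y):
        # Normal equations: theta = (X^T X)^{-1} X^T y
        XtX = X.T @ X
        if np.linalg.det(XtX) == 0.0:
            raise ValueError("X^T X is singular; cannot invert")
        return np.linalg.inv(XtX) @ X.T @ y

    # Hypothetical data: y = 1 + 2x plus noise, with a bias column prepended.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=50)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=50)
    theta = ols_fit(X, y)   # close to [1.0, 2.0]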
Locally Weighted Linear Regression
• Given a dataset X, y, we attempt to find a model h(x) that minimizes the residual sum of weighted squared errors.
• The weights are given by a kernel function, which can be chosen arbitrarily; here we use the Gaussian kernel.
• The solution is very similar to the Normal equations; we only need to insert a diagonal weight matrix W.
• What is interesting about this particular setup? By adjusting the meta-parameter τ (the kernel bandwidth, called k below), you can get a non-linear model that is as strong as polynomial regression of any degree.
Locally Weighted Linear Regression
• In LWLR, we give a weight to data points near the query point of interest; then we compute a least-squares regression similar to Linear Regression.
• This type of regression uses the entire dataset each time a prediction is needed. The solution is now given by:

θ̂ = (XᵀWX)⁻¹ XᵀWy

• where W is a diagonal matrix that’s used to weight the data points.
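
A minimal NumPy sketch of this weighted solve, assuming the diagonal weight matrix W has already been built (the kernel on the next slide shows one way); lwlr_point is a hypothetical helper name:

    import numpy as np

    def lwlr_point(x_query, X, y, W):
        # Weighted Normal equations: theta = (X^T W X)^{-1} X^T W y,
        # solved afresh for each query point, then used for one prediction.
        XtWX = X.T @ W @ X
        if np.linalg.det(XtWX) == 0.0:
            raise ValueError("X^T W X is singular; try a larger bandwidth")
        theta = np.linalg.inv(XtWX) @ X.T @ W @ y
        return x_query @ theta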


Locally Weighted Linear Regression
• Uses a kernel to weight nearby points more heavily than other points.
• You can use any kernel you like; the most common choice is the Gaussian.
• The kernel assigns a weight given by:

w(i,i) = exp( −‖x⁽ⁱ⁾ − x‖² / (2k²) )

• This builds the weight matrix W, which has only diagonal elements.
• The closer a data point x⁽ⁱ⁾ is to the query point x, the larger w(i,i) will be.
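
A sketch of how such a W might be built from the Gaussian kernel above (gaussian_weights is a hypothetical helper name):

    import numpy as np

    def gaussian_weights(x_query, X, k):
        # w(i,i) = exp(-||x_i - x_query||^2 / (2 k^2)); off-diagonals stay 0.
        sq_dists = np.sum((X - x_query) ** 2, axis=1)
        return np.diag(np.exp(-sq_dists / (2.0 * k ** 2)))

Combined with lwlr_point above, a prediction at x_query would be lwlr_point(x_query, X, y, gaussian_weights(x_query, X, k)).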
Locally Weighted Linear Regression
• There is also a user-defined constant k that determines how much to weight nearby points. This is the only parameter that we have to worry about with LWLR.
• Different values of k change the weights matrix dramatically: a small k weights only the nearest points, while a large k spreads weight across the whole dataset and approaches ordinary linear regression (see the sketch below).
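
Since the figure itself is not reproduced here, a small sketch of the same idea: the weights a fixed query point assigns under three different bandwidths (hypothetical 1-D data):

    import numpy as np

    X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)   # points at 0.0, 0.25, ..., 1.0
    x_query = np.array([0.5])

    for k in (0.05, 0.25, 1.0):
        w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * k ** 2))
        print(f"k={k}: weights ~ {np.round(w, 3)}")
    # k=0.05 leaves essentially one nonzero weight (the point at 0.5 itself);
    # k=1.0 gives every point a weight near 1, approaching unweighted regression.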
That’s All for the Day
Machine Learning
Useful Resources on LWLR
• Application of locally weighted regression-based approach in correcting erroneous individual vehicle speed data
• Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting
Student Task
• Applications of Locally Weighted Linear Regression (LOESS/LOWESS).
Logistic Regression
• Some regression algorithms can be used for classification as well (and
vice versa).
• Also called Logit Regression
• Used to estimate the probability that an instance belongs to a
particular class (e.g., what is the probability that this email is spam?).
• If the estimated probability is greater than 50%, then the model predicts that the instance belongs to the positive class, labeled “1”;
• otherwise, it predicts that it does not, i.e., the negative class, labeled “0”.
• This makes it a binary classifier.
Logistic Regression
• We’d like to have an equation to which we can give all of our features and it will predict the class.
• In the two-class case, the function will output a 0 or a 1.
• The problem with the Heaviside step function is that at the point where it steps from 0 to 1, it does so instantly. This instantaneous step is sometimes difficult to deal with.
• There’s another function that behaves in a similar fashion but is much easier to deal with mathematically. This function is called the sigmoid.
• The sigmoid is given by the following equation:

σ(z) = 1 / (1 + e⁻ᶻ)
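
A minimal NumPy sketch of the sigmoid:

    import numpy as np

    def sigmoid(z):
        # Smooth 0-to-1 squashing: sigma(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(0.0))                    # 0.5, the decision boundary
    print(sigmoid(np.array([-6.0, 6.0])))  # ~[0.0025, 0.9975]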
Logistic Regression
• For the logistic regression classifier we’ll take our features, multiply each one by a weight, and then add them up.
• This result will be put into the sigmoid, and we’ll get a number between 0 and 1.
• Anything ≥ 0.5 is classified as a 1, and anything < 0.5 as a 0.
• Think of logistic regression as a probability estimate.
• The questions are: what are the best weights, or regression coefficients, to use, and how do we find them?
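
A minimal sketch of this classification step, assuming the regression coefficients have already been found (the weights below are hypothetical, not learned):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def classify(x, weights):
        # Weighted sum of the features, squashed to (0, 1), thresholded at 0.5.
        prob = sigmoid(np.dot(x, weights))
        return 1 if prob >= 0.5 else 0

    weights = np.array([-1.0, 2.0, 0.5])   # hypothetical [bias, w1, w2]
    x = np.array([1.0, 0.8, 0.3])          # bias input 1.0 prepended
    print(classify(x, weights))            # sigmoid(0.75) ~ 0.68 -> class 1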
That’s All for the Day
