Slides 8: Misc Topics and Cross-Validation Methods
Padhraic Smyth
Information and Computer Science
CS 175, Fall 2007
Review of Assignment 3 (Perceptron)
perceptron.m function
% error checking
if size(weights,1) ~= 1
error('The first argument (weights) should be a row vector');
end
if size(data,2) ~= size(weights,2)
error('The arguments (weights and data) should have the same number of columns');
end
CS 175, Fall 2007: Professor Padhraic Smyth Slide Set 8: Cross Validation 3
perceptron_error.m function
function [cerror, mse] = perceptron_error(weights,data,targets)
% function [cerror, mse] = perceptron_error(weights,data,targets)
%
% Compute mis-classification error and mean squared error for
% a perceptron (linear) classifier
% Sample code for CS 175
%
% Inputs
% weights: 1 x (d+1) row vector of weights
% data: N x (d+1) matrix of training data
% targets: N x 1 vector of target values (+1 or -1)
%
% Outputs
% cerror: the percentage of examples misclassified (between 0 and 100)
% mse: the mean-square error (sum of squared errors divided by N)
perceptron_error.m function
N = size(data, 1);
% error checking
if nargin ~= 3
error('The function takes three arguments (weights, data, targets)');
end
if size(weights,1) ~= 1
error('The first argument (weights) should be a row vector');
end
if size(data,2) ~= size(weights,2)
error('The first two arguments (weights and data) should have the same number of columns');
end
if size(data,1) ~= size(targets,1)
error('The last two arguments (targets and data) should have the same number of rows');
end
perceptron_error.m function
% calculate the unthresholded outputs, for all rows in data, N x 1 vector
f = (weights * data')';
% calculate the sigmoid version of the outputs, for all rows in data, N x 1 vector
outputs = sigmoid(f);
% compare sigmoid output vector to the target vector to get the mse
mse = sum((outputs-targets).^2)/N;
function s = sigmoid(x)
% Computes sigmoid function (scaled to -1, +1)
s = 2./(1+exp(-x)) - 1;
perceptron_error.m function
% calculate the unthresholded outputs, for all rows in data, N x 1 vector
f = (weights * data')';
% compare thresholded output to the target values to get the classification error rate
% (vectorized computation of the classification error rate)
cerror = 100 * sum(sign(f) ~= targets)/N;
% calculate the sigmoid version of the outputs, for all rows in data, N x 1 vector
% (vectorized computation of the sigmoid output)
outputs = sigmoid(f);
% compare sigmoid output vector to the target vector to get the mse
% (vectorized computation of the MSE)
mse = sum((outputs-targets).^2)/N;
% local function defining the sigmoid; note that it works on vectors
function s = sigmoid(x)
% Computes sigmoid function (scaled to -1, +1)
s = 2./(1+exp(-x)) - 1;
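The same vectorized computation can be sketched in Python with NumPy (an illustrative translation of the MATLAB above, not part of the assignment; variable names mirror the MATLAB code and the toy data is invented):

```python
import numpy as np

def sigmoid(x):
    # Sigmoid rescaled to (-1, +1), matching the MATLAB local function
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def perceptron_error(weights, data, targets):
    # weights: (d+1,) vector; data: N x (d+1) array; targets: N-vector of +/-1
    N = data.shape[0]
    f = data @ weights                                  # unthresholded outputs
    cerror = 100.0 * np.sum(np.sign(f) != targets) / N  # % misclassified
    outputs = sigmoid(f)                                # sigmoid outputs
    mse = np.sum((outputs - targets) ** 2) / N          # mean squared error
    return cerror, mse

# Toy example: a weight vector that separates x > 0 from x < 0 (bias weight last)
data = np.array([[2.0, 1.0], [-3.0, 1.0], [1.0, 1.0], [-1.0, 1.0]])
targets = np.array([1.0, -1.0, 1.0, -1.0])
cerror, mse = perceptron_error(np.array([1.0, 0.0]), data, targets)
```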
Principle of Gradient Descent
Gradient descent algorithm:
– Start with some initial guess at w
– Move downhill in “small steps” in the direction of steepest descent
– After moving, recompute the gradient, get a new downhill direction, and move again
– Keep repeating this until the decrease in g(w) is less than some threshold, i.e.,
we appear to be on a flat part of the g(w) surface.
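The steps above can be sketched for a one-dimensional g(w) (a Python illustration under invented settings; the function g, the learning rate, and the threshold are made up for the example):

```python
def gradient_descent(g, grad, w, rate=0.1, threshold=1e-8, max_iter=10000):
    """Minimize g by repeatedly taking small steps along the negative gradient."""
    g_old = g(w)
    for _ in range(max_iter):
        w = w - rate * grad(w)              # move downhill
        g_new = g(w)
        if abs(g_old - g_new) < threshold:  # decrease is tiny: surface looks flat
            break
        g_old = g_new
    return w

# Example: g(w) = (w - 3)^2 has its minimum at w = 3
w_star = gradient_descent(lambda w: (w - 3.0) ** 2,
                          lambda w: 2.0 * (w - 3.0), w=0.0)
```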
Illustration of Gradient Descent
[Figure: an error surface g(w) plotted over the weight space (w1, w2)]
Illustration of Gradient Descent
[Figure: the error surface g(w) over (w1, w2); the direction of steepest descent is the direction of the negative gradient]
Illustration of Gradient Descent
[Figure: one gradient descent step on g(w), moving from the original point in weight space to a new point in weight space]
Gradient Descent Algorithm
– Note that the algorithm need not converge at all if the learning
rate (i.e., step size) is too large
Gradient Descent Algorithm
In MATLAB, for the perceptron with sigmoid outputs this translates into the
following update rule:
weights = weights - rate * (o - targets(i)) * dsigmoid(o) * data(i, :);
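A Python/NumPy sketch of this per-example update (illustrative, not the assignment code; here the unthresholded output f and the sigmoid output o = sigmoid(f) are kept as separate variables to make the chain rule explicit, whereas the MATLAB line uses a single variable o):

```python
import numpy as np

def sigmoid(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def dsigmoid(x):
    # Derivative of the rescaled sigmoid
    return 0.5 * (sigmoid(x) + 1.0) * (1.0 - sigmoid(x))

def update(weights, x, target, rate):
    """One gradient step on the squared error for a single example x."""
    f = weights @ x                     # unthresholded output
    o = sigmoid(f)                      # sigmoid output
    return weights - rate * (o - target) * dsigmoid(f) * x

# One step from zero weights toward a positive example
x = np.array([1.0, 1.0])
w_new = update(np.zeros(2), x, target=1.0, rate=0.5)
```

After the step, the output for this example moves closer to its target, as a single gradient step should.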
learn_perceptron.m function
function [weights,mse,acc] = learn_perceptron(data,targets,rate,threshold,init_method,random_seed,plotflag,k)
% function [weights,mse,acc] = learn_perceptron(data,targets,rate,threshold,init_method,random_seed,plotflag,k)
%
% Learn the weights for a perceptron (linear) classifier to minimize its
% mean squared error.
% Sample code for CS 175
%
% Inputs
% data: N x (d+1) matrix of training data
% targets: N x 1 vector of target values (+1 or -1)
% rate: learning rate for the perceptron algorithm (e.g., rate = 0.001)
% threshold: if the reduction in MSE from one iteration to the next is *less*
% than threshold, then halt learning (e.g., threshold = 0.000001)
% init_method: method used to initialize the weights (1 = random, 2 = half
% way between 2 random points in each group, 3 = half way between
% the centroids in each group)
% random_seed: this is an integer used to "seed" the random number generator
% for either methods 1 or 2 for initialization (this is useful
% to be able to recreate a particular run exactly)
% plotflag: 1 means plotting is turned on, default value is 0
% k: how many iterations between plotting (e.g., k = 100)
%
% Outputs
% weights: 1 x (d+1) row vector of learned weights
% mse: mean squared error for learned weights
% acc: classification accuracy for learned weights (percentage, between 0 and 100)
learn_perceptron.m function
[N, d] = size(data);
% error checking
if nargin < 4
error('The function takes at least 4 arguments (data, targets, rate, threshold)');
end
if size(data,1) ~= size(targets,1)
error('The number of rows in the first two arguments (data, targets) does not match!');
end
learn_perceptron.m function
% initialize the weights
weights = initialize_weights175(data,targets,init_method,random_seed);
iteration = 0;
while iteration < 2 || ( abs(mse(iteration) - mse(iteration-1)) > threshold )
    iteration = iteration + 1;
    % ... (loop body omitted on the slide: gradient descent update of the
    %      weights, recording mse(iteration) and cerror(iteration)) ...
    pause(0.0001);
end
learn_perceptron.m function
% create the plots of the MSE and Accuracy Vs. iteration number
if (plotflag == 1)
figure(2);
subplot(2, 1, 1);
plot(mse,'b-');
xlabel('iteration');
ylabel('MSE');
subplot(2, 1, 2);
plot(100-cerror,'b-');
xlabel('iteration');
ylabel('Accuracy');
end
% local functions
function s = sigmoid(x)
% Compute the sigmoid function, scaled from -1 to +1
s = 2./(1+exp(-x)) - 1;
function ds = dsigmoid(x)
% Compute the derivative of the (rescaled) sigmoid
ds = .5 .* (sigmoid(x)+1) .* (1-sigmoid(x));
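A quick numerical check (Python, illustrative, not part of the assignment) that this dsigmoid matches the analytic derivative of the rescaled sigmoid; algebraically it simplifies to (1 - s^2)/2, where s = sigmoid(x):

```python
import numpy as np

def sigmoid(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def dsigmoid(x):
    return 0.5 * (sigmoid(x) + 1.0) * (1.0 - sigmoid(x))

x = np.linspace(-4.0, 4.0, 9)
s = sigmoid(x)
analytic = (1.0 - s ** 2) / 2.0                           # algebraic simplification
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)   # finite-difference estimate
```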
MATLAB Demonstration
Run demo_perceptron_image_classification.m
Additional Concepts in Classification
(Relevant to Assignment 4)
Assignment 4
• threshold_image.m
– Simple function to display thresholded images
• knn_dispset.m
– Finds and displays the k-nearest-neighbors for a given image
• test_classifiers.m
– Uses cross-validation to compare classifiers
– (code is provided)
• test_imageclassifiers.m
– Compare different classification methods on image data
– Uses cross-validation
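A minimal Python sketch of the nearest-neighbor lookup that knn_dispset.m performs (assumed details: images are flattened to row vectors and compared with Euclidean distance; the toy data below is invented):

```python
import numpy as np

def knn_indices(images, query_index, k):
    """Indices of the k images nearest to images[query_index] (query excluded)."""
    query = images[query_index]
    dists = np.sqrt(np.sum((images - query) ** 2, axis=1))  # Euclidean distances
    dists[query_index] = np.inf      # never return the query image itself
    return np.argsort(dists)[:k]

# Toy 1-D "images" so the nearest neighbors are easy to verify by eye
images = np.array([[0.0], [1.0], [10.0], [1.5], [9.0]])
nearest = knn_indices(images, query_index=1, k=2)
```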
Assignment 4: using kNN to find similar images
MATLAB demo of knndispset
• knndispset(i2straight,5,1,15,1);
MATLAB demo of knndispset
• knndispset(i2straight,18,1,15,1);
Training Data and Test Data
• Training data
– labeled data used to build a classifier
• Test data
– new data, not used in the training process, to evaluate how well a
classifier does on new data
Test Accuracy and Generalization
• Generalization
– We are really interested in how our classifier generalizes to new
data
– test data accuracy is a good estimate of generalization performance
Another Example
[Figure: two classes of points plotted in a Feature 1 / Feature 2 space, separated by a linear decision boundary]
A More Complex Decision Boundary
[Figure: two classes of points in a Feature 1 / Feature 2 space, separated by a more complex decision boundary]
Example: The Overfitting Phenomenon
A Complex Model
Y = high-order polynomial in X
The True (simpler) Model
Y = a X + b + noise
How Overfitting affects Prediction
[Figure: predictive error plotted against model complexity, with an ideal range for model complexity marked]
Comparing Two Classifiers
Training and Validation Data
The v-fold Cross-Validation Method
Disjoint Validation Data Sets
[Figure: the data divided into a validation block (the 1st partition) and the remaining training data]
[Figure: on successive folds, a different disjoint block is held out as validation data and the rest is used as training data]
More on Cross-Validation
• Notes
– cross-validation generates an approximate estimate of how well
the classifier will do on “unseen” data
Sample MATLAB code for Cross-Validation
% first randomly order the data (n = number of data points)
rand('state',rseed);
index = randperm(n);
data = ordereddata(index,:);
labels = orderedlabels(index);
Sample MATLAB code for Cross-Validation
% now perform v-fold cross-validation
olddata = data;
oldlabels = labels;
nvalidate = floor(n/v);
for i=1:v
% set testdata and testlabels to be the first nvalidate rows of olddata, oldlabels
.....
% set traindata and trainlabels to be the rest of the rows of olddata, oldlabels
.....
% call classifiers with traindata, trainlabels, testdata, testlabels
cvaccuracy(i) = classifier(.....);
end
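The same partitioning logic can be sketched in Python (illustrative; the classifier call is elided in the MATLAB above, so this sketch only builds the disjoint train/validation index sets):

```python
import numpy as np

def vfold_indices(n, v, rseed):
    """Split a random permutation of 0..n-1 into v disjoint validation folds."""
    rng = np.random.default_rng(rseed)
    index = rng.permutation(n)                  # randomly order the data indices
    nvalidate = n // v                          # floor(n/v), as in the MATLAB
    folds = []
    for i in range(v):
        test_idx = index[i * nvalidate:(i + 1) * nvalidate]   # held-out block
        train_idx = np.setdiff1d(index, test_idx)             # everything else
        folds.append((train_idx, test_idx))
    return folds

folds = vfold_indices(n=10, v=5, rseed=1234)
```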
Assignment 4: Cross-Validation code (provided)
Example of running cross-validation code
>> test_classifiers(d1,d2,1,5,1234)
Training Data Results:
Minimum distance accuracy = 87.50
KNN, k=1, accuracy = 100.00
Assignment 4: Classifying images
function [cvacc, trainacc] = test_imageclassifiers(imageset1,imageset2,plotflag,kvalues,v,rseed)
%
% Learns a classifier to classify images in imageset1
% from images in imageset2, using minimum distance and knn classifiers,
% and returns the training and cross-validation accuracies.
%
% Your name, CS 175A
%
% INPUTS:
% imageset1, imageset2: arrays (of size m x n, and m2 x n2)
% of structures, where imageset1(i,j).image is a matrix of
% pixel (image) values of size nx by ny. It is assumed
% that all images are of the same size in both imageset1
% and imageset2.
% plotflag: if plotflag=1, plot the mean image for each set,
% and plot the difference of the means of the images in the two sets.
% kvalues: a K x 1 vector of k values for the knn classifier
% v: number of "folds" for v-fold cross-validation
The Minimum-Distance Classifier
• Calculate the mean for each class, e.g., Mean1 and Mean2
– mean vector = sum of all vectors/number of vectors
– mean vector ~ “centroid” of points
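The rule above can be sketched in Python (illustrative, with invented toy data): compute the centroid of each class, then assign a point to the class whose centroid is nearer:

```python
import numpy as np

def min_distance_classify(x, class1, class2):
    """Classify x as +1 or -1 by distance to the two class centroids."""
    mean1 = class1.mean(axis=0)          # centroid of class 1
    mean2 = class2.mean(axis=0)          # centroid of class 2
    d1 = np.linalg.norm(x - mean1)
    d2 = np.linalg.norm(x - mean2)
    return 1 if d1 <= d2 else -1

class1 = np.array([[0.0, 0.0], [1.0, 1.0]])   # centroid (0.5, 0.5)
class2 = np.array([[5.0, 5.0], [7.0, 7.0]])   # centroid (6.0, 6.0)
label = min_distance_classify(np.array([1.0, 0.0]), class1, class2)
```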
Assignment 4: Minimum Distance Classifier (provided)
Summary
• Assignment 3
– Perceptron code
– Can use perceptrons (or any classifier) to classify images
• Assignment 4
– Nearest-neighbor with images
– Cross-validation
– Minimum distance classifier