CS229 Lecture notes
Andrew Ng
Lecture on 11/4/04
1 The perceptron and large margin classifiers

In this final set of notes on learning theory, we will introduce a different model of machine learning. Specifically, we have so far been considering batch learning settings, in which we are first given a training set to learn with, and our hypothesis h is then evaluated on separate test data. In this set of notes, we will consider the online learning setting, in which the algorithm has to make predictions continuously even while it's learning.

In this setting, the learning algorithm is given a sequence of examples (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) in order. Specifically, the algorithm first sees x^(1) and is asked to predict what it thinks y^(1) is. After making its prediction, the true value of y^(1) is revealed to the algorithm (and the algorithm may use this information to perform some learning). The algorithm is then shown x^(2) and again asked to make a prediction, after which y^(2) is revealed, and it may again perform some more learning. This proceeds until we reach (x^(m), y^(m)). In the online learning setting, we are interested in the total number of errors made by the algorithm during this process. Thus, it models applications in which the algorithm has to make predictions even while it's still learning.
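The predict-then-learn protocol described above can be sketched as follows. This is a minimal illustration, not part of the notes; the `predict`/`learn` interface is an assumption of the sketch, standing in for any online learner.

```python
# A minimal sketch of the online learning protocol: for each example,
# the learner must predict before the true label is revealed, and the
# total number of prediction errors is what we measure.

def run_online(learner, examples):
    """Feed (x, y) pairs one at a time; return the total number of errors."""
    errors = 0
    for x, y in examples:
        y_hat = learner.predict(x)   # algorithm sees x first and must commit
        if y_hat != y:
            errors += 1
        learner.learn(x, y)          # true label revealed; learner may update
    return errors
```

Any concrete learner (such as the perceptron below) can be plugged into this loop, so long as it exposes a way to predict and a way to update.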

We will give a bound on the online learning error of the perceptron algorithm. To make our subsequent derivations easier, we will use the notational convention of denoting the class labels by y ∈ {−1, 1}.

Recall that the perceptron algorithm has parameters θ ∈ R^(n+1), and makes its predictions according to

    h_θ(x) = g(θ^T x),

where

    g(z) =  1   if z ≥ 0
           −1   if z < 0.

Also, given a training example (x, y), the perceptron learning rule updates the parameters as follows. If h_θ(x) = y, then it makes no change to the parameters. Otherwise, it performs the update^1

    θ := θ + y x.
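Putting the prediction rule and the mistake-driven update together, the online perceptron can be sketched as below. This is an illustrative Python sketch, not code from the notes; it assumes, as elsewhere in the course, that each x already includes the intercept term, and the function names are ours.

```python
# Online perceptron with the conventions above: labels y in {-1, +1},
# g(z) = 1 if z >= 0 else -1, and the update theta := theta + y*x
# applied only on examples the algorithm gets wrong.

import numpy as np

def g(z):
    """Threshold function: +1 if z >= 0, else -1."""
    return 1 if z >= 0 else -1

def online_perceptron(examples, n):
    """Run the perceptron online on (x, y) pairs; x in R^n.

    Returns the final parameters theta and the number of online mistakes.
    """
    theta = np.zeros(n)              # weights initialized to zero
    mistakes = 0
    for x, y in examples:
        if g(theta @ x) != y:        # h_theta(x) = g(theta^T x) was wrong
            mistakes += 1
            theta = theta + y * x    # update only on mistakes
    return theta, mistakes
```

Note that, matching the footnote below, no learning rate appears: scaling θ by a positive constant does not change the sign of θ^T x.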

The following theorem gives a bound on the online learning error of the perceptron algorithm, when it is run as an online algorithm that performs an update each time it gets an example wrong. Note that the bound below on the number of errors does not have an explicit dependence on the number of examples m in the sequence, or on the dimension n of the inputs (!).

Theorem (Block, 1962, and Novikoff, 1962). Let a sequence of examples (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) be given. Suppose that ||x^(i)|| ≤ D for all i, and further that there exists a unit-length vector u (||u||_2 = 1) such that y^(i) · (u^T x^(i)) ≥ γ for all examples in the sequence (i.e., u^T x^(i) ≥ γ if y^(i) = 1, and u^T x^(i) ≤ −γ if y^(i) = −1, so that u separates the data with a margin of at least γ). Then the total number of mistakes that the perceptron algorithm makes on this sequence is at most (D/γ)^2.
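As a quick empirical sanity check of the theorem (not part of the notes), one can run the perceptron on synthetic data with a known separator u, margin γ, and radius D, and confirm that the mistake count stays below (D/γ)^2. The data-generating choices here are made up for illustration.

```python
# Empirical check of the Block-Novikoff bound on synthetic 2-D data.
# We construct points with y * (u^T x) >= gamma and ||x|| <= D, then
# verify the online perceptron makes at most (D / gamma)**2 mistakes.

import numpy as np

rng = np.random.default_rng(0)
u = np.array([1.0, 0.0])        # unit-length separator, ||u||_2 = 1
gamma = 0.5                     # margin enforced below

# Sample 200 points, then push the first coordinate away from the
# decision boundary so that |u^T x| = |x_1| >= gamma for every point.
X = rng.uniform(-1, 1, size=(200, 2))
X[:, 0] += np.where(X[:, 0] >= 0, gamma, -gamma)
y = np.sign(X @ u).astype(int)  # labels given by the separator u

D = np.max(np.linalg.norm(X, axis=1))   # radius of the data

theta = np.zeros(2)
mistakes = 0
for x_i, y_i in zip(X, y):
    if (1 if theta @ x_i >= 0 else -1) != y_i:
        mistakes += 1
        theta = theta + y_i * x_i

assert mistakes <= (D / gamma) ** 2     # the theorem's guarantee
```

In practice the perceptron typically makes far fewer mistakes than the bound allows; the theorem only guarantees the worst case.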
Proof. The perceptron updates its weights only on those examples on which it makes a mistake. Let θ^(k) be the weights that were being used when it made its k-th mistake. So, θ^(1) = 0 (since the weights are initialized to zero), and if the k-th mistake was on the example (x^(i), y^(i)), then g((x^(i))^T θ^(k)) ≠ y^(i), which implies that

    (x^(i))^T θ^(k) y^(i) ≤ 0.

Also, from the perceptron learning rule, we would have that θ^(k+1) = θ^(k) + y^(i) x^(i). We then have

    (θ^(k+1))^T u = (θ^(k))^T u + y^(i) (x^(i))^T u
                  ≥ (θ^(k))^T u + γ
^1 This looks slightly different from the update rule we had written down earlier in the quarter because here we have changed the labels to be y ∈ {−1, 1}. Also, the learning rate parameter α was dropped. The only effect of the learning rate is to scale all the parameters θ by some fixed constant, which does not affect the behavior of the perceptron.
