
Chapter 20 (only relevant parts)

Concerns
• Generalization
• Accuracy
• Efficiency
• Noise
• Irrelevant features
• Generality: when does this work?

Linear Model
• Let f1, ..., fn be the feature values of an example. Let the class be denoted {+1, -1}.
• The linear model defines weights w0, w1, ..., wn.
  – -w0 is the threshold.
  – If w0*f0 + w1*f1 + ... + wn*fn > 0, predict class +; else predict class -.
• Define f0 = -1 (bias weight).
• Classification rule, briefly: W*F > 0, where * is the inner product of the weight vector and the feature vector, and F has been augmented with the extra feature f0.

Augmentation Trick
• Suppose the data has features f1 and f2.
• 2*f1 + 3*f2 > 4 is a classifier.
• Equivalently: <4, 2, 3> * <-1, f1, f2> > 0.
• Mapping data <f1, f2> to <-1, f1, f2> allows learning/representing the threshold as just another feature.
• Mapping data into higher dimensions is the key idea behind SVMs.
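As an illustrative sketch of the augmented rule (the `predict` helper is ours, not from the slides), using the slide's own classifier 2*f1 + 3*f2 > 4, i.e. W = <4, 2, 3>:

```python
# Minimal sketch of the augmented linear model: F gains f0 = -1,
# so the threshold -w0 becomes just another weight.

def predict(w, features):
    """Predict +1 if W*F > 0, else -1, with F augmented by f0 = -1."""
    f = [-1.0] + list(features)                      # augment: f0 = -1
    score = sum(wi * fi for wi, fi in zip(w, f))
    return 1 if score > 0 else -1

w = [4.0, 2.0, 3.0]              # encodes 2*f1 + 3*f2 > 4
print(predict(w, [1.0, 1.0]))    # 2 + 3 = 5 > 4, so +1
print(predict(w, [0.5, 0.5]))    # 1 + 1.5 = 2.5 < 4, so -1
```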

Mapping to enable Linear Separation
• Let x1, ..., xm be m vectors in R^N.
• Map xi into R^{N+m} by xi -> <xi, 0, ..., 0, 1, 0, ...>, where the 1 is in position N+i.
• For any labelling of the xi by classes +/-:
  – Define wi = 0 for i <= N.
  – w(N+i) = 1 if xi is positive.
  – w(N+i) = -1 if xi is negative.
• This embedding makes the data linearly separable.
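A minimal sketch of this construction, assuming the one-hot block and label-copying weights described above (names are ours). Note it separates even two identical points with opposite labels:

```python
# Each of m vectors in R^N gets an extra one-hot block of length m.
# Setting w[N+i] to example i's label makes W*phi(xi) equal that label,
# so the sign of the score always matches the class.

def embed(xs):
    m = len(xs)
    return [list(x) + [1.0 if j == i else 0.0 for j in range(m)]
            for i, x in enumerate(xs)]

xs = [[0.3, 0.7], [0.3, 0.7], [5.0, 5.0]]        # first two points identical...
labels = [1, -1, 1]                              # ...yet labelled differently
N = len(xs[0])
w = [0.0] * N + [float(y) for y in labels]       # wi = 0 for i <= N, w[N+i] = label
for phi, y in zip(embed(xs), labels):
    score = sum(wi * fi for wi, fi in zip(w, phi))
    assert (score > 0) == (y > 0)                # every point correctly classified
```

The catch, of course, is that this embedding memorizes the training set and says nothing about generalization.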

Representational Power
• "Or" of n features:
  – wi = 1, threshold = 0
• "And" of n features:
  – wi = 1, threshold = n - 1
• k of n features (prototype):
  – wi = 1, threshold = k - 1
• Can't do XOR.
• Combining linear threshold units yields any boolean function.
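These threshold settings can be checked directly; a minimal sketch with 0/1 features (the `ltu` helper is ours):

```python
# A linear threshold unit with wi = 1 throughout: fires iff the count of
# active features exceeds the threshold, per the slide's settings.

def ltu(features, threshold):
    return 1 if sum(features) > threshold else 0

n = 4
assert ltu([0, 1, 0, 0], 0) == 1          # OR: threshold 0, any feature suffices
assert ltu([0, 0, 0, 0], 0) == 0
assert ltu([1, 1, 1, 1], n - 1) == 1      # AND: threshold n-1, all required
assert ltu([1, 1, 1, 0], n - 1) == 0
k = 2
assert ltu([1, 1, 0, 0], k - 1) == 1      # k-of-n: threshold k-1, at least k
assert ltu([1, 0, 0, 0], k - 1) == 0
```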

Classical Perceptron
• Goal: any W which separates the data.
• Algorithm (X is augmented with the bias feature):
  – W = 0
  – Repeat:
    – If X is positive and W*X is wrong, W = W + X.
    – Else if X is negative and W*X is wrong, W = W - X.
  – Until no errors or a very large number of iterations.
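A sketch of this update rule, using the f0 = -1 augmentation from the earlier slides (the function name and the epoch cap are our additions):

```python
# Classical perceptron: add misclassified positives to W, subtract
# misclassified negatives, until an error-free pass or the cap is hit.

def perceptron(examples, labels, max_epochs=100):
    w = [0.0] * (len(examples[0]) + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(examples, labels):
            f = [-1.0] + list(x)                     # augment with bias feature
            score = sum(wi * fi for wi, fi in zip(w, f))
            if y == 1 and score <= 0:                # positive classified wrong
                w = [wi + fi for wi, fi in zip(w, f)]
                errors += 1
            elif y == -1 and score >= 0:             # negative classified wrong
                w = [wi - fi for wi, fi in zip(w, f)]
                errors += 1
        if errors == 0:                              # error-free epoch: done
            break
    return w

# OR of two 0/1 features is linearly separable, so this converges.
xs = [[0, 0], [0, 1], [1, 0], [1, 1]]
ys = [-1, 1, 1, 1]
w = perceptron(xs, ys)
for x, y in zip(xs, ys):
    f = [-1.0] + [float(v) for v in x]
    assert (sum(wi * fi for wi, fi in zip(w, f)) > 0) == (y == 1)
```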

Classical Perceptron
• Theorem: If the concept is linearly separable, then the algorithm finds a solution.
• An epoch is a single pass through the entire data.
• Convergence can take exponentially many epochs.
• If |xi| < R and the margin is m, then the number of mistakes is < R^2/m^2.
• Training time can be exponential in the number of features, but the algorithm is guaranteed to work.

Hill-Climbing Search
• This is an optimization problem.
• The solution is found by hill-climbing, so there is no guarantee of finding the optimal solution.
• On the plus side, it is fast.
• On the negative side, there is no guarantee of separation.
• While the derivatives tell you the direction (the negative gradient), they do not tell you how much to change each weight.

Hill-climbing View
• Goal: minimize Squared-error = Err^2.
• Let Err = sum(W*Xi - yi), where Xi is the ith example.
• Let the class yi be 1 or -1.
• This is a function only of the weights.
• Use calculus, i.e. take partial derivatives with respect to wj.
• To move to a lower value, move in the direction of the negative gradient.
• The change in wj is -2*Err*Xj.
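A sketch of this gradient step, applied per example (LMS-style; the learning rate `eta` is our addition, since the slide notes the gradient gives only the direction, not the step size):

```python
# Gradient descent on squared error: for each example, err = W*F - y,
# d(err^2)/dwj = 2*err*fj, so step each weight by -eta*2*err*fj.

def lms_train(examples, labels, eta=0.05, epochs=200):
    w = [0.0] * (len(examples[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            f = [-1.0] + list(x)                     # bias augmentation as before
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            w = [wi - eta * 2.0 * err * fi for wi, fi in zip(w, f)]
    return w

xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [-1.0, 1.0, 1.0, 1.0]                           # OR, again
w = lms_train(xs, ys)
for x, y in zip(xs, ys):
    f = [-1.0] + x
    assert (sum(wi * fi for wi, fi in zip(w, f)) > 0) == (y > 0)
```

Unlike the perceptron, this minimizes squared error rather than seeking zero mistakes, which is why separation is not guaranteed even on separable data.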

Support Vector Machine
• Goal: maximize the margin.
• Assuming the line separates the data, the margin is the minimum of the distances of the closest positive and closest negative example to the line.
• Good news: this can be solved by a quadratic program.
• If the data is not linearly separable, the SVM will add more features.
• Implemented in Weka as SMO.
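The margin of a given separating line can be computed directly; a minimal sketch in the augmented notation (the `margin` helper is ours — this only measures a margin, it does not maximize one):

```python
import math

# Geometric margin: the minimum distance of any example to the hyperplane
# w0*(-1) + w1*x1 + ... + wn*xn = 0. Distance is |W*F| over the norm of
# the non-bias weights.

def margin(w, examples):
    norm = math.sqrt(sum(wi * wi for wi in w[1:]))
    return min(abs(sum(wi * fi for wi, fi in zip(w, [-1.0] + list(x)))) / norm
               for x in examples)

w = [0.5, 1.0, 1.0]                      # separates OR of two 0/1 features
xs = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(margin(w, xs))                     # closest points are 0.5/sqrt(2) away
```

The SVM's quadratic program searches over all separating W for the one making this quantity largest.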

If not Linearly Separable
1. Add more nodes: Neural Nets
2. Add more features: SVM

• Neural Nets:
  1. Can represent any boolean function: why?
  2. No guarantees about learning
  3. Slow
  4. Incomprehensible
• SVM:
  1. Can represent any boolean function
  2. Learning guarantees
  3. Fast
  4. Semi-comprehensible

Adding Features
• Suppose point (x, y) is positive if it lies in the unit disk, else negative.
• Clearly not linearly separable.
• Map (x, y) -> (x, y, x^2 + y^2).
• Now, in 3-space, the data is easily separable.
• This works for any learning algorithm, but the SVM will almost do it for you (set parameters).
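A sketch of this mapping (the sample points are ours): after lifting, the plane z = 1 in 3-space separates inside from outside the disk, so a linear threshold on the new coordinate alone suffices.

```python
# Lift (x, y) to (x, y, x^2 + y^2): the unit disk becomes the half-space
# z < 1, which a linear threshold handles.

def lift(x, y):
    return (x, y, x * x + y * y)

points = [(0.0, 0.0), (0.5, 0.5), (-0.3, 0.2), (1.0, 1.0), (2.0, 0.0), (0.0, -1.5)]
for (x, y) in points:
    label = 1 if x * x + y * y < 1 else -1        # positive inside the unit disk
    _, _, z = lift(x, y)
    predicted = 1 if 1.0 - z > 0 else -1          # linear threshold on z alone
    assert predicted == label
```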
