
Jared Friedman

April 6, 2005

1.a. i. No such perceptron exists. Consider numbering the pixels one through nine, first by row and then by column, and number each weight according to its corresponding pixel, with w0 the threshold weight. Since 7/9 > 3/4 but 6/9 < 3/4, the weights of any perceptron which performs this task must satisfy the following two relations:

−w0 + w1 + w2 + w3 + w4 + w5 + w6 − w7 − w8 − w9 < 0,
−w0 + w1 + w2 + w3 + w4 + w5 + w6 + w7 − w8 − w9 > 0,

which together imply that w7 > 0. However, the weights must also satisfy

−w0 − w1 − w2 − w3 − w4 − w5 − w6 + w7 + w8 + w9 < 0,
−w0 − w1 − w2 − w3 − w4 − w5 − w6 − w7 + w8 + w9 > 0,

which together imply that w7 < 0. Thus we have a contradiction, and no such perceptron exists.
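The algebra behind the contradiction can be checked mechanically. A minimal sketch (variable names are illustrative; it assumes the ±1 on/off pixel encoding implicit in the inequalities above, with w0 the threshold weight on a constant −1 input):

```python
# Each training case is a coefficient vector over (w0, w1, ..., w9);
# the perceptron's output is the sign of the dot product with the weights.
six_on   = [-1,  1,  1,  1,  1,  1,  1, -1, -1, -1]   # requires sum < 0
seven_on = [-1,  1,  1,  1,  1,  1,  1,  1, -1, -1]   # requires sum > 0
three_on = [-1, -1, -1, -1, -1, -1, -1,  1,  1,  1]   # requires sum < 0
two_on   = [-1, -1, -1, -1, -1, -1, -1, -1,  1,  1]   # requires sum > 0

def diff(a, b):
    return [x - y for x, y in zip(a, b)]

# Subtracting the "< 0" case from the "> 0" case cancels every weight
# except w7, so the first pair forces 2*w7 > 0 ...
print(diff(seven_on, six_on))
# ... while the second pair forces -2*w7 > 0, the contradiction.
print(diff(two_on, three_on))
```

Only the w7 coefficient survives each subtraction, which is exactly the cancellation used in the proof.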

ii. Here is such a perceptron: let w1 = 2, w2 = 2, w3 = 2, w4 = −1, w5 = −1, w6 = −1, w7 = −1, w8 = −1, w9 = −1, and let the threshold be 0.5.

iii. No such perceptron exists. Assume that there is such a perceptron; I will derive a contradiction by considering the 2 × 2 upper-left corner. If such a perceptron exists, its weights must satisfy

+w1 + w2 + w4 − w5 − w0 − w3 − w6 − w7 − w8 − w9 > 0,
−w1 + w2 + w4 − w5 − w0 − w3 − w6 − w7 − w8 − w9 < 0,

which together imply that w1 > 0. Similarly, we have that

−w1 − w2 − w4 + w5 − w0 − w3 − w6 − w7 − w8 − w9 > 0,
+w1 − w2 − w4 + w5 − w0 − w3 − w6 − w7 − w8 − w9 < 0,

which together imply that w1 < 0, a contradiction.

b. i. 196 + 1 = 197: one weight for each of the 196 inputs, plus one for the threshold.

ii. I assume that the number of output nodes is 10, as with all networks for digit recognition. Then, following the theorem in the notes, an upper bound on the VC dimension is given by 2(197)(108) log2(e · 108) = 339134.

iii. The decision boundary of a decision stump with n inputs is a hyperplane in n dimensions. However, this hyperplane is subject to the additional restriction that it must be perpendicular to one of the basis vectors. Thus, an upper bound on its VC dimension comes from the VC dimension of perceptrons with the same number of inputs, which is n + 1, or 197.
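To make the restriction concrete, here is a minimal decision stump in Python (an illustrative sketch, not part of the original problem set): it thresholds a single coordinate, so its decision boundary is the axis-aligned hyperplane x[feature] = threshold.

```python
# A decision stump: classify by thresholding a single input coordinate.
# Its decision boundary is the axis-aligned hyperplane x[feature] = threshold,
# i.e., a perceptron whose weight vector is a (signed) basis vector.
def stump(feature, threshold, sign=1):
    def classify(x):
        return sign if x[feature] > threshold else -sign
    return classify

# Example in a 196-dimensional input space: split on coordinate 2 at 0.5.
h = stump(feature=2, threshold=0.5)
print(h([0.0] * 2 + [0.9] + [0.0] * 193))   # coordinate 2 above the threshold
print(h([0.0] * 196))                        # coordinate 2 below the threshold
```

Since every such boundary is also a perceptron boundary, the perceptron VC dimension of n + 1 upper-bounds the stump's.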

v. Consider decision trees with one continuous input, and consider the set S consisting of the first N positive integers. For any dichotomy on S, here is an algorithm for constructing a decision tree which represents that dichotomy. Each split will be of the form X > a + 0.5, where a is an integer, starting with a = 1. At the split X > a + 0.5, the false branch (which, of the points in S that reach it, contains only a) will be a positive classification if a is a positive instance of the dichotomy, and a negative classification if a is a negative instance of the dichotomy; the true branch will lead to the split X > (a + 1) + 0.5. Clearly, we can use this method to construct a decision tree that shatters a set of size N, for any N. Therefore, the VC dimension of decision trees must be infinite.

This is not a valid reason to reject decision trees, because it shows only that the upper bound we derived for the sample complexity in terms of the VC dimension is infinite; the lower bound we derived is still finite. Also, even if the sample complexity is not polynomial, decision trees could be a weak learner that performs well in practice.
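The shattering construction above can be sketched directly; a minimal illustration (function and variable names are assumptions, not from the original):

```python
# Build the chain-shaped decision tree described above for an arbitrary
# dichotomy on S = {1, ..., N}, then check that it reproduces every label.
# labels[a] is the desired class (+1 or -1) of the integer a.
def make_tree(labels):
    n = max(labels)
    def classify(x):
        for a in range(1, n + 1):
            if not (x > a + 0.5):      # split "X > a + 0.5": false branch
                return labels[a]       # is a leaf carrying a's label
        return labels[n]               # points beyond the last split
    return classify

# Shatter one sample dichotomy on the first five positive integers.
labels = {1: +1, 2: -1, 3: -1, 4: +1, 5: +1}
tree = make_tree(labels)
print(all(tree(a) == labels[a] for a in labels))
```

Because the construction works for every labeling of {1, ..., N} and every N, no finite set bounds the VC dimension.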

c. Multi-layer feed-forward neural networks are clearly the most appropriate algorithm for the task. Decision trees with continuous attributes are a poor choice because the inductive bias of decision trees is to look for hypotheses where the classification depends primarily on the values of only a few attributes. Except possibly for the pixels on the border, all the inputs to the decision tree are roughly equally important in determining the classification. Similarly, the domain of digit recognition shows translation independence – if all the pixel values are shifted over by one, the classification should not change – and this will require a very large decision tree to represent. On the other hand, the fact that the VC dimension of decision trees with continuous attributes is infinite suggests that this will be a highly expressive hypothesis space. Decision trees may well be able to represent a correct hypothesis, but they will have to be very large to do so, and will likely be untrainable.

Boosted decision stumps are a better choice. For one, as we can see in part b, the VC dimension of boosted decision stumps is very similar to the VC dimension of neural networks with one hidden layer. However, the inductive bias is still wrong. There is no reason to think that any particular pixel will have particularly good discriminatory value in this domain, as would be required for decision stumps to be a suitable choice. Also, classification into digits will likely depend upon a fairly complex interaction between many pixels. Boosted decision stumps, being a kind of weighted sum, will have trouble modeling this interaction.

Perceptrons are a reasonable choice with respect to their inductive bias, and will probably perform reasonably well, especially considering the short amount of time needed to train them. However, they are simply not expressive enough to achieve top performance. As we can see from part b, the VC dimension of perceptrons is much less than the VC dimension of a neural net with a sizable hidden layer. Multi-layer feed-forward neural networks combine the inductive bias of perceptrons with the high dimensionality of the other two algorithms, and are therefore the best choice.

3.a. The error function F is addressing the problem that the weights in a neural network generally increase over a large amount of training, and can become unreasonably large. This can lead to overfitting. To correct for this, the error function F penalizes large weights by calculating a weighted sum of the training set error and the square of the Euclidean distance between the origin and the weight vector.

b. To derive a weight update rule, all we need to do is to calculate ∂F/∂w_ji. Since F is a sum of two terms, we can take the partial derivatives of those terms separately, and we find that

∂F/∂w_ji = −Σ_{d∈D} a_{dj} δ_{id} + 2λw_ji.

Therefore, the update rule is

w_ji ← w_ji + α(Σ_{d∈D} a_{dj} δ_{id} − 2λw_ji).
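The update rule can be sketched in code. A minimal illustration (the names are assumptions following the notation above: a_j[d] is unit j's activation on example d, delta_i[d] is unit i's error term on example d):

```python
# One gradient step on w_ji for the regularized error
# F = (training error) + lambda * ||w||^2, following the rule derived above.
def update(w_ji, a_j, delta_i, alpha, lam):
    # Sum of a_{dj} * delta_{id} over the training examples d.
    grad_term = sum(aj * di for aj, di in zip(a_j, delta_i))
    return w_ji + alpha * (grad_term - 2 * lam * w_ji)

# With zero error terms, only the weight-decay term acts, and the weight
# shrinks geometrically toward the origin, which is the penalty's purpose.
w = 1.0
for _ in range(10):
    w = update(w, a_j=[0.0], delta_i=[0.0], alpha=0.1, lam=0.5)
print(round(w, 6))   # roughly 0.35 after ten decay steps
```

Each such step multiplies the weight by (1 − 2αλ), so with α = 0.1 and λ = 0.5 the weight decays by a factor of 0.9 per step.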

