
These lecture notes are an updated version of the lecture slides prepared by Stuart Russell and Peter Norvig (Chapter 20, Section 5).

Learning

Learning is essential for unknown environments.

Machine Learning is concerned with how to construct computer programs that can automatically improve with experience.

Learning Process:

• Choosing the target function.

• Choosing how to approximate/optimize the target function.

• Testing the induced function (performance).


Learning

Simplest form: learn a function from examples

f is the target function

An example is a pair (x, f(x)).

Problem: find a hypothesis h

such that h ≈ f

given a training set of examples

(This is a highly simplified model of real learning:

– Ignores prior knowledge

– Assumes a deterministic, observable “environment”

– Assumes examples are given

– Assumes that the agent wants to learn f —why?)


Learning method

Construct/adjust h to agree with f on training set

(h is consistent if it agrees with f on all examples)

E.g., curve fitting:

[Figure series: the same example points (x, f(x)) fitted by several different candidate hypotheses h over successive slides]
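To make the curve-fitting picture concrete, here is a minimal sketch (not from the slides) that fits polynomial hypotheses of increasing degree to some made-up example pairs (x, f(x)) and measures how well each agrees with the training set:

```python
# Curve-fitting sketch: several hypotheses h of different complexity,
# each judged by how well it agrees with the examples (x, f(x)).
import numpy as np

# Hypothetical training examples (x, f(x)) -- illustrative values only
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9])

for degree in (1, 3, 5):
    coeffs = np.polyfit(xs, ys, degree)      # fit a degree-d polynomial hypothesis h
    h = np.poly1d(coeffs)
    sse = np.sum((h(xs) - ys) ** 2)          # agreement with the training set
    print(f"degree {degree}: sum of squared errors = {sse:.4f}")
```

The degree-5 polynomial fits the six points exactly (a consistent hypothesis), while the straight line does not; which to prefer is exactly the question the slides raise.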

Attribute-based representations

Examples described by attribute values (Boolean, discrete, continuous, etc.)

E.g., situations where I will/won’t wait for a table:

| Example | Alt | Bar | Fri | Hun | Pat  | Price | Rain | Res | Type    | Est   | WillWait |
|---------|-----|-----|-----|-----|------|-------|------|-----|---------|-------|----------|
| X1      | T   | F   | F   | T   | Some | $$$   | F    | T   | French  | 0–10  | T        |
| X2      | T   | F   | F   | T   | Full | $     | F    | F   | Thai    | 30–60 | F        |
| X3      | F   | T   | F   | F   | Some | $     | F    | F   | Burger  | 0–10  | T        |
| X4      | T   | F   | T   | T   | Full | $     | F    | F   | Thai    | 10–30 | T        |
| X5      | T   | F   | T   | F   | Full | $$$   | F    | T   | French  | >60   | F        |
| X6      | F   | T   | F   | T   | Some | $$    | T    | T   | Italian | 0–10  | T        |
| X7      | F   | T   | F   | F   | None | $     | T    | F   | Burger  | 0–10  | F        |
| X8      | F   | F   | F   | T   | Some | $$    | T    | T   | Thai    | 0–10  | T        |
| X9      | F   | T   | T   | F   | Full | $     | T    | F   | Burger  | >60   | F        |
| X10     | T   | T   | T   | T   | Full | $$$   | F    | T   | Italian | 10–30 | F        |
| X11     | F   | F   | F   | F   | None | $     | F    | F   | Thai    | 0–10  | F        |
| X12     | T   | T   | T   | T   | Full | $     | F    | F   | Burger  | 30–60 | T        |
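For illustration only, the first two rows of this table could be stored as plain records like the following; the field names simply follow the column headers, and this encoding is not part of the original notes:

```python
# Attribute-based representation of two examples as Python records (rows X1 and X2 above).
X1 = {"Alt": True, "Bar": False, "Fri": False, "Hun": True, "Pat": "Some",
      "Price": "$$$", "Rain": False, "Res": True, "Type": "French",
      "Est": "0-10", "WillWait": True}
X2 = {"Alt": True, "Bar": False, "Fri": False, "Hun": True, "Pat": "Full",
      "Price": "$", "Rain": False, "Res": False, "Type": "Thai",
      "Est": "30-60", "WillWait": False}

training_set = [X1, X2]  # rows X3..X12 would be added the same way
```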



Neural networks

♦ Brains

♦ Neural networks

♦ Perceptrons

♦ Multilayer perceptrons

♦ Applications of neural networks


Brains

10^11 neurons of > 20 types, 10^14 synapses, 1ms–10ms cycle time

Signals are noisy “spike trains” of electrical potential

[Figure: schematic of a neuron — dendrites, nucleus (cell body), axon, axonal arborization, and synapses]


McCulloch–Pitts “unit”

Output is a “squashed” linear function of the inputs:

ai ← g(ini) = g(Σj Wj,i aj)

[Figure: a unit with fixed bias input a0 = −1 and bias weight W0,i, input links aj with weights Wj,i, an input function Σ producing ini, an activation function g, and output ai = g(ini)]

A gross oversimplification of real neurons, but its purpose is to develop understanding of what networks of simple units can do
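A minimal sketch of such a unit in Python, assuming the sigmoid from the next slide as g; the weights and inputs are placeholders:

```python
import math

def g(x):
    """Sigmoid activation (the 'squashing' function)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs):
    """McCulloch-Pitts unit: a_i = g(in_i) with in_i = sum_j W_j,i * a_j.
    inputs[0] is the fixed bias input a_0 = -1; weights[0] is the bias weight W_0,i."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

# Hypothetical example: bias weight 0.5, two input links with weights 1.0 and -1.0
print(unit_output([0.5, 1.0, -1.0], [-1.0, 0.3, 0.7]))
```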


Activation functions

[Figure: two activation functions g(ini), each rising from 0 to +1: (a) a step (threshold) function, (b) a sigmoid]

(a) is a step function or threshold function; (b) is a sigmoid function 1/(1 + e^−x)

Changing the bias weight W0,i moves the threshold location


Implementing logical functions

AND: W0 = 1.5, W1 = 1, W2 = 1

OR: W0 = 0.5, W1 = 1, W2 = 1

NOT: W0 = −0.5, W1 = −1
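With a step activation and the bias convention a0 = −1 used above, these weights do implement the three functions; a small check, assuming that convention:

```python
def step_unit(weights, inputs):
    """Threshold unit: output 1 if sum_j W_j * a_j >= 0, else 0.
    inputs[0] is the fixed bias input a_0 = -1 and weights[0] is W_0."""
    return 1 if sum(w * a for w, a in zip(weights, inputs)) >= 0 else 0

AND = [1.5, 1, 1]    # W0 = 1.5, W1 = 1, W2 = 1
OR  = [0.5, 1, 1]    # W0 = 0.5, W1 = 1, W2 = 1
NOT = [-0.5, -1]     # W0 = -0.5, W1 = -1 (single input)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"x1={x1} x2={x2}  AND={step_unit(AND, [-1, x1, x2])}  "
              f"OR={step_unit(OR, [-1, x1, x2])}")
for x1 in (0, 1):
    print(f"x1={x1}  NOT={step_unit(NOT, [-1, x1])}")
```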


Network structures

Feed-forward networks:

– single-layer perceptrons

– multi-layer perceptrons

Feed-forward networks implement functions, have no internal state

Recurrent networks:

– Hopfield networks have symmetric weights (Wi,j = Wj,i)

g(x) = sign(x), ai = ± 1; holographic associative memory

– Boltzmann machines use stochastic activation functions,

≈ MCMC in Bayes nets

– recurrent neural nets have directed cycles with delays

⇒ have internal state (like flip-flops), can oscillate etc.
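To make the internal-state distinction concrete, here is a toy sketch (not from the slides): a feed-forward unit is a pure function of its current input, while a recurrent unit's output also depends on its previous activation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def feedforward_unit(x, w=1.0, b=0.0):
    """Pure function of the current input: the same x always gives the same output."""
    return sigmoid(w * x + b)

class RecurrentUnit:
    """Keeps internal state: the output also depends on the previous activation."""
    def __init__(self, w_in=1.0, w_rec=0.8):
        self.w_in, self.w_rec, self.state = w_in, w_rec, 0.0

    def step(self, x):
        self.state = sigmoid(self.w_in * x + self.w_rec * self.state)
        return self.state

r = RecurrentUnit()
print([feedforward_unit(1.0) for _ in range(3)])   # identical outputs each time
print([r.step(1.0) for _ in range(3)])             # outputs change as the state evolves
```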


Feed-forward example

[Figure: feed-forward network with input units 1 and 2, hidden units 3 and 4, output unit 5, and weights W1,3, W1,4, W2,3, W2,4, W3,5, W4,5]

a5 = g(W3,5 · a3 + W4,5 · a4)

= g(W3,5 · g(W1,3 · a1 + W2,3 · a2) + W4,5 · g(W1,4 · a1 + W2,4 · a2))

Adjusting weights changes the function: do learning this way!
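A direct transcription of the expression for a5, with made-up weight values (the slide gives no numbers):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical weights for the 2-input, 2-hidden-unit, 1-output network above
W13, W14 = 0.4, -0.2
W23, W24 = 0.7,  0.5
W35, W45 = 1.0, -1.0

def a5(a1, a2):
    a3 = g(W13 * a1 + W23 * a2)
    a4 = g(W14 * a1 + W24 * a2)
    return g(W35 * a3 + W45 * a4)

print(a5(0.0, 1.0))  # changing any weight changes this function of (a1, a2)
```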


Single-layer perceptrons

[Figure: a single-layer perceptron network, with input units connected directly to output units by weights Wj,i, and the output of a two-input perceptron plotted as a surface over (x1, x2) showing a soft "cliff"]

Adjusting weights moves the location, orientation, and steepness of cliff


Expressiveness of perceptrons

Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)

Can represent AND, OR, NOT, majority, etc., but not XOR

Represents a linear separator in input space:

Σj Wj xj > 0 or W · x > 0

[Figure: points in the (x1, x2) plane for (a) x1 and x2, (b) x1 or x2, (c) x1 xor x2; a linear separator exists for (a) and (b) but not for (c)]


Perceptron learning

Learn by adjusting weights to reduce error on training set

The squared error for an example with input x and true output y is

E = ½ Err² ≡ ½ (y − hW(x))²

Perform optimization search by gradient descent:

∂E/∂Wj = Err × ∂Err/∂Wj = Err × ∂/∂Wj ( y − g(Σj=0..n Wj xj) )

= −Err × g′(in) × xj

Simple weight update rule:

Wj ← Wj + α × Err × g′(in) × xj

E.g., +ve error ⇒ increase network output

⇒ increase weights on +ve inputs, decrease on -ve inputs
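A minimal sketch of this update rule with a sigmoid unit; the toy data set (an OR-like, linearly separable problem), the learning rate, and the initialization are all illustrative:

```python
import math, random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)

# Toy linearly separable data: inputs are (a0 = -1 bias input, x1, x2), target y = x1 OR x2
data = [((-1, 0, 0), 0), ((-1, 0, 1), 1), ((-1, 1, 0), 1), ((-1, 1, 1), 1)]
W = [random.uniform(-0.5, 0.5) for _ in range(3)]
alpha = 0.5

for epoch in range(2000):
    for x, y in data:
        in_ = sum(wj * xj for wj, xj in zip(W, x))
        err = y - g(in_)                                   # Err = y - h_W(x)
        W = [wj + alpha * err * g_prime(in_) * xj          # Wj <- Wj + alpha*Err*g'(in)*xj
             for wj, xj in zip(W, x)]

for x, y in data:
    print(x[1:], y, round(g(sum(wj * xj for wj, xj in zip(W, x))), 2))
```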


Perceptron learning contd.

Perceptron learning rule converges to a consistent function for any linearly separable data set

[Figure: proportion correct on test set vs. training set size (0–100) for the perceptron and decision-tree learners; left: MAJORITY function on 11 inputs, right: RESTAURANT data]

DTL learns restaurant function easily, perceptron cannot represent it


Multilayer perceptrons

Layers are usually fully connected;

numbers of hidden units typically chosen by hand

[Figure: layered network with input units ak connected to hidden units aj by weights Wk,j, and hidden units aj connected to output units ai by weights Wj,i]


Expressiveness of MLPs

All continuous functions w/ 2 layers, all functions w/ 3 layers

[Figure: two surfaces over (x1, x2): a ridge formed by combining two opposite-facing soft thresholds, and a bump formed by combining two perpendicular ridges]

Combine two perpendicular ridges to make a bump

Add bumps of various sizes and locations to fit any surface

Proof requires exponentially many hidden units (cf DTL proof)
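A numerical sketch of the ridge-and-bump construction, using steep sigmoids as the soft thresholds; all constants are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ridge(t):
    """~1 for |t| < 1 and ~0 outside: two opposite-facing soft thresholds added."""
    return sigmoid(10 * (t + 1)) + sigmoid(-10 * (t - 1)) - 1.0

def bump(x1, x2):
    """Combine two perpendicular ridges, then threshold the sum."""
    return sigmoid(10 * (ridge(x1) + ridge(x2) - 1.5))

print(bump(0.0, 0.0))   # near 1 inside the bump
print(bump(3.0, 0.0))   # near 0 outside it
```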


Back-propagation learning

Output layer: same as for single-layer perceptron,

Wj,i ← Wj,i + α × aj × ∆i

where ∆i = Erri × g′(ini)

Hidden layer: back-propagate the error from the output layer:

∆j = g′(inj) Σi Wj,i ∆i

Wk,j ← Wk,j + α × ak × ∆j

(Most neuroscientists deny that back-propagation occurs in the brain)
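A compact sketch of the two update rules for a single-hidden-layer network, using plain Python lists; the network size, data, and learning rate are made up:

```python
import math, random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)

n_in, n_hid, n_out, alpha = 2, 2, 1, 0.5
Wkj = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]  for _ in range(n_hid)]   # input -> hidden
Wji = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]   # hidden -> output

def train_example(x, y):
    # Forward pass
    in_j = [sum(Wkj[j][k] * x[k] for k in range(n_in)) for j in range(n_hid)]
    a_j  = [g(v) for v in in_j]
    in_i = [sum(Wji[i][j] * a_j[j] for j in range(n_hid)) for i in range(n_out)]
    a_i  = [g(v) for v in in_i]
    # Output layer: Delta_i = Err_i * g'(in_i)
    D_i = [(y[i] - a_i[i]) * g_prime(in_i[i]) for i in range(n_out)]
    # Hidden layer: Delta_j = g'(in_j) * sum_i W_j,i * Delta_i (using the old weights)
    D_j = [g_prime(in_j[j]) * sum(Wji[i][j] * D_i[i] for i in range(n_out)) for j in range(n_hid)]
    # Weight updates: W_j,i += alpha * a_j * Delta_i ; W_k,j += alpha * a_k * Delta_j
    for i in range(n_out):
        for j in range(n_hid):
            Wji[i][j] += alpha * a_j[j] * D_i[i]
    for j in range(n_hid):
        for k in range(n_in):
            Wkj[j][k] += alpha * x[k] * D_j[j]

# One illustrative pass over a few (input, target) pairs
for x, y in [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]:
    train_example(x, y)
```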


Back-propagation derivation

The squared error on a single example is defined as

E = ½ Σi (yi − ai)²,

where the sum is over the nodes in the output layer.

∂E/∂Wj,i = −(yi − ai) ∂ai/∂Wj,i = −(yi − ai) ∂g(ini)/∂Wj,i

= −(yi − ai) g′(ini) ∂ini/∂Wj,i = −(yi − ai) g′(ini) ∂/∂Wj,i (Σj Wj,i aj)

= −(yi − ai) g′(ini) aj = −aj ∆i


Back-propagation derivation contd.

∂E/∂Wk,j = −Σi (yi − ai) ∂ai/∂Wk,j = −Σi (yi − ai) ∂g(ini)/∂Wk,j

= −Σi (yi − ai) g′(ini) ∂ini/∂Wk,j = −Σi ∆i ∂/∂Wk,j (Σj Wj,i aj)

= −Σi ∆i Wj,i ∂aj/∂Wk,j = −Σi ∆i Wj,i ∂g(inj)/∂Wk,j

= −Σi ∆i Wj,i g′(inj) ∂inj/∂Wk,j = −Σi ∆i Wj,i g′(inj) ∂/∂Wk,j (Σk Wk,j ak)

= −Σi ∆i Wj,i g′(inj) ak = −ak ∆j
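One way to sanity-check this derivation is to compare the analytic gradient −ak ∆j against a finite-difference estimate of ∂E/∂Wk,j; a sketch with illustrative weights and a single output unit:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tiny fixed network: 2 inputs, 2 hidden units, 1 output (illustrative weights)
Wkj = [[0.3, -0.1], [0.2, 0.4]]   # input -> hidden
Wji = [[0.5, -0.3]]               # hidden -> output
x, y = [1.0, 0.5], [1.0]

def error(Wkj, Wji):
    a_j = [g(sum(Wkj[j][k] * x[k] for k in range(2))) for j in range(2)]
    a_i = g(sum(Wji[0][j] * a_j[j] for j in range(2)))
    return 0.5 * (y[0] - a_i) ** 2

# Analytic gradient for W_0,0 from the derivation: dE/dW_k,j = -a_k * Delta_j
in_j = [sum(Wkj[j][k] * x[k] for k in range(2)) for j in range(2)]
a_j  = [g(v) for v in in_j]
in_i = sum(Wji[0][j] * a_j[j] for j in range(2))
a_i  = g(in_i)
D_i  = (y[0] - a_i) * a_i * (1 - a_i)             # Err * g'(in_i)
D_j0 = a_j[0] * (1 - a_j[0]) * Wji[0][0] * D_i    # g'(in_j) * W_j,i * Delta_i
analytic = -x[0] * D_j0                           # dE/dW_0,0

# Finite-difference estimate of the same derivative
eps = 1e-6
Wkj[0][0] += eps;     e_plus  = error(Wkj, Wji)
Wkj[0][0] -= 2 * eps; e_minus = error(Wkj, Wji)
Wkj[0][0] += eps
numeric = (e_plus - e_minus) / (2 * eps)
print(analytic, numeric)   # the two should agree closely
```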


Back-propagation learning contd.

At each epoch, sum gradient updates for all examples and apply

Training curve for 100 restaurant examples: finds exact fit

[Figure: total error on the training set vs. number of epochs (0–400); the error falls from about 14 to 0]


Back-propagation learning contd.

Learning curve for MLP with 4 hidden units:

[Figure: proportion correct on test set vs. training set size (0–100, RESTAURANT data) for the multilayer network]

The network fits the data well, but the resulting hypotheses cannot be understood easily


Handwritten digit recognition

400–300–10 unit MLP = 1.6% error

LeNet: 768–192–30–10 unit MLP = 0.9% error

Current best (kernel machines, vision algorithms) ≈ 0.6% error


Summary

Most brains have lots of neurons; each neuron ≈ linear–threshold unit (?)

Perceptrons (one-layer networks) insufficiently expressive

Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation

Many applications: speech, driving, handwriting, fraud detection, etc.

Engineering, cognitive modelling, and neural system modelling subfields have largely diverged

