A Brief Overview of Neural Networks

Overview
• Relation to Biological Brain: Biological Neural Network
• The Artificial Neuron
• Types of Networks and Learning Techniques
• Supervised Learning & Backpropagation Training Algorithm
• Learning by Example
• Applications

Biological Neuron

Artificial Neuron
[Figure: inputs enter the neuron through weighted connections (W = weight), are summed (Σ), and pass through an activation function f(n) to produce the outputs.]

Transfer Functions
• Sigmoid: f(n) = 1 / (1 + e^(-n)), output between 0 and 1
• Linear: f(n) = n
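A minimal sketch of the two transfer functions in Python (the function names are illustrative, not from the slides):

import math

def sigmoid(n):
    """Sigmoid transfer function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def linear(n):
    """Linear transfer function: output equals the net input."""
    return n

# The sigmoid saturates toward 0 for large negative inputs and toward 1
# for large positive inputs; the linear function is unbounded.
print(sigmoid(0.0))   # 0.5
print(sigmoid(2.0))   # ~0.88
print(linear(2.0))    # 2.0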

Types of Networks
• Multiple inputs, single layer
• Multiple inputs, multiple layers

Types of Networks (contd.)
• Feedback (recurrent) networks

Recurrent Networks
• Feed-forward networks:
– Information only flows one way
– One input pattern produces one output
– No sense of time (or memory of previous state)
• Recurrency:
– Nodes connect back to other nodes or themselves
– Information flow is multidirectional
– Sense of time and memory of previous state(s)
• Biological nervous systems show high levels of recurrency (but feed-forward structures exist too)

ANNs – The Basics
• ANNs incorporate the two fundamental components of biological neural nets:
1. Neurones (nodes)
2. Synapses (weights)

Feed-Forward Nets
• Information flow is unidirectional
– Data is presented to the input layer
– Passed on to the hidden layer
– Passed on to the output layer
• Information is distributed
• Information processing is parallel
• Internal representation (interpretation) of data
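As a rough illustration of this one-way flow, the sketch below pushes an input vector through a hidden layer and then an output layer. All layer sizes and weight values are made up for the example, and sigmoid units are used everywhere for simplicity:

import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def layer_output(inputs, weights):
    """Each row of `weights` holds one neuron's incoming weights."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Hypothetical 2-input -> 2-hidden -> 1-output network.
w_hidden = [[0.25, -1.5], [0.8, 0.3]]
w_output = [[0.6, -0.4]]

inputs = [1.0, 0.5]
hidden = layer_output(inputs, w_hidden)   # data passed to the hidden layer
output = layer_output(hidden, w_output)   # then to the output layer
print(hidden, output)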

Neural networks are good for prediction problems.
• The inputs are well understood. You have a good idea of which features of the data are important, but not necessarily how to combine them.
• The output is well understood. You know what you are trying to predict.
• Experience is available. You have plenty of examples where both the inputs and the output are known. This experience will be used to train the network.

• Feeding data through the net:
(1 × 0.25) + (0.5 × (-1.5)) = 0.25 + (-0.75) = -0.5
• Squashing: 1 / (1 + e^0.5) = 0.3775
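The same arithmetic written out in Python (the weights 0.25 and -1.5 come from the slide's example):

import math

# Weighted sum of the two inputs.
net = (1 * 0.25) + (0.5 * (-1.5))          # 0.25 - 0.75 = -0.5

# Squashing with the sigmoid transfer function.
activation = 1.0 / (1.0 + math.exp(-net))  # 1 / (1 + e^0.5) ~= 0.3775
print(net, activation)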

Learning Techniques
• Supervised Learning
[Figure: block diagram. Inputs from the environment drive both the actual system (giving the expected output) and the neural network (giving the actual output); the two are compared (Σ), and the resulting error drives training.]

Multilayer Perceptron
[Figure: inputs → first hidden layer → second hidden layer → output layer.]

Signal Flow
[Figure: function signals flow forward through the network; error signals flow backward (backpropagation of errors).]

Neural Networks for Directed Data Mining: Building a model for classification and prediction
1. Identify the input and output features.
2. Normalize (scale) the inputs and outputs so their range is between 0 and 1 (see the sketch after this list).
3. Set up a network with an appropriate topology.
4. Train the network on a representative set of training examples.
5. Test the network on a test set strictly independent from the training examples. If necessary, repeat the training, adjusting the training set, network topology, and parameters.
6. Evaluate the network using the evaluation set to see how well it performs.
7. Apply the model generated by the network to predict outcomes for unknown inputs.
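Step 2 is commonly done with min-max scaling; a small sketch, assuming purely numeric feature values (the data below is invented):

def min_max_scale(values):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # avoid division by zero for constant features
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [22, 35, 58, 41]               # made-up feature values
print(min_max_scale(ages))            # [0.0, 0.361..., 1.0, 0.527...]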

Learning by Example
• Hidden layer transfer function: sigmoid, F(n) = 1/(1+exp(-n)), where n is the net input to the neuron.
– Derivative: F'(n) = (output of the neuron)(1 - output of the neuron): the slope of the transfer function.
• Output layer transfer function: linear, F(n) = n. Output = input to the neuron.
– Derivative: F'(n) = 1.
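In code, the two transfer functions and the slopes used during training look roughly like this (a sketch; the slides only give the formulas):

import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def sigmoid_slope(output):
    """Derivative expressed via the neuron's output: o * (1 - o)."""
    return output * (1.0 - output)

def linear(n):
    return n

def linear_slope(_output):
    """The linear transfer function has a constant slope of 1."""
    return 1.0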

Purpose of the Activation Function
• We want the unit to be "active" (near +1) when the "right" inputs are given.
• We want the unit to be "inactive" (near 0) when the "wrong" inputs are given.
• It's preferable for the activation function to be nonlinear. Otherwise, the entire neural network collapses into a simple linear function.

Possibilities for the Activation Function
• Step function: step(x) = 1 if x > threshold, 0 if x ≤ threshold (in the picture above, threshold = 0)
• Sign function: sign(x) = +1 if x > 0, -1 if x ≤ 0
• Sigmoid (logistic) function: sigmoid(x) = 1/(1+e^-x)
Adding an extra input with activation a0 = -1 and weight W0,j = t (called the bias weight) is equivalent to having a threshold at t. This way we can always assume a 0 threshold.
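The three candidate activation functions, written out (threshold handling follows the definitions above):

import math

def step(x, threshold=0.0):
    """Step function: 1 above the threshold, 0 otherwise."""
    return 1 if x > threshold else 0

def sign(x):
    """Sign function: +1 for positive inputs, -1 otherwise."""
    return 1 if x > 0 else -1

def sigmoid(x):
    """Logistic sigmoid: smooth, differentiable squashing into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))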

Using a Bias Weight to Standardize the Threshold
[Figure: a unit with inputs x1, x2 (weights W1, W2) and an extra input fixed at -1 with weight T.]
W1x1 + W2x2 < T  is equivalent to  W1x1 + W2x2 - T < 0
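A tiny numeric check of this equivalence; the weights, inputs, and threshold below are arbitrary example values, not from the slides:

# W1*x1 + W2*x2 < T  is the same test as  W1*x1 + W2*x2 - T < 0,
# i.e. a unit with an extra input a0 = -1 carrying weight T and threshold 0.
W1, W2, T = 0.7, -0.2, 0.4      # arbitrary example values
x1, x2 = 1.0, 0.5

with_threshold = (W1 * x1 + W2 * x2) < T
with_bias      = (W1 * x1 + W2 * x2 + (-1.0) * T) < 0
print(with_threshold, with_bias)   # both False here: 0.6 is not below 0.4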

Learning by Example
• Training algorithm: backpropagation of errors using gradient descent training.
• Colors:
– Red: current weights
– Orange: updated weights
– Black boxes: inputs and outputs to a neuron
– Blue: sensitivities at each layer

• The perceptron learning rule performs gradient descent in weight space.
– Error surface: the surface that describes the error on each example as a function of all the weights in the network. A set of weights defines a point on this surface. (It could also be called a state in the state space of possible weights.)
– We look at the partial derivative of the surface with respect to each weight (i.e., the gradient: how much the error would change if we made a small change in each weight). Then the weights are altered by an amount proportional to the slope in each direction (corresponding to a weight). Thus the network as a whole moves in the direction of steepest descent on the error surface.

Definition of Error: Sum of Squared Errors

E = (1/2) Σ_examples (t - o)^2 = (1/2) Err^2

The factor of 1/2 is introduced to simplify the math on the next slide. Here, t is the correct (desired) output and o is the actual output of the neural net.
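Computing the sum of squared errors over a set of examples, following the definition above (the target and output values are invented for illustration):

def sum_squared_error(targets, outputs):
    """E = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

targets = [1.0, 0.0, 1.0]       # desired outputs t
outputs = [0.8, 0.3, 0.6]       # actual network outputs o
print(sum_squared_error(targets, outputs))   # 0.5 * (0.04 + 0.09 + 0.16) = 0.145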

Reduction of Squared Error
Gradient descent reduces the squared error by calculating the partial derivative of E with respect to each weight (∇E is a vector; the chain rule for derivatives is used throughout):

∂E/∂W_j = Err × ∂Err/∂W_j

Expanding Err to (t - g(in)), where in = Σ_{k=0..n} W_k x_k is the net input ("in"):

∂E/∂W_j = Err × ∂/∂W_j [t - g(Σ_k W_k x_k)] = -Err × g'(in) × x_j

because ∂t/∂W_j = 0 and by the chain rule. The weight is then updated by η times this gradient of error ∇E in weight space:

W_j ← W_j + η × Err × g'(in) × x_j

The learning rate, η, is typically set to a small value such as 0.1. The fact that the weight is updated in the correct direction (+/-) can be verified with examples.
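The update rule W_j ← W_j + η × Err × g'(in) × x_j for a single sigmoid unit might be coded like this (a sketch under the definitions above, not the slides' exact implementation; the starting weights and training example are invented):

import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def train_step(weights, x, target, eta=0.1):
    """One gradient-descent step for a single sigmoid unit."""
    net = sum(w * xi for w, xi in zip(weights, x))   # "in" = sum_k W_k * x_k
    out = sigmoid(net)
    err = target - out                               # Err = t - g(in)
    slope = out * (1.0 - out)                        # g'(in) for the sigmoid
    # W_j <- W_j + eta * Err * g'(in) * x_j
    return [w + eta * err * slope * xi for w, xi in zip(weights, x)]

weights = [0.5, -0.3]          # example initial weights
weights = train_step(weights, x=[1.0, 2.0], target=1.0)
print(weights)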

6508)(0.5 0.6508 G3=(1)(0.5 05 0.5)(2)=0.5 0.6508)(10.5)=0.6225)(0.6225)(10.0397 0 6508)(0 3492)(0 5) 0 0397 0.6225 0.5 0.5 0.3492)(0.6225 0.6508 1 0.6508=0.0397)(0.3492 .5 0.3492)=0.6225 0.6508 Gradient of the neuron= G =slope of the transfer function×[Σ{(weight of the neuron t the next neuron) × to th t ) (output of the neuron)}] Gradient of the output neuron = slope of the transfer function × error Error=1-0.First Pass G1= (0.0093 0 6225)(0 0397)(0 5)(2) 0 0093 G2= (0.3492 0.6508 0.5 0.5 05 0.6225 0.5 0.5 0.

Weight Update 1
New weight = old weight + {(learning rate)(gradient)(prior output)}
W3: 0.5 + (0.5)(0.3492)(0.6508) = 0.6136
W2: 0.5 + (0.5)(0.0397)(0.6225) = 0.5124
W1: 0.5 + (0.5)(0.0093)(1) = 0.5047

Second Pass
Forward with the updated weights: first-hidden outputs = 0.6236, second-hidden net input = 0.6391 and output = 0.6545, network output = 0.8033.
Error = 1 - 0.8033 = 0.1967
G3 = (1)(0.1967) = 0.1967
G2 = (0.6545)(1 - 0.6545)(0.1967)(0.6136) = 0.0273
G1 = (0.6236)(1 - 0.6236)(0.0273)(0.5124)(2) = 0.0066

Weight Update 2
New weight = old weight + {(learning rate)(gradient)(prior output)}
W3: 0.6136 + (0.5)(0.1967)(0.6545) = 0.6779
W2: 0.5124 + (0.5)(0.0273)(0.6236) = 0.5209
W1: 0.5047 + (0.5)(0.0066)(1) = 0.508

6243 0.5209 0 5209 0.6571 0.508 0 508 0.8909 1 0.508 0.6243 0.5209 0.6504 0.508 0.5209 0.6571 0.6779 0.8909 0.6504 .6779 0.508 0.5209 0.Third Pass 0.

6508 1 0.508 0.6779 0.1091 W1: Weights from the input to the input layer W2: Weights from the input layer to the hidden layer W3: Weights from the hidden layer to the output layer .5124 0.5 0.1967 Pass 2 Update 0.Weight Update Summary Weights Output Expected O Error w1 1 w2 2 w3 3 Initial conditions 0.8909 1 0.8033 0 5047 0 5124 0 6136 0 8033 1 0 1967 0.3492 Pass 1 Update 0.5 0.5209 0.6136 0.5 0.5047 0.

Training Algorithm
• The process of feedforward and backpropagation continues until the required mean squared error has been reached.
• Typical MSE: 1e-5
• Other, more complicated backpropagation training algorithms are also available.

Why Gradient?
[Figure: an output neuron with two inputs, O1 (weight W1) and O2 (weight W2).]
N = (O1 × W1) + (O2 × W2)
O3 = 1/[1 + exp(-N)]
Error = Actual Output - O3
(O = output of the neuron, W = weight, N = net input to the neuron)
• To reduce the error, the change in each weight is determined by:
o The learning rate
o The rate of change of error w.r.t. the rate of change of the weight, which splits into:
– Gradient: rate of change of error w.r.t. rate of change of N
– Prior output (O1 and O2)

Gradient in Detail
• Gradient: rate of change of error w.r.t. rate of change in the net input to the neuron
o For output neurons: slope of the transfer function × error
o For hidden neurons: a bit complicated! The error is fed back in terms of the gradients of the successive neurons:
slope of the transfer function × [Σ (gradient of next neuron × weight connecting the neuron to the next neuron)]
• Why a summation? Share the responsibility!!

An Example
[Figure: two output neurons receive the same inputs and both currently output 0.66; the first has target 1, the second has target 0.]
• Target 1: Error = 1 - 0.66 = 0.34; G = 0.66 × (1 - 0.66) × (0.34) = 0.0763 → increase less
• Target 0: Error = 0 - 0.66 = -0.66; G = 0.66 × (1 - 0.66) × (-0.66) = -0.148 → reduce more
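The two gradients in this example can be checked numerically; only the shared output 0.66 and the targets 1 and 0 are taken from the slide, the rest of the original diagram is not reproduced here:

# Gradient of an output neuron = slope of sigmoid * error = o*(1-o) * (t - o)
o = 0.66

g_top    = o * (1 - o) * (1 - o)   # target 1: error = +0.34 ->  0.0763 (increase less)
g_bottom = o * (1 - o) * (0 - o)   # target 0: error = -0.66 -> -0.1481 (reduce more)
print(round(g_top, 4), round(g_bottom, 4))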

Improving Performance
• Changing the number of layers and the number of neurons in each layer
• Variation in the transfer functions
• Changing the learning rate
• Training for longer times
• Type of pre-processing and post-processing
