
Uploaded by Asim Arunava Sahoo


Machine Learning

Version 2 CSE IIT, Kharagpur

Lesson 39

Neural Networks - III


**12.4.4 Multi-Layer Perceptrons**

In contrast to perceptrons, multilayer networks can learn multiple decision boundaries, and those boundaries may be nonlinear. The typical architecture of a multi-layer perceptron (MLP) is shown below.

[Figure: MLP architecture with a layer of input nodes, a layer of internal (hidden) nodes, and a layer of output nodes.]

To make nonlinear partitions of the space, each unit must compute a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit. Another reason for using sigmoids is that, unlike linear threshold units, they are continuous and differentiable at all points.

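[Figure: sigmoid unit. The inputs $x_1, x_2, \ldots, x_n$ with weights $w_1, w_2, \ldots, w_n$, together with the constant input $x_0 = 1$ with weight $w_0$, are summed to give $net = \sum_{i=0}^{n} w_i x_i$. The output is $O = \sigma(net) = \dfrac{1}{1 + e^{-net}}$.]

$O(x_1, x_2, \ldots, x_n) = \sigma(WX)$, where $\sigma(WX) = \dfrac{1}{1 + e^{-WX}}$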

Function σ is called the sigmoid or logistic function. It has the following property:

$\dfrac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$
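The derivative property can be checked numerically. The following is a minimal sketch: the function names and the sample points are illustrative choices, not part of the lesson, and the finite-difference estimate is used only as an independent check of the closed form.

```python
import math

def sigmoid(y: float) -> float:
    """Logistic function 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_derivative(y: float) -> float:
    """Closed form from the property above: sigma(y) * (1 - sigma(y))."""
    s = sigmoid(y)
    return s * (1.0 - s)

def numerical_derivative(f, y: float, h: float = 1e-6) -> float:
    """Central finite difference, an independent estimate of df/dy."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

# Compare the two estimates at a few arbitrary sample points.
for y in (-2.0, 0.0, 0.5, 3.0):
    print(y, sigmoid_derivative(y), numerical_derivative(sigmoid, y))
```

The two columns should agree to several decimal places, which is why back-propagation can compute gradients from the unit outputs alone.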


**12.4.4.1 Back-Propagation Algorithm**

Multi-layered perceptrons can be trained using the back-propagation algorithm described next.

Goal: To learn the weights for all links in an interconnected multilayer network.

We begin by defining our measure of error:

$E(W) = \frac{1}{2} \sum_{d} \sum_{k} (t_{kd} - o_{kd})^2$

where k ranges over the output nodes and d over the training examples.

The idea is again to use gradient descent over the space of weights to search for a minimum of the error. There is no guarantee that the minimum found is the global one.

Algorithm:

1. Create a network with $n_{in}$ input nodes, $n_{hidden}$ internal nodes, and $n_{out}$ output nodes.
2. Initialize all weights to small random numbers.
3. Until the error is small, do the following for each example X:
   - Propagate example X forward through the network.
   - Propagate errors backward through the network.

Forward Propagation

Given example X, compute the output of every node until we reach the output nodes:

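[Figure: forward propagation. The example enters at the input nodes, and each internal node and then each output node computes the sigmoid function of its weighted inputs.]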
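Forward propagation can be sketched in a few lines. This is a minimal illustration, assuming a network with a single layer of internal nodes and sigmoid units throughout. The 2-2-1 layout, the weight values and the function names are hypothetical and chosen only for the example.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer_forward(weights, inputs):
    """Compute the outputs of one layer of sigmoid units.

    Each row of `weights` is [w0, w1, ..., wn] for one unit, where w0
    multiplies the constant input x0 = 1.
    """
    extended = [1.0] + list(inputs)
    return [sigmoid(sum(w * x for w, x in zip(row, extended))) for row in weights]

def forward(hidden_weights, output_weights, example):
    """Propagate one example through the internal layer, then the output layer."""
    hidden = layer_forward(hidden_weights, example)
    outputs = layer_forward(output_weights, hidden)
    return hidden, outputs

# Hypothetical 2-2-1 network with arbitrary small weights.
hidden_weights = [[0.1, 0.2, -0.1], [-0.2, 0.1, 0.3]]
output_weights = [[0.05, -0.3, 0.2]]
hidden, outputs = forward(hidden_weights, output_weights, [1.0, 0.0])
print(hidden, outputs)
```

Every value printed lies strictly between 0 and 1, since each node applies the sigmoid to its weighted sum.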


Backward Propagation

A. For each output node k, compute the error:

$\delta_k = O_k (1 - O_k)(t_k - O_k)$

B. For each hidden unit h, calculate the error:

$\delta_h = O_h (1 - O_h) \sum_k W_{kh} \delta_k$

C. Update each network weight:

$W_{ji} = W_{ji} + \Delta W_{ji}$, where $\Delta W_{ji} = \eta\, \delta_j X_{ji}$

($X_{ji}$ is the input from node i to node j, $W_{ji}$ is the weight on that link, and $\eta$ is the learning rate.)
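Steps A to C can be sketched for a network with one layer of hidden units. This is a minimal illustration rather than a reference implementation: the 2-2-1 network, its initial weights, the learning rate of 0.5 and the single training example are all assumptions made for the example.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(w_hidden, w_out, x):
    xs = [1.0] + list(x)  # x0 = 1 is the constant (bias) input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xs))) for row in w_hidden]
    hs = [1.0] + hidden   # the output layer also receives a constant input
    outputs = [sigmoid(sum(w * v for w, v in zip(row, hs))) for row in w_out]
    return xs, hs, outputs

def backprop_step(w_hidden, w_out, x, target, eta=0.5):
    """Update the weights in place for one example.

    Returns the squared error measured before the update.
    """
    xs, hs, outputs = forward(w_hidden, w_out, x)
    # Step A: delta_k = O_k (1 - O_k)(t_k - O_k) for each output node k.
    delta_out = [o * (1 - o) * (t - o) for o, t in zip(outputs, target)]
    # Step B: delta_h = O_h (1 - O_h) sum_k W_kh delta_k for each hidden unit h.
    # hs[0] is the constant input, so hidden unit h is stored at index h + 1.
    delta_hidden = []
    for h in range(len(w_hidden)):
        o_h = hs[h + 1]
        back = sum(w_out[k][h + 1] * delta_out[k] for k in range(len(w_out)))
        delta_hidden.append(o_h * (1 - o_h) * back)
    # Step C: W_ji = W_ji + eta * delta_j * X_ji for every weight.
    for k, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] += eta * delta_out[k] * hs[i]
    for h, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_hidden[h] * xs[i]
    return 0.5 * sum((t - o) ** 2 for o, t in zip(outputs, target))

# Hypothetical network and a single training example with target output 1.
w_hidden = [[0.1, 0.2, -0.1], [-0.2, 0.1, 0.3]]
w_out = [[0.05, -0.3, 0.2]]
errors = [backprop_step(w_hidden, w_out, [1.0, 0.0], [1.0]) for _ in range(50)]
print(errors[0], errors[-1])
```

Note that the hidden-unit errors are computed before any weight is changed, so step B uses the same weights that produced the outputs.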

A momentum term, which depends on the weight update made at the previous iteration, may also be added to the update rule. At iteration n we have:

$\Delta W_{ji}(n) = \eta\, \delta_j X_{ji} + \alpha\, \Delta W_{ji}(n-1)$

where $\alpha$ ($0 \le \alpha \le 1$) is a constant called the momentum.

1. It helps the search move past small local minima.
2. It increases the speed along flat regions.

Remarks on Back-propagation

1. It implements a gradient descent search over the weight space.
2. It may become trapped in local minima.
3. In practice, it is very effective.
4. How to avoid local minima?
   a) Add momentum.
   b) Use stochastic gradient descent.
   c) Use different networks with different initial values for the weights.
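The effect of the momentum term on step size can be seen in a small sketch. It assumes that the momentum term reuses the weight change from the previous iteration, and it feeds in a constant gradient term to mimic a region of constant slope. The values of eta and alpha and the function name are illustrative assumptions.

```python
def momentum_update(weight, gradient_term, previous_delta, eta=0.1, alpha=0.9):
    """Apply one weight update with momentum.

    `gradient_term` stands for delta_j * X_ji. Returns the new weight and
    the weight change, which becomes `previous_delta` at the next iteration.
    """
    delta = eta * gradient_term + alpha * previous_delta
    return weight + delta, delta

# Constant gradient term: successive steps grow instead of staying at eta.
w, prev = 0.0, 0.0
steps = []
for _ in range(5):
    w, prev = momentum_update(w, 1.0, prev)
    steps.append(prev)
print(steps)
```

With alpha set to 0, every step would equal eta times the gradient term. With momentum, the steps keep growing while the gradient keeps its sign, which is the speed-up described above.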

Multi-layered perceptrons have high representational power. They can represent the following:

1. Boolean functions. Every boolean function can be represented with a network having two layers of units.
2. Continuous functions. All bounded continuous functions can also be approximated with a network having two layers of units.
3. Arbitrary functions. Any arbitrary function can be approximated with a network with three layers of units.
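As a small illustration of the first claim, the sketch below builds exclusive-or, a boolean function that a single perceptron cannot represent, from two layers of units. It uses threshold units rather than sigmoid units to keep the arithmetic exact, and the weights are hand-picked for the example rather than learned.

```python
def step(net):
    """Threshold unit: 1 if the net input is positive, else 0."""
    return 1 if net > 0 else 0

def xor_network(x1, x2):
    # First layer: one unit computes OR, the other computes AND.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Second layer: fires when OR is true and AND is false.
    return step(h_or - h_and - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```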


Generalization and overfitting: One obvious stopping criterion for backpropagation is to keep iterating until the training error falls below some threshold; this can lead to overfitting.

[Figure: error as a function of the number of weight updates, with one curve for the training set and one for the validation set.]

Overfitting can be avoided using the following strategies.

- Use a validation set and stop training when the error on this set is small.
- Use 10-fold cross-validation.
- Use weight decay; the weights are decreased slowly on each iteration.
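The validation-set strategy is often implemented as a "patience" rule: keep the weights from the iteration with the lowest validation error, and stop once that error has failed to improve for a fixed number of iterations. The sketch below shows this variant, which is one common choice and not necessarily the rule intended in the lesson. The error values and the patience setting are made up for the example.

```python
def early_stopping_index(validation_errors, patience=3):
    """Return the index of the iteration whose weights should be kept.

    Scanning stops once the validation error has not improved for
    `patience` consecutive iterations.
    """
    best_index, best_error = 0, float("inf")
    for i, err in enumerate(validation_errors):
        if err < best_error:
            best_index, best_error = i, err
        elif i - best_index >= patience:
            break
    return best_index

# Hypothetical validation errors recorded after each pass over the training set.
val = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.30]
print(early_stopping_index(val))
```

With a patience of 3 the scan stops before reaching the last entry, so the later dip is never seen. A larger patience trades extra training time for a lower chance of stopping too early.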

Applications of Neural Networks

Neural networks have broad applicability to real-world business problems. They have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:

- sales forecasting
- industrial process control
- customer research
- data validation
- risk management
- target marketing


Because of their adaptive and non-linear nature, they have also been used in a number of control system application domains, such as:

- process control in chemical plants
- unmanned vehicles
- robotics
- consumer electronics

Neural networks are also used in a number of other applications that are too hard to model using classical techniques. These include computer vision, path planning and user modeling.

Questions

1. Assume a learning problem where each example is represented by four attributes. Each attribute can take two values in the set {a,b}. Run the candidate elimination algorithm on the following examples and indicate the resulting version space. What is the size of the space?

((a, b, b, a), +)
((b, b, b, b), +)
((a, b, b, b), +)
((a, b, a, a), -)

2. Decision Trees. The following table contains training examples that help a robot janitor predict whether or not an office contains a recycling bin.

| # | STATUS | DEPT. | OFFICE SIZE | RECYCLING BIN? |
|---|---------|-------|-------------|----------------|
| 1 | faculty | ee | large | no |
| 2 | staff | ee | small | no |
| 3 | faculty | cs | medium | yes |
| 4 | student | ee | large | yes |
| 5 | staff | cs | medium | no |
| 6 | faculty | cs | large | yes |
| 7 | student | ee | small | yes |
| 8 | staff | cs | medium | no |

Use the entropy splitting function to construct a minimal decision tree that predicts whether or not an office contains a recycling bin. Show each step of the computation.

3. Translate the above decision tree into a collection of decision rules.

4. Neural networks: Consider the following function of two variables:

| A | B | Desired Output |
|---|---|----------------|
| 0 | 0 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 1 | 1 |

Prove that this function cannot be learned by a single perceptron that uses the step function as its activation function.

5. Construct by hand a perceptron that can calculate the logic function implies (=>). Assume that 1 = true and 0 = false for all inputs and outputs.

Solution

1. The general and specific boundaries of the version space are as follows:

S: {(?,b,b,?)}
G: {(?,?,b,?)}

The size of the version space is 2.

2. The solution steps are as follows:

1. Selecting the attribute for the root node (taking 0*log_2(0) = 0):

Status:

| Value | YES | NO |
|---------|-----|----|
| Faculty | 2 | 1 |
| Staff | 0 | 3 |
| Student | 2 | 0 |

TOTAL: (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)] + (3/8)*[ -(0/3)*log_2(0/3) - (3/3)*log_2(3/3)] + (2/8)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)] = 0.34 + 0 + 0 = 0.34

Size:

| Value | YES | NO |
|--------|-----|----|
| Large | 2 | 1 |
| Medium | 1 | 2 |
| Small | 1 | 1 |

TOTAL: (3/8)*[ -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)] + (3/8)*[ -(1/3)*log_2(1/3) - (2/3)*log_2(2/3)] + (2/8)*[ -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)] = 0.34 + 0.34 + 2/8 = 0.94

Dept:

| Value | YES | NO |
|-------|-----|----|
| ee | 2 | 2 |
| cs | 2 | 2 |

TOTAL: (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)] + (4/8)*[ -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)] = 0.5 + 0.5 = 1

Since status is the attribute with the lowest entropy, it is selected as the root node:

Status
- faculty: examples 1-, 3+, 6+, BIN=?
- staff: examples 2-, 5-, 8-, BIN=NO
- student: examples 4+, 7+, BIN=YES
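The root-node computation can be reproduced with a short script. This is a sketch that recomputes the weighted entropy of each attribute from the training table in question 2. The function and variable names are illustrative choices.

```python
import math
from collections import Counter, defaultdict

EXAMPLES = [
    # (status, dept, office size, recycling bin?)
    ("faculty", "ee", "large", "no"),
    ("staff", "ee", "small", "no"),
    ("faculty", "cs", "medium", "yes"),
    ("student", "ee", "large", "yes"),
    ("staff", "cs", "medium", "no"),
    ("faculty", "cs", "large", "yes"),
    ("student", "ee", "small", "yes"),
    ("staff", "cs", "medium", "no"),
]
ATTRIBUTES = {"status": 0, "dept": 1, "size": 2}

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    # Counter only holds labels that occur, so log2(0) never arises.
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def split_entropy(examples, attribute):
    """Entropy after splitting on `attribute`, weighted by branch size."""
    column = ATTRIBUTES[attribute]
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[column]].append(ex[-1])
    return sum(len(g) / len(examples) * entropy(g) for g in groups.values())

scores = {name: split_entropy(EXAMPLES, name) for name in ATTRIBUTES}
print(scores)
print("root attribute:", min(scores, key=scores.get))
```

Calling `split_entropy` on only the faculty rows gives the computation needed for the next level of the tree.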

Only the branch corresponding to Status=Faculty needs further processing.

Selecting the attribute to split the branch Status=Faculty:

Dept:

| Value | YES | NO |
|-------|-----|----|
| ee | 0 | 1 |
| cs | 2 | 0 |

TOTAL: (1/3)*[ -(0/1)*log_2(0/1) - (1/1)*log_2(1/1)] + (2/3)*[ -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)] = 0 + 0 = 0

Since the minimum possible entropy is 0 and dept has that minimum possible entropy, it is selected as the best attribute to split the branch Status=Faculty. Note that we don't even have to calculate the entropy of the attribute size since it cannot possibly be better than 0.

Status
- faculty: examples 1-, 3+, 6+, split on Dept
  - ee: example 1-, BIN=NO
  - cs: examples 3+, 6+, BIN=YES
- staff: examples 2-, 5-, 8-, BIN=NO
- student: examples 4+, 7+, BIN=YES

Each branch ends in a homogeneous set of examples so the construction of the decision tree ends here.

**3. The rules are:**

- IF status=faculty AND dept=ee THEN recycling-bin=no
- IF status=faculty AND dept=cs THEN recycling-bin=yes
- IF status=staff THEN recycling-bin=no
- IF status=student THEN recycling-bin=yes
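The four rules translate directly into code, which makes it easy to confirm that they classify every training example correctly. The sketch below is one way to write that check. The function name is an illustrative choice, and the office size is omitted because none of the rules uses it.

```python
def predict_bin(status, dept):
    """Apply the four decision rules in order."""
    if status == "faculty" and dept == "ee":
        return "no"
    if status == "faculty" and dept == "cs":
        return "yes"
    if status == "staff":
        return "no"
    if status == "student":
        return "yes"
    raise ValueError(f"no rule covers status={status!r}, dept={dept!r}")

TRAINING = [
    # (example number, status, dept, recycling bin?)
    (1, "faculty", "ee", "no"),
    (2, "staff", "ee", "no"),
    (3, "faculty", "cs", "yes"),
    (4, "student", "ee", "yes"),
    (5, "staff", "cs", "no"),
    (6, "faculty", "cs", "yes"),
    (7, "student", "ee", "yes"),
    (8, "staff", "cs", "no"),
]

# Collect the numbers of any examples the rules get wrong.
mismatches = [n for n, status, dept, label in TRAINING
              if predict_bin(status, dept) != label]
print("misclassified examples:", mismatches)
```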


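**4. This function cannot be computed using a single perceptron since the function is not linearly separable:**

[Plot of the four input points in the A-B plane: the desired output is 1 at (A,B) = (0,0) and (1,1), and 0 at (1,0) and (0,1). No single straight line separates the 1s from the 0s.]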
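The claim can also be shown algebraically. The following sketch assumes a perceptron with weights $w_1$, $w_2$ and threshold $\theta$ (symbols introduced here for illustration) that outputs 1 exactly when $w_1 A + w_2 B > \theta$. The four rows of the truth table then require:

```latex
\begin{align*}
(A,B) = (0,0),\ \text{output } 1: \quad & 0 > \theta \\
(A,B) = (1,1),\ \text{output } 1: \quad & w_1 + w_2 > \theta \\
(A,B) = (1,0),\ \text{output } 0: \quad & w_1 \le \theta \\
(A,B) = (0,1),\ \text{output } 0: \quad & w_2 \le \theta
\end{align*}
% Adding the first two inequalities gives  w_1 + w_2 > 2\theta.
% Adding the last two inequalities gives   w_1 + w_2 \le 2\theta.
```

The two sums contradict each other, so no choice of $w_1$, $w_2$ and $\theta$ satisfies all four rows.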

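5. Let the output of the perceptron be

$O = g(W_1 X_1 + W_2 X_2 - W_0)$

Choose g to be the threshold function (output 1 on input > 0, output 0 otherwise). There are two inputs to the implies function, $X_1$ and $X_2$. Therefore,

$(X_1 \Rightarrow X_2) = g(W_1 X_1 + W_2 X_2 - W_0)$

Arbitrarily choose $W_2 = 1$, and substitute into the above for all values of $X_1$ and $X_2$:

- $X_1 = 0, X_2 = 0$: $-W_0 > 0$
- $X_1 = 0, X_2 = 1$: $1 - W_0 > 0$
- $X_1 = 1, X_2 = 0$: $W_1 - W_0 < 0$
- $X_1 = 1, X_2 = 1$: $W_1 + 1 - W_0 > 0$

Note that the greater-than and less-than signs are decided by the threshold function and the result of $X_1 \Rightarrow X_2$. For example, when $X_1 = 0$ and $X_2 = 0$, the resulting expression must be greater than 0 since g is the threshold function and $0 \Rightarrow 0$ is True. Similarly, when $X_1 = 1$ and $X_2 = 0$, the resulting expression must be less than 0 since g is the threshold function and $1 \Rightarrow 0$ is False.

Since $W_0 < 0$, $W_0 > W_1$, and $W_0 - W_1 < 1$, choose $W_1 = -1$. Substitute where possible:

- $W_0 < 0$
- $W_0 < 1$
- $W_0 > -1$
- $W_0 < 0$

Since $W_0 < 0$ and $W_0 > -1$, choose $W_0 = -0.5$. Thus $W_0 = -0.5$, $W_1 = -1$, $W_2 = 1$. Finally,

$(X_1 \Rightarrow X_2) = g(-X_1 + X_2 + 0.5)$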
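The chosen weights can be checked against the truth table of implies. The sketch below assumes the perceptron computes g(W1*X1 + W2*X2 - W0), with W0 acting as a threshold. That form is an assumption, chosen because it is consistent with the stated constraints W0 < 0, W0 > W1 and W0 - W1 < 1.

```python
def threshold(net):
    """Output 1 on input > 0, output 0 otherwise."""
    return 1 if net > 0 else 0

W0, W1, W2 = -0.5, -1.0, 1.0

def implies(x1, x2):
    # Assumed form: the threshold W0 is subtracted from the weighted sum.
    return threshold(W1 * x1 + W2 * x2 - W0)

table = {(x1, x2): implies(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
for (x1, x2), out in table.items():
    print(f"{x1} => {x2} : {out}")
```

Only the input pair (1, 0) produces 0, which matches the truth table of implication.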

