
Chapter 6: Deep Feedforward Networks

Linear regression (and classification)

Input vector x
Output vector y
Parameters: weight W and bias b
Prediction: y = Wᵀx + b

[Figure: network diagrams mapping the inputs x1, x2, x3 through the linear unit (weights W11 . . . W23, biases b1, b2) to the outputs y1, y2]

Benoit Massé, Dionyssos Kounades-Bastian, Deep Feedforward Networks
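As a quick sketch (all values made up), the prediction y = Wᵀx + b for the three-input, two-output case of the diagram can be computed directly:

```python
import numpy as np

# Linear model: 3 inputs, 2 outputs (matching the diagram above).
# W has one column per output unit, so the prediction is W^T x + b.
W = np.array([[0.5, -1.0],
              [2.0,  0.0],
              [1.0,  3.0]])   # shape (3, 2)
b = np.array([0.1, -0.2])     # shape (2,)

x = np.array([1.0, 2.0, 3.0]) # input vector

y = W.T @ x + b               # prediction, shape (2,)
print(y)                      # -> [7.6 7.8]
```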

Linear regression (and classification)

Advantages
Easy to use
Easy to train, low risk of overfitting

Drawbacks
Some problems are inherently non-linear

Solving XOR

Linear regressor

[Figure: inputs x1, x2 mapped through the linear unit (W, b) to the output y]

There is no value of W and b such that
∀(x1, x2) ∈ {0, 1}², Wᵀ(x1, x2)ᵀ + b = xor(x1, x2)

Solving XOR

What about... ?

[Figure: inputs x1, x2 mapped through a first linear unit (W, b) to hidden units u1, u2, then through a second linear unit (V, c) to the output y]

Strictly equivalent: the composition of two linear operations is still a linear operation.
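This equivalence is easy to check numerically; a sketch with random values, using the slides' convention u = Wᵀx + b:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers, using the slides' convention u = W^T x + b.
W, b = rng.normal(size=(2, 2)), rng.normal(size=2)
V, c = rng.normal(size=(2, 1)), rng.normal(size=1)

x = rng.normal(size=2)

# Layer-by-layer computation...
y_stacked = V.T @ (W.T @ x + b) + c

# ...equals a single linear layer with weight W V and bias V^T b + c.
M = W @ V
d = V.T @ b + c
y_single = M.T @ x + d

print(np.allclose(y_stacked, y_single))  # -> True
```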

Solving XOR

And what about... ?

[Figure: inputs x1, x2 mapped through (W, b) to u1, u2, then through the non-linearity φ to h1, h2, then through (V, c) to the output y]

In which φ(x) = max{0, x}

It is possible!

With W = [1 1; 1 1], b = (0, −1)ᵀ, V = (1, −2)ᵀ and c = 0,
the network computes xor(x1, x2) exactly on {0, 1}².
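The solution above is easy to verify numerically; a minimal sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# The XOR network from the slide: y = V^T relu(W^T x + b) + c
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, -1.0])
V = np.array([1.0, -2.0])
c = 0.0

def f_nn(x):
    h = relu(W.T @ x + b)   # hidden layer with ReLU non-linearity
    return V @ h + c        # linear output layer

for x1 in (0, 1):
    for x2 in (0, 1):
        y = f_nn(np.array([x1, x2], dtype=float))
        print(f"xor({x1}, {x2}) = {y:g}")   # -> 0, 1, 1, 0
```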

Neural network with one hidden layer

Compact representation

[Figure: input x mapped through (W, b, φ) to the hidden layer h, then through (V, c) to the output y]

Neural network
Hidden layer with non-linearity
→ can represent a broader class of functions

Universal approximation theorem

Theorem
A neural network with one hidden layer can approximate any continuous function. Let Cn be a compact subset of Rⁿ. For any continuous function f on Cn,

∀ε > 0, ∃ fNNε : x ↦ Σi=1..K vi φ(wiᵀx + bi) + c

such that

∀x ∈ Cn, ||f(x) − fNNε(x)|| < ε
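The theorem only asserts existence, but its functional form is easy to play with. An entirely illustrative sketch (not from the slides: hidden weights wi, bi drawn at random, only the output weights vi and c fitted by least squares), showing that a wider hidden layer approximates sin more closely:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on a compact subset of R, here [-3, 3].
f = np.sin
x = np.linspace(-3, 3, 500)

def fit_one_hidden_layer(K):
    # Random hidden weights/biases; only the output weights are fitted.
    # This keeps exactly the theorem's form
    #   fNN(x) = sum_i v_i * phi(w_i x + b_i) + c,   with phi = ReLU.
    w = rng.normal(size=K)
    b = rng.uniform(-3, 3, size=K)
    H = np.maximum(0, np.outer(x, w) + b)     # hidden activations, (500, K)
    A = np.hstack([H, np.ones((x.size, 1))])  # extra column for the bias c
    coef, *_ = np.linalg.lstsq(A, f(x), rcond=None)
    return A @ coef                           # fNN evaluated on the grid

errors = {}
for K in (5, 50, 500):
    errors[K] = np.max(np.abs(f(x) - fit_one_hidden_layer(K)))
    print(f"K = {K:3d}  max error = {errors[K]:.4f}")
```

The error shrinks as K grows, which is the theorem's promise; the next slide explains why this existence result is not a recipe.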

Problems

The universal approximation theorem gives no information about HOW to obtain such a network:
The size of the hidden layer h
The values of W and b

Using the network
Even if we find a way to obtain the network, the size of the hidden layer may be prohibitively large.

Deep neural network

Why Deep?
Let's stack l hidden layers one after the other; l is called the depth of the network.

[Figure: input x through (W1, b1, φ) to h1, . . . , through (Wl, bl, φ) to hl, then through (V, c) to the output y]

Properties of DNN
The universal approximation theorem also applies.
Some functions can be approximated by a DNN with N hidden units, but would require O(e^N) hidden units to be represented by a shallow network.

Summary

Comparison
Linear classifier
− Limited representational power
+ Simple

Neural network with one hidden layer
+ Unlimited representational power
− Hidden layer may be prohibitively large

Deep neural network
+ Unlimited representational power
+ Can be exponentially smaller than a shallow network

Remaining problem
How to get this DNN?

On the path of getting my own DNN

Hyperparameters
First, we need to define the architecture of the DNN:
The depth l
The size of the hidden layers n1, . . . , nl
The activation function φ
The output unit

Parameters
When the architecture is defined, we need to train the DNN:
W1, b1, . . . , Wl, bl

Hyperparameters

The depth l
The size of the hidden layers n1, . . . , nl
→ strongly depend on the problem to solve

The activation function φ, for example:
ReLU: g : x ↦ max{0, x}
Sigmoid: σ : x ↦ 1/(1 + e^(−x))

[Figure: plots of the ReLU and sigmoid activation functions]

The output unit:
Linear output E[y] = Vᵀhl + c,
for regression with Gaussian distribution y ∼ N(E[y], I)
Sigmoid output ŷ = σ(wᵀhl + b),
for classification with Bernoulli distribution P(y = 1|x) = ŷ
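A small sketch of the two activation functions and the two output units above (hl, V, c, w, b are made-up illustrative values):

```python
import numpy as np

# The two activation functions from the slide.
def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Output units on top of a last hidden layer h_l (made-up values).
h_l = np.array([0.5, -1.0, 2.0])

V = np.array([[1.0, 0.0],
              [0.5, -0.5],
              [0.0, 1.0]])
c = np.array([0.0, 0.1])
y_linear = V.T @ h_l + c          # E[y] for a Gaussian regression output

w = np.array([1.0, 1.0, -0.5])
b = 0.5
y_prob = sigmoid(w @ h_l + b)     # P(y = 1 | x) for a Bernoulli output

print(y_linear, y_prob)
```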

Parameters Training

Objective
Let's define θ = (W1, b1, . . . , Wl, bl).
We suppose we have a set of inputs X = (x1, . . . , xN) and a set of expected outputs Y = (y1, . . . , yN). The goal is to find a neural network fNN such that
∀i, fNN(xi, θ) ≃ yi.

Cost function
To evaluate the error that our current network makes, let's define a cost function L(X, Y, θ). The goal becomes to find
argminθ L(X, Y, θ)

Loss function
Should represent a combination of the distances between every yi and the corresponding fNN(xi, θ):
Mean square error (rare)
Cross-entropy
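Both losses can be sketched in a few lines (illustrative values; the binary cross-entropy form assumes Bernoulli outputs, as with the sigmoid output unit earlier):

```python
import numpy as np

# Two standard loss functions (a sketch; y is the target, y_hat the prediction).

def mean_square_error(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    # Binary cross-entropy for Bernoulli outputs, y in {0, 1};
    # clipping avoids log(0).
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1.0, 0.0, 1.0])   # targets
y_hat = np.array([0.9, 0.2, 0.8])   # network outputs
print(mean_square_error(y, y_hat))
print(cross_entropy(y, y_hat))
```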

Parameters Training

The basic idea consists in computing θ̂ such that
∇θL(X, Y, θ̂) = 0.
In general this cannot be solved analytically, because of the non-linearities and the huge number of degrees of freedom.

Parameters Training

Gradient descent
Let's use a numerical way to optimize θ, called gradient descent (section 4.3). The idea is that
f(θ − εu) ≃ f(θ) − εuᵀ∇f(θ)
so, taking u = ∇f(θ),
f(θ − ε∇f(θ)) ≃ f(θ) − ε∇f(θ)ᵀ∇f(θ) < f(θ).
Taking a small step against the gradient thus improves our estimate.

Parameters Training

1. Have an estimate θ̂ of the parameters
2. Compute ∇θL(X, Y, θ̂)
3. Update θ̂ ← θ̂ − ε∇θL
4. Repeat steps 2-3 until ||∇θL|| < threshold
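The four steps above can be sketched on a toy loss (target and learning rate are made up; the gradient here is exact, whereas for a network it would come from back-propagation):

```python
import numpy as np

# Gradient descent on a toy quadratic loss L(theta) = ||theta - target||^2.
target = np.array([3.0, -2.0])

def grad_L(theta):
    return 2.0 * (theta - target)   # exact gradient of the toy loss

theta = np.zeros(2)                 # step 1: initial estimate
eps = 0.1                           # learning rate
while np.linalg.norm(grad_L(theta)) > 1e-8:   # step 4: stopping criterion
    theta -= eps * grad_L(theta)    # steps 2-3: compute gradient, update

print(theta)                        # -> close to [ 3. -2.]
```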

Problem

How to efficiently estimate ∇θL(X, Y, θ̂)?

Back-propagation algorithm

Back-propagation for Parameter Learning

[Figure: a scalar network x → w1 → φ → w2 → φ → y]

Consider the network with function
y = φ(w2 φ(w1 x)),
some training pairs T = {(x̂n, ŷn)}n=1..N, and
an activation function φ(·).
Learn w1, w2 so that feeding x̂n results in ŷn.

Prerequisite: differentiable activation function
Let φ′(x) = ∂φ(x)/∂x denote the derivative of φ(x).
For example, when φ(x) = ReLU(x) we have φ′(x) = 1 for x > 0 and φ′(x) = 0 otherwise.

Gradient-based Learning

We will learn the weights by iterating:

(w1, w2) ← (w1, w2) − γ (∂L/∂w1, ∂L/∂w2).   (1)

The loss is L(w1, w2, T), and we want to compute the gradient(s) at w1, w2.

Back-propagation

The forward pass computes the values on all units:
a = w1 x̂n
b = φ(a) = φ(w1 x̂n)
c = w2 b = w2 φ(w1 x̂n)
d = φ(c) = φ(w2 φ(w1 x̂n))

The backward pass computes the local derivatives:
∂L(d)/∂d = L′(d)
∂d/∂c = φ′(c) = φ′(w2 φ(w1 x̂n))
∂c/∂b = w2
∂b/∂a = φ′(a) = φ′(w1 x̂n)
∂a/∂w1 = x̂n
∂c/∂w2 = b = φ(w1 x̂n)

Calculating the Gradients I

∂L/∂w1 = (∂L/∂d) · (∂d/∂c) · (∂c/∂b) · (∂b/∂a) · (∂a/∂w1),
∂L/∂w2 = (∂L(d)/∂d) · (∂d/∂c) · (∂c/∂w2).

We propagate the gradients (partial products) from the last layer towards the input.

Calculating the Gradients

Summing over the N training pairs:

∂L/∂w1 = Σn=1..N (∂L(dn)/∂dn) · (∂dn/∂cn) · (∂cn/∂bn) · (∂bn/∂an) · (∂an/∂w1),
∂L/∂w2 = Σn=1..N (∂L(dn)/∂dn) · (∂dn/∂cn) · (∂cn/∂w2).
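Putting the forward and backward passes together for the scalar network y = φ(w2 φ(w1 x)), with φ = ReLU and a squared-error loss L(d) = (d − ŷn)² (an illustrative sketch; the finite-difference check at the end is an addition, not from the slides):

```python
import numpy as np

def phi(x):  return max(0.0, x)
def dphi(x): return 1.0 if x > 0 else 0.0   # ReLU derivative

def gradients(w1, w2, x_n, y_n):
    # Forward pass: compute the values on all units.
    a = w1 * x_n
    b = phi(a)
    c = w2 * b
    d = phi(c)
    # Backward pass: propagate partial products from the output.
    dL_dd = 2.0 * (d - y_n)          # L'(d) for squared error
    dL_dc = dL_dd * dphi(c)
    dL_dw2 = dL_dc * b               # since dc/dw2 = b
    dL_db = dL_dc * w2               # since dc/db = w2
    dL_da = dL_db * dphi(a)
    dL_dw1 = dL_da * x_n             # since da/dw1 = x_n
    return dL_dw1, dL_dw2

# Check against a finite-difference approximation (made-up values).
w1, w2, x_n, y_n = 0.7, 0.3, 1.5, 0.2
g1, g2 = gradients(w1, w2, x_n, y_n)

def loss(w1, w2):
    return (phi(w2 * phi(w1 * x_n)) - y_n) ** 2

h = 1e-6
num_g1 = (loss(w1 + h, w2) - loss(w1 - h, w2)) / (2 * h)
num_g2 = (loss(w1, w2 + h) - loss(w1, w2 - h)) / (2 * h)
print(abs(g1 - num_g1) < 1e-5, abs(g2 - num_g2) < 1e-5)  # -> True True
```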

Thank you !
