
Convolutional Neural Networks (CNN)

• NNs for processing data that has a known grid-like topology
• Time-series data – organized into a 1D grid by taking samples at regular time intervals
• Image data – a 2D grid of pixels
• These NNs employ the mathematical convolution operation (a linear operation), hence the name CNN
• CNNs are simply NNs that use convolution in place of general matrix multiplication in at least one of their layers
CNN (ctd…)

• An example of neuroscientific principles influencing DL
• x – input, w – kernel, s – feature map
• In ML applications, the input is a multidimensional array (tensor) and the kernel is a multidimensional array of parameters adapted by the learning algorithm
• Convolution is used in its discrete form
• 2D convolution
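For reference, the standard discrete-convolution definitions these bullets describe, in the notation above (x input, w kernel, s feature map; I, K, S are the usual textbook symbols for the 2D image, kernel, and output, not symbols defined on the slides):

s(t) = (x * w)(t) = \sum_{a} x(a)\, w(t - a)

S(i, j) = (I * K)(i, j) = \sum_{m}\sum_{n} I(m, n)\, K(i - m,\, j - n)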
CNN (ctd…)
• Convolution is commutative
• Commutativity arises from flipping the kernel relative to the i/p
• Cross-correlation – convolution without kernel flipping; many ML libraries implement this but still call it convolution
• Discrete convolution can be viewed as multiplication by a matrix
• Each row of the matrix is constrained to be equal to the row above, shifted by one element – a Toeplitz matrix
• In 2D – a doubly block circulant matrix; in either case the matrix is very sparse
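To make the Toeplitz view concrete, here is a minimal NumPy sketch (the helper name conv_matrix and the example values are illustrative, not from the slides): each row of the constructed matrix is the flipped kernel shifted one element right of the row above, and multiplying by it reproduces a 1D "valid" convolution.

import numpy as np

def conv_matrix(w, m):
    # Rows are the flipped kernel, each shifted one element right of the row
    # above -- the Toeplitz-structured matrix of a 1D "valid" convolution
    k = len(w)
    M = np.zeros((m - k + 1, m))
    for i in range(m - k + 1):
        M[i, i:i + k] = w[::-1]
    return M

x = np.array([1., 2., 3., 4., 5.])
w = np.array([1., 0., -1.])                 # small edge-detector-like kernel

print(conv_matrix(w, len(x)) @ x)           # matrix-multiplication view
print(np.convolve(x, w, mode="valid"))      # library convolution: same result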
CNN (ctd…)
• Restrict the output to only positions where the kernel lies entirely within the image, called "valid" convolution
• The upper-left element of the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor
• Block processing
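A minimal sketch of a "valid" 2D cross-correlation (convolution without kernel flipping) matching the description above: the kernel is placed only where it fits entirely inside the image, and the upper-left output element comes from the upper-left input patch. The function name and example values are illustrative.

import numpy as np

def valid_corr2d(image, kernel):
    # "Valid" cross-correlation: the kernel is placed only at positions where
    # it lies entirely within the image, so the output is smaller than the input
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # out[0, 0] is formed from the upper-left region of the input
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
ker = np.array([[0., 1.], [1., 0.]])
print(valid_corr2d(img, ker))               # 3x3 output from a 4x4 input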
CNN (ctd…)
[Figure: worked numeric example – a small binary (0/1) kernel applied to an input grid]
• Three important ideas through which convolution improves an ML system
– sparse interactions, parameter sharing, equivariant representations

• Traditional NNs use multiplication by a matrix of parameters, with a separate parameter describing the interaction between each i/p unit and each o/p unit – every i/p interacts with every o/p
• CNN – sparse interactions / sparse connectivity / sparse weights
• Achieved by making the kernel smaller than the input – e.g., edge detection in an image needs only a small kernel
• Requires less memory with improved statistical efficiency
• With m inputs and n outputs, matrix multiplication requires m × n parameters and O(m × n) runtime per example
• If each output is limited to k connections, the sparse approach requires only k × n parameters and O(k × n) runtime per example
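A quick back-of-the-envelope check of those counts (the sizes m, n, k below are made up purely for illustration):

m, n, k = 10_000, 10_000, 3      # i/p units, o/p units, connections per o/p unit
dense_params  = m * n            # fully connected layer: 100,000,000 parameters
sparse_params = k * n            # sparse connectivity:        30,000 parameters
shared_params = k                # convolution (sparse + shared kernel): 3 parameters
print(dense_params, sparse_params, shared_params)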
Sparse Connectivity
• Allows the n/w to efficiently describe complicated interactions between many variables by constructing such interactions from simple building blocks that each describe only sparse interactions
Parameter sharing
• Using the same parameter for more than one function in a
model
• Limited Capacity to linear functions
• Such a network is said to have tied weights, because the value of the weight applied to one input is tied to the value of a weight applied elsewhere
• In a CNN, each member of the kernel is used at every position of the input
• The convolution operation means that rather than learning a separate set of parameters for every location, we learn only a single set
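A small NumPy sketch of this point: one set of kernel weights is reused at every input position, so the number of learned parameters stays fixed no matter how long the input is (kernel values and sizes are illustrative):

import numpy as np

w = np.array([0.25, 0.5, 0.25])              # the single shared set of 3 parameters
for m in (10, 1_000, 100_000):
    x = np.random.randn(m)
    y = np.convolve(x, w, mode="valid")      # the same w is applied at every position
    print(m, y.shape, w.size)                # parameter count stays 3 regardless of m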
Parameter sharing (ctd …)

 
• Convolution with parameter sharing is dramatically more efficient than dense matrix multiplication in terms of memory requirements and statistical efficiency
• Equivariance to translation – if the input shifts, the output shifts in the same way
• A function f(x) is equivariant to a function g if f(g(x)) = g(f(x))
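A small NumPy check of the equivariance property f(g(x)) = g(f(x)), using circular (periodic) convolution so boundary effects do not obscure the equality; the function names are illustrative:

import numpy as np

def circ_conv(a, w):
    # f: circular (periodic) 1D convolution
    n = len(a)
    return np.array([sum(w[j] * a[(i - j) % n] for j in range(len(w)))
                     for i in range(n)])

def g(a):
    # g: translate the signal by one sample
    return np.roll(a, 1)

x = np.random.randn(8)
w = np.array([0.25, 0.5, 0.25])

# Convolving the shifted input equals shifting the convolved output
print(np.allclose(circ_conv(g(x), w), g(circ_conv(x, w))))   # True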
Pooling
• A typical CNN layer comprises three stages
– Stage 1 performs several convolutions in parallel to produce a set of linear activations
– Stage 2 runs each linear activation through a nonlinear activation function such as ReLU (the detector stage)
– Stage 3 uses a pooling function to further modify the o/p of the layer
• A pooling function replaces the o/p of the net at a certain location with a summary statistic of the nearby outputs
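A toy 1D sketch of the three stages above (the kernels, sizes, and pooling width are made-up values for illustration):

import numpy as np

def conv_stage(x, kernels):
    # Stage 1: several "valid" convolutions in parallel -> linear activations
    return [np.convolve(x, k, mode="valid") for k in kernels]

def detector_stage(acts):
    # Stage 2: elementwise nonlinearity (ReLU)
    return [np.maximum(a, 0.0) for a in acts]

def pooling_stage(acts, width=2):
    # Stage 3: max pooling over non-overlapping windows
    return [np.array([a[i:i + width].max()
                      for i in range(0, len(a) - width + 1, width)])
            for a in acts]

x = np.random.randn(16)
kernels = [np.array([1., -1.]), np.array([0.5, 0.5])]
out = pooling_stage(detector_stage(conv_stage(x, kernels)))
print([o.shape for o in out])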
Pooling (ctd..)

• Max pooling reports the maximum o/p within a neighborhood; other pooling functions include the average, the L2 norm, and a weighted average
• The pooled representation is approximately invariant to small translations of the input
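A small NumPy sketch of max pooling and its approximate translation invariance (the window width and signal length are illustrative): shifting the input by one position leaves roughly half of the pooled outputs unchanged, even though every input value has moved.

import numpy as np

def max_pool1d(a, width=3):
    # Max pooling: maximum over a sliding window of `width`, stride 1
    return np.array([a[i:i + width].max() for i in range(len(a) - width + 1)])

x = np.random.rand(1000)
x_shift = np.roll(x, 1)                      # translate every input by one position
p, p_shift = max_pool1d(x), max_pool1d(x_shift)
print(np.mean(p == p_shift))                 # ~0.5: many pooled o/ps are unchanged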
