
Convolutional Neural Networks (CNN)

• CNNs are neural networks for processing data that has a known grid-like topology
• Time-series data – organized into a 1D grid by taking samples at regular time intervals
• Image data – a 2D grid of pixels
• The network employs the mathematical convolution operation (a linear operation), hence the name CNN
• CNNs use convolution in place of general matrix multiplication in at least one of their layers
CNN (ctd…)

• CNNs are an example of neuroscientific principles influencing deep learning
• x – input, w – kernel, s – feature map
• In the discrete form, the convolution is a sum over discrete positions rather than an integral (see the formulas below)
• In ML applications, the i/p is a multidimensional array (tensor) and the kernel is a multidimensional array of parameters adapted by the learning algorithm
• 2D convolution is used for image data
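As a hedged reminder of the underlying formulas (standard textbook notation, not reproduced from the slide itself; I is a 2D image, K a 2D kernel, S the feature map):

    s(t) = (x * w)(t) = \sum_{a} x(a)\, w(t - a)

    S(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} I(m,n)\, K(i-m,\, j-n)

    S(i,j) = (K \star I)(i,j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m,n)

The last form, without kernel flipping, is the cross-correlation discussed on the next slide; most libraries implement this and still call it convolution.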
CNN (ctd…)
• Convolution is commutative
• Commutativity arises because the kernel is flipped relative to the i/p
• Cross-correlation – convolution without kernel flipping
• Discrete convolution can be viewed as multiplication by a matrix
• Each row of the matrix is constrained to be equal to the row above, shifted by one element (a Toeplitz matrix)
• In 2D, the matrix is a doubly block-circulant matrix and is sparse
CNN (ctd…)
• Restricting the output to only positions where the kernel lies entirely within the image is called “valid” convolution
• The upper-left element of the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor (illustrated in the sketch below)
• Block processing
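A minimal NumPy sketch of “valid” convolution, implemented (as most libraries do) as cross-correlation without kernel flipping; the function and array names are illustrative, not from the slides:

    import numpy as np

    def conv2d_valid(image, kernel):
        # "Valid" convolution: the kernel is applied only at positions
        # where it lies entirely within the image.
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # The upper-left output element (i = j = 0) is the kernel
                # applied to the upper-left region of the input.
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.arange(16, dtype=float).reshape(4, 4)
    kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
    print(conv2d_valid(image, kernel).shape)   # (3, 3)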
CNN (ctd…)
[Slide figure: worked numerical example of a 2D convolution with a small binary (0/1) kernel]
• Three ideas improving ML systems
– Sparse interactions, parameter sharing, equivariant representations

• Traditional NNs use multiplication by a matrix of parameters, with a separate parameter describing the interaction between each i/p unit and each o/p unit; every i/p unit interacts with every o/p unit
• CNN – sparse interactions / sparse connectivity / sparse weights
• Achieved by making the kernel smaller than the input – e.g., detecting edges in an image with a kernel spanning only a few pixels
• Requires less memory and improves statistical efficiency
• With m inputs and n outputs, matrix multiplication requires m × n parameters and O(m × n) runtime per example
• If each output is limited to k connections, the sparse approach needs only k × n parameters and O(k × n) runtime (see the sketch after this list)
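A back-of-the-envelope comparison of the two parameter counts; the sizes here are arbitrary, chosen only to illustrate the scaling:

    # m inputs, n outputs, k connections per output (k << m)
    m, n, k = 10_000, 10_000, 3
    dense_params  = m * n   # fully connected: O(m*n) parameters and runtime per example
    sparse_params = k * n   # sparse connectivity: O(k*n) parameters and runtime
    print(dense_params, sparse_params)   # 100000000 vs 30000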
Sparse Connectivity
• Allows the n/w to efficiently describe complicated interactions between many variables by constructing such interactions from simple building blocks that each describe only sparse interactions
Parameter sharing
• Using the same parameter for more than one function in a model
• Limited capacity to linear functions
• The network has tied weights, because the value of the weight applied to one input is tied to the value of a weight applied elsewhere
• CNN – each member of the kernel is used at every position of the input
• The convolution operation means that rather than learning a separate set of parameters for every location, the network learns a single set
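A small sketch of tied weights in 1D: one kernel is reused at every input position, instead of a separate weight vector per output location (names and sizes are illustrative):

    import numpy as np

    x = np.random.randn(100)                  # 1D input signal
    k = np.array([0.25, 0.5, 0.25])           # one shared kernel: 3 parameters in total

    # Convolution reuses the same 3 parameters at every output position.
    shared = np.convolve(x, k, mode='valid')  # 98 outputs from just 3 parameters

    # Learning a separate 3-weight vector for each of the 98 output
    # positions would instead require 98 * 3 = 294 parameters.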
Parameter sharing (ctd …)

 
• Convolution with parameter sharing is dramatically more efficient than dense matrix multiplication in terms of memory requirements and statistical efficiency
• Equivariance to translation – if the input shifts, the output shifts in the same way
• A function f(x) is equivariant to a function g if f(g(x)) = g(f(x))
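A quick numerical check of translation equivariance, where g is a shift of the signal and f is convolution with a fixed kernel (the trimming only avoids boundary/wrap-around effects):

    import numpy as np

    x = np.random.randn(50)
    k = np.array([1.0, -1.0])                # a simple difference kernel

    def g(v):                                # translation: shift by one position
        return np.roll(v, 1)

    def f(v):                                # convolution with the fixed kernel
        return np.convolve(v, k, mode='same')

    a, b = f(g(x)), g(f(x))
    print(np.allclose(a[2:-2], b[2:-2]))     # True: f(g(x)) == g(f(x)) away from the edges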
Parameter sharing (ctd …)
Pooling
• A CNN layer comprises three stages
– Stage 1 performs several convolutions in parallel to produce a set of linear activations
– Stage 2 runs each linear activation through a nonlinear activation function such as ReLU (the detector stage)
– Stage 3 uses a pooling function to further modify the o/p of the layer

• A pooling function replaces the o/p of the net at a certain location with a summary statistic of the nearby outputs
Pooling (ctd..)

• Max pooling reports the maximum o/p within a rectangular neighborhood; alternatives include the average, the L2 norm, and a weighted average
• The pooled representation is approximately invariant to small translations of the input
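A minimal NumPy sketch of max pooling over non-overlapping 2×2 neighborhoods (illustrative only; real frameworks also support strides and padding):

    import numpy as np

    def max_pool2d(x, size=2):
        # Replace each non-overlapping size x size neighborhood by its maximum.
        H, W = x.shape
        H2, W2 = H - H % size, W - W % size          # trim so the grid divides evenly
        x = x[:H2, :W2]
        return x.reshape(H2 // size, size, W2 // size, size).max(axis=(1, 3))

    feat = np.random.randn(6, 6)
    print(max_pool2d(feat).shape)                    # (3, 3)

Shifting the input by a pixel typically changes only a few of the pooled values, which is the approximate translation invariance noted above.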
Pooling (ctd..)
Pooling (ctd…)
Variants of Convolution function
• Convolution with a single kernel extracts only one kind of feature, albeit at many spatial locations
• Each network layer has to extract many kinds of features, at many locations
• The i/p is therefore a grid of vectors (one channel per feature), not a grid of real values
• CNNs deploy multi-channel convolution; these linear operations are not guaranteed to be commutative, even if kernel flipping is used
• Multi-channel convolution is commutative only if each operation has the same number of o/p channels as i/p channels
Variants of Convolution function
• Multi-channel convolution uses a 4D kernel tensor K, whose element K(i, l, m, n) gives the connection strength between a unit in o/p channel i and a unit in i/p channel l, offset by m rows and n columns
• Downsampling the o/p by sampling only every s pixels in each direction – s is the stride
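A hedged LaTeX reconstruction of these two operations, using the common textbook (1-indexed) notation with input V, 4D kernel K, output Z, and stride s; the slide's exact notation may differ:

    Z_{i,j,k} = \sum_{l,m,n} V_{l,\, j+m-1,\, k+n-1}\; K_{i,l,m,n}

    Z_{i,j,k} = c(\mathbf{K}, \mathbf{V}, s)_{i,j,k}
              = \sum_{l,m,n} V_{l,\,(j-1)\times s + m,\,(k-1)\times s + n}\; K_{i,l,m,n}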


[Slide figure: convolution with stride 2]
Variants of Convolution function
• Zero padding (ZP) widens the input
• ZP allows the kernel width and the o/p size to be controlled independently
Variants of Convolution function
• Valid convolution (MATLAB terminology) – no zero padding; the o/p is smaller than the i/p
• Same convolution – zero padding keeps the i/p and o/p the same size
• Full convolution – enough zeroes are added for every pixel to be visited k times in each direction
• The optimal amount of zero padding usually lies somewhere between “valid” and “same” (the three modes are compared in the sketch below)
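A small check of the three modes using SciPy, whose scipy.signal.convolve2d supports exactly these mode names; the array sizes here are arbitrary:

    import numpy as np
    from scipy.signal import convolve2d

    image  = np.random.randn(8, 8)
    kernel = np.random.randn(3, 3)

    print(convolve2d(image, kernel, mode='valid').shape)  # (6, 6)   kernel fully inside the image
    print(convolve2d(image, kernel, mode='same').shape)   # (8, 8)   o/p size equals i/p size
    print(convolve2d(image, kernel, mode='full').shape)   # (10, 10) every pixel visited k times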
• Specifying connectivity in full generality (an adjacency-matrix-like description) leads to a 6D weight tensor

• Unshared convolution – no parameter sharing


Unshared convolution
• Local connections – each feature is a function of only a small part of the space
• Each o/p unit i is a function of only a subset of the i/p units l
• For example, connect the first m o/p channels to the first n i/p channels, the next m o/p channels to the next n i/p channels, and so on
• Modelling interactions between only a few channels lets the n/w use fewer parameters, reducing memory consumption and increasing statistical efficiency, and it also reduces the computation needed for forward and back-propagation
• This is achieved without reducing the number of hidden units
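A rough 1D sketch contrasting an unshared (locally connected) layer with a convolutional one; sizes and names are made up for illustration:

    import numpy as np

    x = np.random.randn(10)                 # 1D input
    k = 3                                   # receptive-field (kernel) width
    n_out = len(x) - k + 1                  # number of 'valid' output positions

    # Convolutional layer: one shared kernel -> k parameters.
    w_shared = np.random.randn(k)
    conv_out = np.array([w_shared @ x[i:i + k] for i in range(n_out)])

    # Locally connected layer: a different kernel at every position -> n_out * k
    # parameters, but each output still depends only on a small input patch.
    w_local = np.random.randn(n_out, k)
    local_out = np.array([w_local[i] @ x[i:i + k] for i in range(n_out)])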
Tiled convolution
• A compromise between a convolutional layer and a locally connected layer
• A set of kernels is cycled through as we move across space, so neighboring locations have different filters, as in a locally connected layer
• Memory required for storing parameters increases only by a factor of the size of this set of kernels, rather than by the size of the entire output feature map
Tiled convolution
• K is a 6D kernel tensor – output locations cycle through a set of t different choices of kernel stack in each direction; if t equals the o/p width, tiled convolution reduces to a locally connected layer; % denotes the modulo operation
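A hedged LaTeX reconstruction of the tiled-convolution rule described above (textbook-style, 1-indexed notation; the slide's exact indexing may differ):

    Z_{i,j,k} = \sum_{l,m,n} V_{l,\, j+m-1,\, k+n-1}\; K_{i,l,m,n,\, j\%t+1,\, k\%t+1}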
