
Convolutional Neural Networks (CNN)

• CNNs are neural networks for processing data that has a known grid-like topology
• Time-series data – organized into a 1D grid by taking samples at regular time intervals
• Image data – a 2D grid of pixels
• The network employs the mathematical convolution operation (a linear operation), hence the name CNN
• CNNs use convolution in place of general matrix multiplication in at least one of their layers
CNN (ctd…)

• CNNs are an example of neuroscientific principles influencing deep learning
• x – input, w – kernel, s – feature map
• In the discrete form, the convolution is a sum over discrete positions rather than an integral (see the formulas below)
• In ML applications, the i/p is a multidimensional array (tensor) and the kernel is a multidimensional array of parameters adapted by the learning algorithm
• 2D convolution is used for image data
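As a hedged reminder of the underlying formulas (standard textbook notation, not reproduced from the slide itself; I is a 2D image, K a 2D kernel, S the feature map):

    s(t) = (x * w)(t) = \sum_{a} x(a)\, w(t - a)

    S(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} I(m,n)\, K(i-m,\, j-n)

    S(i,j) = (K \star I)(i,j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m,n)

The last form, without kernel flipping, is the cross-correlation discussed on the next slide; most libraries implement this and still call it convolution.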
CNN (ctd…)
• Convolution is commutative
• Commutativity arises because the kernel is flipped relative to the i/p
• Cross-correlation – convolution without kernel flipping
• Discrete convolution can be viewed as multiplication by a matrix
• Each row of the matrix is constrained to be equal to the row above, shifted by one element (a Toeplitz matrix)
• In 2D, the matrix is a doubly block-circulant matrix and is sparse
CNN (ctd…)
• Restricting the output to only positions where the kernel lies entirely within the image is called “valid” convolution
• The upper-left element of the output tensor is formed by applying the kernel to the corresponding upper-left region of the input tensor (illustrated in the sketch below)
• Block processing
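A minimal NumPy sketch of “valid” convolution, implemented (as most libraries do) as cross-correlation without kernel flipping; the function and array names are illustrative, not from the slides:

    import numpy as np

    def conv2d_valid(image, kernel):
        # "Valid" convolution: the kernel is applied only at positions
        # where it lies entirely within the image.
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # The upper-left output element (i = j = 0) is the kernel
                # applied to the upper-left region of the input.
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.arange(16, dtype=float).reshape(4, 4)
    kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
    print(conv2d_valid(image, kernel).shape)   # (3, 3)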
CNN (ctd…)
[Slide figure: worked numerical example of a 2D convolution with a small binary (0/1) kernel]
• Three ideas improving ML systems
– Sparse interactions, parameter sharing, equivariant representations

• Traditional NNs use multiplication by a matrix of parameters, with a separate parameter describing the interaction between each i/p unit and each o/p unit; every i/p unit interacts with every o/p unit
• CNN – sparse interactions / sparse connectivity / sparse weights
• Achieved by making the kernel smaller than the input – e.g., detecting edges in an image with a kernel spanning only a few pixels
• Requires less memory and improves statistical efficiency
• With m inputs and n outputs, matrix multiplication requires m × n parameters and O(m × n) runtime per example
• If each output is limited to k connections, the sparse approach needs only k × n parameters and O(k × n) runtime (see the sketch after this list)
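A back-of-the-envelope comparison of the two parameter counts; the sizes here are arbitrary, chosen only to illustrate the scaling:

    # m inputs, n outputs, k connections per output (k << m)
    m, n, k = 10_000, 10_000, 3
    dense_params  = m * n   # fully connected: O(m*n) parameters and runtime per example
    sparse_params = k * n   # sparse connectivity: O(k*n) parameters and runtime
    print(dense_params, sparse_params)   # 100000000 vs 30000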
Sparse Connectivity
• Allows the n/w to efficiently describe complicated interactions between many variables by constructing such interactions from simple building blocks that each describe only sparse interactions
Parameter sharing
• Using the same parameter for more than one function in a model
• Limited capacity to linear functions
• The network has tied weights, because the value of the weight applied to one input is tied to the value of a weight applied elsewhere
• CNN – each member of the kernel is used at every position of the input
• The convolution operation means that rather than learning a separate set of parameters for every location, the network learns a single set
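A small sketch of tied weights in 1D: one kernel is reused at every input position, instead of a separate weight vector per output location (names and sizes are illustrative):

    import numpy as np

    x = np.random.randn(100)                  # 1D input signal
    k = np.array([0.25, 0.5, 0.25])           # one shared kernel: 3 parameters in total

    # Convolution reuses the same 3 parameters at every output position.
    shared = np.convolve(x, k, mode='valid')  # 98 outputs from just 3 parameters

    # Learning a separate 3-weight vector for each of the 98 output
    # positions would instead require 98 * 3 = 294 parameters.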
Parameter sharing (ctd …)

 
• Convolution with parameter sharing is dramatically more efficient than dense matrix multiplication in terms of memory requirements and statistical efficiency
• Equivariance to translation – if the input shifts, the output shifts in the same way
• A function f(x) is equivariant to a function g if f(g(x)) = g(f(x))
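A quick numerical check of translation equivariance, where g is a shift of the signal and f is convolution with a fixed kernel (the trimming only avoids boundary/wrap-around effects):

    import numpy as np

    x = np.random.randn(50)
    k = np.array([1.0, -1.0])                # a simple difference kernel

    def g(v):                                # translation: shift by one position
        return np.roll(v, 1)

    def f(v):                                # convolution with the fixed kernel
        return np.convolve(v, k, mode='same')

    a, b = f(g(x)), g(f(x))
    print(np.allclose(a[2:-2], b[2:-2]))     # True: f(g(x)) == g(f(x)) away from the edges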
Parameter sharing (ctd …)
Pooling
• A CNN layer comprises three stages
– Stage 1 performs several convolutions in parallel to produce a set of linear activations
– Stage 2 runs each linear activation through a nonlinear activation function such as ReLU (the detector stage)
– Stage 3 uses a pooling function to further modify the o/p of the layer

• A pooling function replaces the o/p of the net at a certain location with a summary statistic of the nearby outputs
Pooling (ctd..)

• Max pooling reports the maximum o/p within a rectangular neighborhood; alternatives include the average, the L2 norm, and a weighted average
• The pooled representation is approximately invariant to small translations of the input
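A minimal NumPy sketch of max pooling over non-overlapping 2×2 neighborhoods (illustrative only; real frameworks also support strides and padding):

    import numpy as np

    def max_pool2d(x, size=2):
        # Replace each non-overlapping size x size neighborhood by its maximum.
        H, W = x.shape
        H2, W2 = H - H % size, W - W % size          # trim so the grid divides evenly
        x = x[:H2, :W2]
        return x.reshape(H2 // size, size, W2 // size, size).max(axis=(1, 3))

    feat = np.random.randn(6, 6)
    print(max_pool2d(feat).shape)                    # (3, 3)

Shifting the input by a pixel typically changes only a few of the pooled values, which is the approximate translation invariance noted above.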
Pooling (ctd..)
Pooling (ctd…)
Variants of Convolution function
• Convolution with a single kernel extracts only one kind of feature, albeit at many spatial locations
• Each network layer has to extract many kinds of features, at many locations
• The i/p is therefore a grid of vectors (one channel per feature), not a grid of real values
• CNNs deploy multi-channel convolution; these linear operations are not guaranteed to be commutative, even if kernel flipping is used
• Multi-channel convolution is commutative only if each operation has the same number of o/p channels as i/p channels
Variants of Convolution function
• Multi-channel convolution uses a 4D kernel tensor K, whose element K(i, l, m, n) gives the connection strength between a unit in o/p channel i and a unit in i/p channel l, offset by m rows and n columns
• Downsampling the o/p by sampling only every s pixels in each direction – s is the stride
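A hedged LaTeX reconstruction of these two operations, using the common textbook (1-indexed) notation with input V, 4D kernel K, output Z, and stride s; the slide's exact notation may differ:

    Z_{i,j,k} = \sum_{l,m,n} V_{l,\, j+m-1,\, k+n-1}\; K_{i,l,m,n}

    Z_{i,j,k} = c(\mathbf{K}, \mathbf{V}, s)_{i,j,k}
              = \sum_{l,m,n} V_{l,\,(j-1)\times s + m,\,(k-1)\times s + n}\; K_{i,l,m,n}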


[Slide figure: convolution with stride 2]
Variants of Convolution function
• Zero padding (ZP) widens the input
• ZP allows the kernel width and the o/p size to be controlled independently
Variants of Convolution function
• Valid convolution (MATLAB terminology) – no zero padding; the o/p is smaller than the i/p
• Same convolution – zero padding keeps the i/p and o/p the same size
• Full convolution – enough zeroes are added for every pixel to be visited k times in each direction
• The optimal amount of zero padding usually lies somewhere between “valid” and “same” (the three modes are compared in the sketch below)
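A small check of the three modes using SciPy, whose scipy.signal.convolve2d supports exactly these mode names; the array sizes here are arbitrary:

    import numpy as np
    from scipy.signal import convolve2d

    image  = np.random.randn(8, 8)
    kernel = np.random.randn(3, 3)

    print(convolve2d(image, kernel, mode='valid').shape)  # (6, 6)   kernel fully inside the image
    print(convolve2d(image, kernel, mode='same').shape)   # (8, 8)   o/p size equals i/p size
    print(convolve2d(image, kernel, mode='full').shape)   # (10, 10) every pixel visited k times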
• Specifying connectivity in full generality (an adjacency-matrix-like description) leads to a 6D weight tensor

• Unshared convolution – no parameter sharing


Unshared convolution
• Local connections – each feature is a function of only a small part of the space
• Each o/p unit i is a function of only a subset of the i/p units l
• For example, connect the first m o/p channels to the first n i/p channels, the next m o/p channels to the next n i/p channels, and so on
• Modelling interactions between only a few channels lets the n/w use fewer parameters, reducing memory consumption and increasing statistical efficiency, and it also reduces the computation needed for forward and back-propagation
• This is achieved without reducing the number of hidden units
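A rough 1D sketch contrasting an unshared (locally connected) layer with a convolutional one; sizes and names are made up for illustration:

    import numpy as np

    x = np.random.randn(10)                 # 1D input
    k = 3                                   # receptive-field (kernel) width
    n_out = len(x) - k + 1                  # number of 'valid' output positions

    # Convolutional layer: one shared kernel -> k parameters.
    w_shared = np.random.randn(k)
    conv_out = np.array([w_shared @ x[i:i + k] for i in range(n_out)])

    # Locally connected layer: a different kernel at every position -> n_out * k
    # parameters, but each output still depends only on a small input patch.
    w_local = np.random.randn(n_out, k)
    local_out = np.array([w_local[i] @ x[i:i + k] for i in range(n_out)])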
Tiled convolution
• A compromise between a convolutional layer and a locally connected layer
• A set of kernels is cycled through as we move across space, so neighboring locations have different filters, as in a locally connected layer
• Memory required for storing parameters increases only by a factor of the size of this set of kernels, rather than by the size of the entire output feature map
Tiled convolution
• K is a 6D kernel tensor – output locations cycle through a set of t different choices of kernel stack in each direction; if t equals the o/p width, tiled convolution reduces to a locally connected layer; % denotes the modulo operation
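A hedged LaTeX reconstruction of the tiled-convolution rule described above (textbook-style, 1-indexed notation; the slide's exact indexing may differ):

    Z_{i,j,k} = \sum_{l,m,n} V_{l,\, j+m-1,\, k+n-1}\; K_{i,l,m,n,\, j\%t+1,\, k\%t+1}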
