
Neural Networks & Deep Learning

Unit-6
Dr. D. SUDHEER
Assistant Professor
Department of CSE
VNR VJIET (NAAC: A++, NIRF: 113)
Hyderabad, Telangana.





How to apply NN over Image?
Multi-layer Neural Network & Image
Stretch the pixels into a single column vector.

Problems?
• High dimensionality
• Loss of local (spatial) relationships between pixels


How to apply NN over Image?
Multi-layer Neural Network & Image
Stretch the pixels into a single column vector.

Solutions?
• We need a network that copes with the high dimensionality and preserves the local relationships, which leads to Convolutional Neural Networks.


Convolutional Neural Networks
• Also known as: CNN, ConvNet, DCN
• CNN = a multi-layer neural network with:
  1. Local connectivity
  2. Weight sharing



Convolutional Neural Network (CNN)



For convolution and pooling operations, open the CNN layers unit5.pdf file.


CNN: Local vs. Global Connectivity
Example: 7 input neurons, 3 hidden units, each hidden unit locally connected to 3 inputs.

Number of parameters:
• Global (full) connectivity: 3 × 7 = 21
• Local connectivity: 3 × 3 = 9


CNN: Local Connectivity and Weight Sharing
Same example: 7 input neurons, 3 hidden units, local receptive field of size 3.

Number of parameters:
• Without weight sharing: 3 × 3 = 9
• With weight sharing: 3 × 1 = 3 (the same 3 weights are reused by every hidden unit)
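As a check on these counts, here is a tiny pure-Python sketch (the variable names are illustrative, not from the slides) that reproduces the three parameter counts above; bias terms are ignored, as on the slides.

inputs, hidden_units, receptive_field = 7, 3, 3

# Global (full) connectivity: every hidden unit connects to every input neuron.
global_params = hidden_units * inputs            # 3 * 7 = 21

# Local connectivity, no weight sharing: each hidden unit has its own window of 3 weights.
local_params = hidden_units * receptive_field    # 3 * 3 = 9

# Local connectivity with weight sharing (convolution): one filter of width 3 reused everywhere.
shared_params = 1 * receptive_field              # 1 * 3 = 3

print(global_params, local_params, shared_params)   # 21 9 3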


Layers in CNN

• Input layer (e.g., the input image)
• Convolution layer
• Non-linearity layer
• Pooling layer
• Fully connected layer
• Classification layer
A minimal sketch of this layer stack follows below.

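A minimal sketch of this layer stack, assuming TensorFlow/Keras is available; the 32x32 RGB input size, filter counts, and 10 output classes are illustrative assumptions rather than values from the slides.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),          # input layer: a 32x32 RGB image
    layers.Conv2D(16, (3, 3), padding="same"),  # convolution layer
    layers.Activation("relu"),                  # non-linearity layer
    layers.MaxPooling2D((2, 2)),                # pooling layer
    layers.Flatten(),
    layers.Dense(64, activation="relu"),        # fully connected layer
    layers.Dense(10, activation="softmax"),     # classification layer: 10 class scores
])
model.summary()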


3D ConvNet



Figure 3: Difference: 2D convolution and 3D convolution [2]



Difference: 2D convolution and 3D convolution
• 2D convolution applied on an image outputs an image.
• 2D convolution applied on multiple images (treating them as different channels) also results in an image.
• Hence, 2D ConvNets lose the temporal information of the input signal right after every convolution operation.
• Only 3D convolution preserves the temporal information of the input signals, resulting in an output volume.
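A small sketch of this difference, assuming TensorFlow/Keras; the video of 16 frames of 64x64 RGB pixels and the 8 filters are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers

# One "video": 16 frames of 64x64 RGB pixels -> shape (batch, time, height, width, channels).
video = tf.random.normal((1, 16, 64, 64, 3))

# 2D convolution: fold the 16 frames into the channel axis -> the output is just an image,
# so the temporal axis is gone.
frames_as_channels = tf.reshape(tf.transpose(video, [0, 2, 3, 1, 4]), (1, 64, 64, 16 * 3))
out_2d = layers.Conv2D(8, (3, 3), padding="same")(frames_as_channels)
print(out_2d.shape)   # (1, 64, 64, 8)

# 3D convolution: the kernel also spans time -> the output is a volume,
# so the temporal axis is preserved.
out_3d = layers.Conv3D(8, (3, 3, 3), padding="same")(video)
print(out_3d.shape)   # (1, 16, 64, 64, 8)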



Batch normalization and layers:
• To accelerate training in CNNs we can normalize the activations of the previous
layer at each batch.
• This technique applies a transformation that keeps the mean activation close to
0.0 while also keeping the activation standard deviation close to 1.0.
• By applying normalization for each training mini-batch of input records, we can
use much higher learning rates.
• Batch normalization also reduces the sensitivity of training toward weight
initialization and acts as a regularizer.
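A minimal NumPy sketch of the transform described above; gamma and beta stand in for the usual learnable scale and shift, and the batch values are made up for illustration.

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the mini-batch to mean ~0 and std ~1,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.randn(32, 4) * 5.0 + 10.0   # mini-batch of 32 activations, 4 features each
out = batch_norm(batch)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # ~0 and ~1 per feature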
Fully Connected Layers:
• We use this layer to compute class scores that we’ll use as output of the
network.
• Fully connected layers perform transformations on the input data volume that
are a function of the activations in the input volume and the parameters.
Applications of CNN:
• MRI data
• 3D shape data
• Graph data
• NLP applications
Recurrent Neural Networks

• Historically, these networks have been difficult to train, but more recently,
advances in research (optimization, network architectures, parallelism, and
graphics processing units [GPUs]) have made them more approachable for the
practitioner.
• Recurrent Neural Networks take each vector from a sequence of input vectors
and model them one at a time.
• Modeling the time dimension is a hallmark of Recurrent Neural Networks.
Modeling the Time Dimension:
• Recurrent Neural Networks are considered Turing complete and can simulate
arbitrary programs (with weights).
• Recurrent neural networks are well suited for modeling functions for which
the input and/or output is composed of vectors that involve a time dependency
between the values.
• Recurrent neural networks model the time aspect of data by creating cycles in
the network (hence, the “recurrent” part of the name).
Lost in Time:
• Many classification tools (support vector machines, logistic regression, and
regular feed-forward networks) have been applied successfully without
modeling the time dimension, assuming independence.
• Other variations of these tools capture the time dynamic by modeling a
sliding window of the input (e.g., the previous, current, and next input together
as a single input vector).
• A drawback of these tools is that assuming independence in the time
connection between model inputs does not allow our model to capture long-
range time dependencies.
• Sliding window techniques have a limited window width and will fail to
capture any effects larger than the fixed window size.
• A good example is automatically generated replies in a conversation that unfolds over time.



Temporal feedback and loops in connections:
• Recurrent Neural Networks can have loops in the connections.
• This allows them to model temporal behavior and gain accuracy in domains such as time-series, language, audio, and text.
• Data in these domains are inherently ordered and context sensitive, where later values depend on previous ones.
• A Recurrent Neural Network includes a feedback loop that it uses to learn
from sequences, including sequences of varying lengths.
• Recurrent Neural Networks contain an extra parameter matrix for the
connections between time-steps, which are used/trained to capture the temporal
relationships in the data.
• Recurrent Neural Networks are trained to generate sequences, in which the
output at each time-step is based on both the current input and the input at all
previous time steps.
• Recurrent Neural Networks compute a gradient with an algorithm called back
propagation through time (BPTT).
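A minimal NumPy sketch of this recurrence, under illustrative sizes: the same weight matrices (including the extra matrix W_h for the time-step connections) are reused at every step, and BPTT back-propagates the gradient through this unrolled loop.

import numpy as np

timesteps, input_dim, hidden_dim = 5, 3, 4
rng = np.random.default_rng(0)

W_x = rng.normal(size=(hidden_dim, input_dim))   # input -> hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden (time-step) weights
b = np.zeros(hidden_dim)

x_seq = rng.normal(size=(timesteps, input_dim))  # one input vector per time-step
h = np.zeros(hidden_dim)                         # initial hidden state

for t in range(timesteps):
    # The state at each step depends on the current input and, through h,
    # on all previous inputs.
    h = np.tanh(W_x @ x_seq[t] + W_h @ h + b)
    print(t, h.round(3))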
Applications for Sequences and time-series data:

• Image captioning
• Speech synthesis
• Music generation
• Playing video games
• Language modeling
• Character-level text generation models

Understanding model input and output:


• Unlike networks with a fixed-size input, Recurrent Neural Networks accept a dynamic input made of multiple input vectors, one for each time-step, and each vector can have many columns (features).
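Concretely, this makes the input a 3-D array; a tiny NumPy sketch with made-up sizes:

import numpy as np

batch_size, time_steps, features = 2, 10, 6
# One sequence per sample, one vector per time-step, `features` columns per vector.
rnn_input = np.zeros((batch_size, time_steps, features))
print(rnn_input.shape)   # (2, 10, 6)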



• One-to-many: sequence output. For example, image captioning takes an image and outputs a sequence of words.
• Many-to-one: sequence input. For example, sentiment analysis, where a given sentence is the input.
• Many-to-many: sequence input and sequence output. For example, video classification: label each frame. (A sketch of the last two cases follows below.)
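A small Keras sketch of the last two cases (sizes are illustrative): with return_sequences=False the LSTM returns one output for the whole sequence (many-to-one, e.g. sentiment), and with return_sequences=True it returns one output per time-step (many-to-many, e.g. labeling each frame).

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 20, 8))                      # (batch, time-steps, features)

many_to_one = layers.LSTM(16, return_sequences=False)(x)
print(many_to_one.shape)                              # (1, 16): one output for the sequence

many_to_many = layers.LSTM(16, return_sequences=True)(x)
print(many_to_many.shape)                             # (1, 20, 16): one output per time-step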



Traditional RNN



LSTM Neuron
[Figure: an LSTM neuron]


• It is useful to remember past data along with the present data when making a decision.
• For example, in a sentence, the beginning words can be more important than the last words for understanding the meaning.
• An LSTM stores information from all the words along with the recent words when making its decision.

LSTM = Long-term memory + Short-term memory


• Long-term memory represents all the words starting from the first word.
• Short-term memory represents recent words from the past state of the model.
• When the LSTM keeps storing data, it may reach a point where it cannot store any more.
• It therefore removes unwanted information from time to time.
• Removing or keeping data is implemented by gates.

[Figure: an LSTM cell with forget, input, and output gates. The forget gate discards irrelevant information, the input gate adds new information, and the output gate passes on the updated information.]
Layers of RNN
There are two important layers: 1. Embedding 2. LSTM
1. Embedding
• It converts positive integers (e.g., word indices) into dense vectors of values.
• A fixed range of input values should be provided to this layer.
• It is especially useful in language translation for capturing word meaning.

Embedding(input_dim,output_dim,input_length)
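A minimal Keras sketch of this call; the vocabulary size, vector size, and word indices below are illustrative assumptions (input_length is optional in recent Keras versions, so it is omitted here).

import numpy as np
from tensorflow.keras import layers

vocab_size, vector_dim = 1000, 8                 # input_dim and output_dim

# Embedding(input_dim, output_dim[, input_length]) maps positive integers to vectors.
embed = layers.Embedding(input_dim=vocab_size, output_dim=vector_dim)

word_ids = np.array([[4, 20, 7, 999]])           # a sequence of 4 word indices
vectors = embed(word_ids)
print(vectors.shape)                             # (1, 4, 8): one 8-value vector per word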



LSTM:
• The LSTM network is different from a classical MLP.
• Input data is propagated through the network in order to make a prediction.
• Like other RNNs, LSTMs have recurrent connections, so the state from the neuron's activation at the previous time-step is used as context when formulating an output.
• But unlike other RNNs, the LSTM has a unique formulation that allows it to avoid the problems that prevent the training and scaling of other RNNs.
• LSTM overcomes problems such as vanishing and exploding gradients.
LSTM Gates
• Forget Gate: Decides what information to discard from the cell.
• Input Gate: Decides which values from the input should update the memory state.
• Output Gate: Decides what to output based on the input and the memory of the cell.
• The forget gate and input gate are used in updating the internal state.
• The output gate is a final limiter on what the cell actually outputs.
• It is these gates and the consistent data flow, called the constant error carousel (CEC), that keep each cell stable (neither exploding nor vanishing).
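A minimal NumPy sketch of one LSTM step built from these three gates (these are the standard LSTM equations; the sizes and the dictionary-of-weights layout are assumptions for illustration).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b hold the parameters of the forget (f), input (i), output (o)
    # gates and the candidate cell value (g).
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate: what to discard
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate: what to write
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate: what to emit
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate values
    c = f * c_prev + i * g                               # updated internal (cell) state
    h = o * np.tanh(c)                                   # output limited by the output gate
    return h, c

n_in, n_hid = 3, 4
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(n_hid, n_in)) for k in "fiog"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "fiog"}
b = {k: np.zeros(n_hid) for k in "fiog"}

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.round(3), c.round(3))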

Applications of LSTM:
• Image caption generation.
• Text translation.
• Handwriting recognition.
Limitations of LSTM:
• In time series forecasting, often the information relevant for making a
forecast is within a small window of past observations. Often an MLP with a
window or a linear model may be a less complex and more suitable model.
• An important limitation of LSTMs is the memory

