
AI

PGDME

Artificial Intelligence (AI)


Session #8 & 9
Computer Vision

Fundamentals of AI, Audio & Computer Vision, Latest Advances….

Only for your study reference. Not to be circulated as soft / hard copy
Artificial Intelligence 1 Krishna Durbha
Session Objectives
§ Speech recognition
§ Case study: audio content cleansing
§ Computer vision
§ CNN, RNN etc.

Speech to text
§ Break speech into tiny, recognizable parts called phonemes (only 44 in English). The order, combination and context of these phonemes allow sophisticated audio-analysis tools to identify what was said.

§ The linguistic component analyzes all the preceding words and their relationships to estimate the probability of the next word, using hidden Markov models. It converts the sequence of acoustic units into words, phrases and paragraphs. It can handle words with the same sound but different meanings, e.g. peace & piece.

§ The software matches against the text that best fits the words you spoke. It has acoustic models to identify patterns in speech waves.
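As a toy sketch of the linguistic component, a bigram model can pick between the homophones peace and piece from the preceding word. All counts below are invented for illustration only:

```python
# Toy bigram language model: pick between the homophones "peace" and
# "piece" from the preceding word. All counts are invented for illustration.
bigram_counts = {
    ("world", "peace"): 90, ("world", "piece"): 10,
    ("a", "piece"): 80, ("a", "peace"): 20,
}

def most_likely_word(prev_word, candidates):
    # The candidate with the highest bigram count given the previous word wins.
    return max(candidates, key=lambda w: bigram_counts.get((prev_word, w), 0))

print(most_likely_word("world", ["peace", "piece"]))  # -> peace
print(most_likely_word("a", ["peace", "piece"]))      # -> piece
```

A real recognizer would use probabilities over long histories (via hidden Markov models) rather than raw pair counts, but the decision rule is the same idea.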

Tools
http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

www.Shazam.com

"Focus on a few 'intense' moments in a song. Create a spectrogram to plot three dimensions of music: frequency vs. amplitude vs. time. The algorithm picks points at the peaks of the graph, i.e. with 'higher energy content' than the other notes around them."
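The peak-picking idea can be sketched as follows. This is an assumed simplification (keep cells louder than all eight neighbours), not Shazam's actual algorithm:

```python
import numpy as np

# Keep spectrogram cells whose energy exceeds all 8 neighbouring cells.
def local_peaks(spec):
    peaks = []
    rows, cols = spec.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = spec[i - 1:i + 2, j - 1:j + 2]
            if spec[i, j] > 0 and spec[i, j] == window.max():
                peaks.append((i, j))
    return peaks

spec = np.zeros((5, 5))   # frequency x time grid of energies
spec[2, 2] = 9.0          # one "intense" moment
spec[1, 3] = 4.0          # quieter note right next to it
print(local_peaks(spec))  # only the intense moment survives
```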

2. Computer Vision
Deep learning algorithms


§ Convolutional neural networks (CNNs) are often used for machine vision.
§ Recurrent neural networks (RNNs) are used for natural language and other sequence processing.
§ Long Short-Term Memory (LSTM) networks and attention-based neural networks are well suited to classifying, processing and making predictions on time-series data, as there can be lags of unknown duration between two events. LSTMs are "fancy RNNs" and have a cell state.
§ Random Forests (Random Decision Forests) are not neural networks, but are used for classification and regression.
§ The breakthrough in the neural network field for vision was Yann LeCun's 1998 LeNet-5, a seven-level convolutional neural network (CNN) for recognition of handwritten digits digitized in 32x32 pixel images.

RNN, LSTM, attention-based NNs
Feed-forward networks: information flows from input, through hidden layers, to output. The network deals with a single state at a time, so it has no "memory".

§ RNN: information loops back, so the network remembers recent outputs. Sequences and time-series analysis become possible.
§ Two issues:
o exploding gradients (fixed by clamping gradients)
o vanishing gradients (not so easy to fix)

§ With LSTMs, the network can forget (gating) previous information or remember it, by altering weights.
§ LSTMs have long-term and short-term memory and solve the vanishing-gradient problem.
§ LSTMs can handle hundreds of past inputs.

Random Forests (deep learning, but not deep neural):
§ Have many layers of decision trees (not neurons).
§ Output is the average (mode for classification / mean for regression) of the individual tree predictions.
§ Use bootstrap aggregation (bagging) for individual trees and take random subsets of features.

Attention module gates apply weights to input vectors. A hierarchical neural attention encoder uses multiple layers of attention modules to handle thousands of past inputs.
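A quick numeric sketch (toy weight values assumed) of why plain RNN gradients vanish or explode: backpropagating through T time steps multiplies the gradient by roughly the recurrent weight w at each step.

```python
# Backpropagating through T steps multiplies the gradient by roughly the
# recurrent weight w each step, giving a factor of about w ** T.
def gradient_factor(w, steps):
    factor = 1.0
    for _ in range(steps):
        factor *= w
    return factor

print(gradient_factor(0.9, 100))  # shrinks toward 0 (vanishing)
print(gradient_factor(1.1, 100))  # blows up (exploding)
```

Clamping caps the exploding case; LSTM gating is what addresses the vanishing case.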
CNN Feature Map


Input Image → Feature Detector / Filter → Feature / Activation Map

Example filters: Sharpen, Blur, Edge Enhance.

Try out the online convolution matrix (feature map) at https://docs.gimp.org/2.6/en/plug-in-convmatrix.html
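A minimal sketch of how a feature detector produces a feature map. The kernel values below are assumed for illustration, in the style of the edge-enhance example:

```python
import numpy as np

# Slide a 3x3 kernel over the image; each output cell is the weighted
# sum of the patch under the kernel (stride 1, no padding).
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Edge-enhance-style kernel (values assumed for illustration).
edge = np.array([[0, -1, 0],
                 [-1, 4, -1],
                 [0, -1, 0]], dtype=float)
image = np.array([[0, 0, 0, 0],
                  [0, 9, 9, 0],
                  [0, 9, 9, 0],
                  [0, 0, 0, 0]], dtype=float)
print(convolve2d(image, edge))  # strong responses along the square's edges
```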


ReLU layers
ReLU

§ Removes linearity in the outputs (i.e. introduces non-linearity), since real images do not have sharp transitions between black & white or straight edges.
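A one-line sketch of the ReLU function itself:

```python
import numpy as np

# ReLU, f(x) = max(0, x): negative responses become 0, positives pass through.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```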

Convolution & Max Pooling


Convolution (sliding a filter over the input; 3 channels for RGB)
§ Output size: (W − F + 2P)/S + 1, where W = input width, F = filter size, P = padding, S = stride.

Max pooling (2x2 filters & stride 2):

1 1 2 4
5 6 7 8    →    6 8
3 2 1 0         3 4
1 2 3 4

http://cs231n.github.io/convolutional-networks/
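The output-size formula and the max-pooling example above can be checked in a few lines (NumPy assumed available):

```python
import numpy as np

# Output width of a convolution: (W - F + 2P) / S + 1.
def conv_output_size(W, F, P, S):
    return (W - F + 2 * P) // S + 1

print(conv_output_size(32, 5, 0, 1))  # 5x5 filter on a 32-wide input -> 28

# Max pooling the 4x4 example with 2x2 filters and stride 2.
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2x2 block
print(pooled)  # [[6 8]
               #  [3 4]]
```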
Softmax activation function


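A minimal softmax sketch (the raw scores below are assumed for illustration): softmax turns the final layer's scores into probabilities that sum to 1.

```python
import numpy as np

# Softmax: exponentiate the scores and normalize, so the outputs are
# positive and sum to 1.
def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw class scores (assumed)
probs = softmax(scores)
print(probs)        # largest score -> largest probability
print(probs.sum())  # sums to 1 (up to float rounding)
```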
CNN for a Self Driving Car


§ Convolutional layer takes integrals of many small overlapping regions.
§ ReLU layers apply the non-saturating activation function f(x) = max(0, x).
§ Pooling layer performs non-linear down-sampling.
§ Fully connected layer: neurons have connections to all activations in the previous layer.
§ Loss layer computes how the network penalizes deviation between predicted and true labels, using:
o softmax or cross-entropy loss function for classification, or
o Euclidean loss function for regression.
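A toy sketch of the cross-entropy loss mentioned above (the predicted probabilities are assumed for illustration): the penalty grows as the probability assigned to the true class shrinks.

```python
import numpy as np

# Cross-entropy loss: -log(probability assigned to the true class).
def cross_entropy(predicted_probs, true_class):
    return -np.log(predicted_probs[true_class])

good = np.array([0.9, 0.05, 0.05])  # confident and correct (true class 0)
bad = np.array([0.2, 0.4, 0.4])     # unsure about class 0
print(cross_entropy(good, 0))  # small loss
print(cross_entropy(bad, 0))   # larger loss
```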

Thank You
Krishna.Durbha@sbm.nmims.edu

