
AI

PGDME

Artificial Intelligence (AI)


Session #8 & 9
Computer Vision

Fundamentals of AI, Audio & Computer Vision, Latest Advances….

Only for your study reference. Not to be circulated as soft / hard copy
Artificial Intelligence 1 Krishna Durbha
Session Objectives
§ Speech recognition
§ Case study: audio content cleansing
§ Computer vision
§ CNN, RNN etc.

Speech to text
§ Break speech into tiny, recognizable parts called phonemes (only 44 in English). The order, combination and context of these phonemes allow sophisticated audio-analysis tools to identify what was said.

§ The linguistic component analyzes all the preceding words and their relationships to estimate the probability of the next word, using hidden Markov models. It converts the sequence of acoustic units into words, phrases and paragraphs. It can handle words with the same sound but different meanings, e.g. peace & piece.

§ The software matches against the text that best fits the words you spoke. It has acoustic models to identify patterns in speech waves.
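As a toy sketch of the linguistic component, a bigram model can pick between the homophones peace and piece from the preceding word. All counts below are invented for illustration only:

```python
# Toy bigram language model: pick between the homophones "peace" and
# "piece" from the preceding word. All counts are invented for illustration.
bigram_counts = {
    ("world", "peace"): 90, ("world", "piece"): 10,
    ("a", "piece"): 80, ("a", "peace"): 20,
}

def most_likely_word(prev_word, candidates):
    # The candidate with the highest bigram count given the previous word wins.
    return max(candidates, key=lambda w: bigram_counts.get((prev_word, w), 0))

print(most_likely_word("world", ["peace", "piece"]))  # -> peace
print(most_likely_word("a", ["peace", "piece"]))      # -> piece
```

A real recognizer would use probabilities over long histories (via hidden Markov models) rather than raw pair counts, but the decision rule is the same idea.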

Tools
http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

www.Shazam.com

"Focus on a few 'intense' moments in a song. Create a spectrogram to plot three dimensions of music: frequency vs. amplitude vs. time. The algorithm picks points at the peaks of the graph, i.e. with 'higher energy content' than the other notes around them."
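The peak-picking idea can be sketched as follows. This is an assumed simplification (keep cells louder than all eight neighbours), not Shazam's actual algorithm:

```python
import numpy as np

# Keep spectrogram cells whose energy exceeds all 8 neighbouring cells.
def local_peaks(spec):
    peaks = []
    rows, cols = spec.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = spec[i - 1:i + 2, j - 1:j + 2]
            if spec[i, j] > 0 and spec[i, j] == window.max():
                peaks.append((i, j))
    return peaks

spec = np.zeros((5, 5))   # frequency x time grid of energies
spec[2, 2] = 9.0          # one "intense" moment
spec[1, 3] = 4.0          # quieter note right next to it
print(local_peaks(spec))  # only the intense moment survives
```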

2. Computer Vision
Deep learning algorithms


§ Convolutional neural networks (CNNs) are often used for machine vision.
§ Recurrent neural networks (RNNs) are used for natural language and other sequence processing.
§ Long Short-Term Memory (LSTM) networks and attention-based neural networks are well suited to classifying, processing and making predictions on time-series data, as there can be lags of unknown duration between two events. LSTMs are "fancy RNNs" and have a cell state.
§ Random Forests (Random Decision Forests) are not neural networks, but are used for classification and regression.
§ The breakthrough in the neural network field for vision was Yann LeCun's 1998 LeNet-5, a seven-level convolutional neural network (CNN) for recognition of handwritten digits digitized in 32x32 pixel images.

RNN, LSTM, attention-based NNs
Feed-forward networks: information flows from input, through hidden layers, to output. The network deals with a single state at a time, so it has no "memory".

§ RNN: information loops back, so the network remembers recent outputs. Sequences and time-series analysis become possible.
§ Two issues:
o exploding gradients (fixed by clamping gradients)
o vanishing gradients (not so easy to fix)

§ With LSTMs, the network can forget (gating) previous information or remember it, by altering weights.
§ LSTMs have long-term and short-term memory and solve the vanishing-gradient problem.
§ LSTMs can handle hundreds of past inputs.

Random Forests (deep learning, but not deep neural):
§ Have many layers of decision trees (not neurons).
§ Output is the average (mode for classification / mean for regression) of the individual tree predictions.
§ Use bootstrap aggregation (bagging) for individual trees and take random subsets of features.

Attention module gates apply weights to input vectors. A hierarchical neural attention encoder uses multiple layers of attention modules to handle thousands of past inputs.
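A quick numeric sketch (toy weight values assumed) of why plain RNN gradients vanish or explode: backpropagating through T time steps multiplies the gradient by roughly the recurrent weight w at each step.

```python
# Backpropagating through T steps multiplies the gradient by roughly the
# recurrent weight w each step, giving a factor of about w ** T.
def gradient_factor(w, steps):
    factor = 1.0
    for _ in range(steps):
        factor *= w
    return factor

print(gradient_factor(0.9, 100))  # shrinks toward 0 (vanishing)
print(gradient_factor(1.1, 100))  # blows up (exploding)
```

Clamping caps the exploding case; LSTM gating is what addresses the vanishing case.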
CNN Feature Map


Input Image → Feature Detector / Filter → Feature / Activation Map

Example filters: Sharpen, Blur, Edge Enhance.

Try out the online convolution matrix (feature map) at https://docs.gimp.org/2.6/en/plug-in-convmatrix.html
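A minimal sketch of how a feature detector produces a feature map. The kernel values below are assumed for illustration, in the style of the edge-enhance example:

```python
import numpy as np

# Slide a 3x3 kernel over the image; each output cell is the weighted
# sum of the patch under the kernel (stride 1, no padding).
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Edge-enhance-style kernel (values assumed for illustration).
edge = np.array([[0, -1, 0],
                 [-1, 4, -1],
                 [0, -1, 0]], dtype=float)
image = np.array([[0, 0, 0, 0],
                  [0, 9, 9, 0],
                  [0, 9, 9, 0],
                  [0, 0, 0, 0]], dtype=float)
print(convolve2d(image, edge))  # strong responses along the square's edges
```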


ReLU layers
ReLU

§ Removes linearity in the outputs (i.e. introduces non-linearity), since real images do not have sharp transitions between black & white or straight edges.
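A one-line sketch of the ReLU function itself:

```python
import numpy as np

# ReLU, f(x) = max(0, x): negative responses become 0, positives pass through.
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```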

Convolution & Max Pooling


Convolution (sliding a filter over the input; 3 channels for RGB)
§ Output size: (W − F + 2P)/S + 1, where W = input width, F = filter size, P = padding, S = stride.

Max pooling (2x2 filters & stride 2):

1 1 2 4
5 6 7 8    →    6 8
3 2 1 0         3 4
1 2 3 4

http://cs231n.github.io/convolutional-networks/
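The output-size formula and the max-pooling example above can be checked in a few lines (NumPy assumed available):

```python
import numpy as np

# Output width of a convolution: (W - F + 2P) / S + 1.
def conv_output_size(W, F, P, S):
    return (W - F + 2 * P) // S + 1

print(conv_output_size(32, 5, 0, 1))  # 5x5 filter on a 32-wide input -> 28

# Max pooling the 4x4 example with 2x2 filters and stride 2.
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2x2 block
print(pooled)  # [[6 8]
               #  [3 4]]
```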
Softmax activation function


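A minimal softmax sketch (the raw scores below are assumed for illustration): softmax turns the final layer's scores into probabilities that sum to 1.

```python
import numpy as np

# Softmax: exponentiate the scores and normalize, so the outputs are
# positive and sum to 1.
def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw class scores (assumed)
probs = softmax(scores)
print(probs)        # largest score -> largest probability
print(probs.sum())  # sums to 1 (up to float rounding)
```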
CNN for a Self Driving Car


§ Convolutional layer takes integrals of many small overlapping regions.
§ ReLU layers apply the non-saturating activation function f(x) = max(0, x).
§ Pooling layer performs non-linear down-sampling.
§ Fully connected layer: neurons have connections to all activations in the previous layer.
§ Loss layer computes how the network penalizes deviation between predicted and true labels, using:
o softmax or cross-entropy loss function for classification, or
o Euclidean loss function for regression.
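A toy sketch of the cross-entropy loss mentioned above (the predicted probabilities are assumed for illustration): the penalty grows as the probability assigned to the true class shrinks.

```python
import numpy as np

# Cross-entropy loss: -log(probability assigned to the true class).
def cross_entropy(predicted_probs, true_class):
    return -np.log(predicted_probs[true_class])

good = np.array([0.9, 0.05, 0.05])  # confident and correct (true class 0)
bad = np.array([0.2, 0.4, 0.4])     # unsure about class 0
print(cross_entropy(good, 0))  # small loss
print(cross_entropy(bad, 0))   # larger loss
```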

Thank You
Krishna.Durbha@sbm.nmims.edu

