Deep Learning
1-2 Logistics, Software, Computational Graphs
EE-433/AI-511, UET Lahore, Pakistan
Dr. Ahsen Tahir
The slides have been modified in part from the slides for Ian Goodfellow's book and the Dive into Deep Learning slides.
Goals
• Introduction to Deep Learning
(MLP, optimization, convolutions, sequences)
• Theory
• Capacity control (weight decay, dropout, batch norm)
• Optimization, models, overfitting, objective functions
• Practice
• Write code in Python / PyTorch
• Solve realistic problems
• Complex Engineering Problem
• Ability to solve original problems in Deep Learning in a team
Getting there
• Course
• “Dive into Deep Learning” book, online
[Link]
• “Deep Learning” book by Ian Goodfellow et al., online
[Link]
• Dive into Deep Learning
• Jupyter Notebooks
• GitHub repository at d2l-ai/d2l-en
Logistics
Contacts
• Lecturers
• Ahsen Tahir
Office hours: TBA
• Email ahsan@[Link]
• Teaching Support
• Anique Aslam
Office hours: TBA
• Email maniqueaslam@[Link]
Homework
• 5 assignments + 1 CEP (BS) / project (MS)
• Due 1 week after posted
2/12, 2/26, 3/12, 4/2, 4/19, CEP at the end
• Best 4 out of 5 homeworks count
• Plagiarizing code from each other or from online sources -> 6 months' rustication
• No marks for late submissions
• Programming assignments
Homework
• Submit homework via GitHub
• Submit the homework by 12 am on the day it is due
• Pull request after the deadline
• Submit as Jupyter notebooks (code)
• Annotated feedback is committed via Git
• Logistics
• GitHub account & repository (email to course)
• Permission for teacher to read/write the repository
Complex Engineering Problem (CEP) / Project
• Original work in machine learning
• Existing tools applied to novel problem
• Novel tools
• Research ‘with training wheels’ simulates academic process
• Research in a team (4 students for BS / 1 student for MS)
• Deliverables with schedule / deadlines
• End result is a paper/report/presentation (NIPS template)
Complex Engineering Problem (CEP)
• 2/5 Register team (names, working title)
• 3/5 Project proposal (1-2 page, 5 min talk)
• 4/21-22 (or earlier) Talk to the teacher to discuss the project
• Final presentation & report
(6-20 pages report, 6-20 slides talk)
• Start early (last-minute projects often fail)
• No, you cannot do it alone. This is teamwork.
Deep Learning
SIFT - DAVID LOWE
MOST CELEBRATED ALGORITHM FOR OBJECT DETECTION/RECOGNITION, MAPPING, TRACKING 10-13 YEARS AGO
OBSOLETE
THE FUTURE OF COMPUTER VISION BELONGS TO FEATURE LEARNING
DAVID LOWE
Classify Images
[Link]
Classify Images
[Link]
Yanofsky, Quartz
[Link]the-direction-of-ai-research-and-possibly-the-world/
COMPUTER VISION WITH DEEP LEARNING
Convolutional neural networks for computer vision
Object Detection (Yolo-Lite) Image Segmentation (Yolo-Lite)
Detect and Segment Objects
[Link]
Style transfer
[Link]
Synthesize Faces
Karras et al, ICLR 2018
Analogies
[Link]
Machine Translation
[Link]
Image captioning
Shallue et al, 2016
[Link]
[Link]
Software
Tools [Link]
• Python
• Everyone is using it in machine learning & data science
• Conda package manager (for simplicity)
• Jupyter
• So much easier to keep track of your experiments
• Obviously you should put longer code into modules
• Reveal (for notebook slides)
conda install -c conda-forge rise
• PyTorch
• Scalability & ease of use
• Imperative interface
EE-433/AI-503
Laptop / Desktop / Generic Cloud with Linux
• Conda
wget [Link]
sh Miniconda3-latest-Linux-x86_64.sh
mkdir d2l-en
cd d2l-en
curl [Link] -o [Link]
unzip [Link]
rm [Link]
• Install PyTorch
• Install NVIDIA drivers / CUDA / cuDNN / TensorRT
Colab
• Go to [Link]
• Activate the GPU supported runtime
• Install d2l
# pytorch should already be installed
!pip install d2l
Disclaimer
• This course will not cover the basics of Python, NumPy,
or PyTorch tensors
• The course assumes you have sufficient programming
experience and know the basics of machine learning, including
how an ANN/perceptron works and the basic learning algorithms
• The course may review a few topics.
The Learning Problem
Supervised Learning
Given:
[object label]
Questions to answer:
Gradient-Based Learning
Specify
• Model
• Cost
• Design model and cost so cost is smooth
• Minimize cost using gradient descent or related
techniques
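The recipe above (specify a model, specify a smooth cost, minimize it with gradient descent) can be sketched as follows. This is a minimal illustration, not course code: the one-parameter model y = w * x, the mean-squared-error cost, the toy data, and the learning rate are all assumptions chosen for the example.

```python
def model(w, x):
    # Model: a single weight, no bias (illustrative assumption)
    return w * x

def cost(w, xs, ys):
    # Smooth cost: mean squared error over the data
    return sum((model(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cost_grad(w, xs, ys):
    # Analytic derivative of the cost with respect to w
    return sum(2 * (model(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, w=0.0, lr=0.05, steps=200):
    # Repeatedly step against the gradient of the cost
    for _ in range(steps):
        w -= lr * cost_grad(w, xs, ys)
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated by y = 2x
w_fit = gradient_descent(xs, ys)
print(w_fit, cost(w_fit, xs, ys))
```

Because the cost is smooth in w, each small step changes it by a small, predictable amount, which is what lets gradient descent make steady progress.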
Conditional Distributions and Cross Entropy
Learning Problem
Given:
Predict…                  Based on…
category of object        image
sentence in French        sentence in English
presence of disease       X-ray image
text of a phrase          audio utterance
Learning Problem
Predicting probabilities makes more sense than predicting discrete labels
Probabilities are also easier to learn, due to smoothness
Intuitively, we can’t change a discrete label “a tiny bit,”
it’s all or nothing
But we can change a probability “a tiny bit”
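The contrast between "all or nothing" labels and smoothly changing probabilities can be seen in a small sketch. It assumes a softmax is used to turn scores into probabilities, which the slide does not state; the scores and helper names are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hard_label(scores):
    # Discrete prediction: index of the largest score
    return max(range(len(scores)), key=lambda i: scores[i])

scores = [1.0, 0.9995]
nudged = [1.0, 1.0005]  # second score moved by only 0.001

p_before, p_after = softmax(scores), softmax(nudged)
label_before, label_after = hard_label(scores), hard_label(nudged)

print(label_before, label_after)   # the label jumps from one class to the other
print(p_before[1], p_after[1])     # the probability moves only slightly
```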
Given:
Learning Problem
x ~ p(x): probability distribution over photos
y ~ p(y | x): conditional probability distribution over labels
Learning Problem
Training set:
Learning Problem
Learning Problem
maximum likelihood estimation (MLE)
negative log-likelihood (NLL)
this is our loss function!
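The formulas on these slides did not survive, so the following is only a sketch under the standard definition: the NLL of a training set is the negative sum of log p(y_i | x_i) over the examples, and minimizing it is equivalent to maximizing the likelihood. The probabilities below are made-up model outputs, not results from the course.

```python
import math

def negative_log_likelihood(probs, labels):
    # probs[i][k] is the model's p(y = k | x_i); labels[i] is the true class
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))

labels = [0, 1, 1]
confident = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]  # mostly correct model
uniform = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]    # uninformative model

nll_confident = negative_log_likelihood(confident, labels)
nll_uniform = negative_log_likelihood(uniform, labels)

# exp(-NLL) recovers the likelihood: the product of the true-class probabilities
likelihood_confident = math.exp(-nll_confident)
print(nll_confident, nll_uniform, likelihood_confident)
```

The model that puts more probability on the correct labels gets the lower NLL, which is why it serves as a loss.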
Conditional Distributions and Cross Entropy
Computation Graphs
Computation Graphs
Computation Graph: NN Loss Function
Computation Graphs in PyTorch
Computation Graphs in PyTorch
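As a minimal sketch of how PyTorch builds a computation graph while code runs, consider a one-example squared-error loss. The values and variable names are illustrative assumptions; only standard `torch` calls (`torch.tensor`, `backward`, `.grad`, `.grad_fn`) are used.

```python
import torch

# Leaf nodes; requires_grad=True asks autograd to track operations on them
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(3.0)   # input (no gradient needed)
y = torch.tensor(10.0)  # target

# Each operation adds a node to the graph as it executes (imperative style)
y_hat = w * x + b        # prediction
loss = (y_hat - y) ** 2  # squared-error loss

# Non-leaf tensors remember the operation that produced them
print(y_hat.grad_fn, loss.grad_fn)

# Walk the graph backwards, accumulating d(loss)/d(leaf) into .grad
loss.backward()
print(w.grad, b.grad)
```

By hand, d(loss)/dw = 2 (y_hat - y) x and d(loss)/db = 2 (y_hat - y), which is what the backward pass accumulates.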
Gradients, Jacobian and Chain Rule
Gradient
Let f(x1, x2, x3) be a scalar function that is defined and differentiable in a domain in 3D space with
Cartesian coordinates x1, x2, x3. We denote the gradient of that function by grad f or ∇f (read nabla f).
Then the gradient of f(x1, x2, x3) is defined as the vector function*
\[
\operatorname{grad} f = \nabla f = \left[ \frac{\partial f}{\partial x_1},\ \frac{\partial f}{\partial x_2},\ \frac{\partial f}{\partial x_3} \right]
\]
*Advanced Engineering Mathematics - Kreyszig
Gradient
Let y = f(x) be a vector function that is defined and differentiable in a domain in 1D space with
Cartesian coordinate x. Its derivative with respect to x is taken component by component:
\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \qquad
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}
\]
For a scalar y and a vector x, ∂y/∂x is a row vector, while for a vector y and a scalar x, ∂y/∂x is a column vector.
This is called numerator-layout notation. The reversed version is
called denominator-layout notation.
Jacobian
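Let y = f(x1, x2, x3) be a vector-valued function that is defined and differentiable in a domain in 3D space with
Cartesian coordinates x1, x2, x3. We denote the Jacobian of that function as ∂y/∂x. In general, for x with n components and y with m components:
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
\]
\[
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
= \begin{bmatrix} \frac{\partial y_1}{\partial \mathbf{x}} \\ \frac{\partial y_2}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial y_m}{\partial \mathbf{x}} \end{bmatrix}
= \begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
\]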
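To make the layout concrete, the sketch below approximates a Jacobian by central differences, with entry [i][j] holding ∂y_i/∂x_j (one row per output, one column per input). The function is the f used in the Jacobian-vector product example later in these slides; the helper `numerical_jacobian` and the step size are assumptions for illustration.

```python
import math

def f(x):
    # y1 = log(x1 * x2), y2 = sin(x2)
    x1, x2 = x
    return [math.log(x1 * x2), math.sin(x2)]

def numerical_jacobian(func, x, eps=1e-6):
    # Central differences: jac[i][j] approximates dy_i / dx_j
    m, n = len(func(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus, y_minus = func(x_plus), func(x_minus)
        for i in range(m):
            jac[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return jac

x = [2.0, 3.0]
J = numerical_jacobian(f, x)

# Analytic Jacobian for comparison: [[1/x1, 1/x2], [0, cos(x2)]]
J_exact = [[1 / x[0], 1 / x[1]], [0.0, math.cos(x[1])]]
print(J)
print(J_exact)
```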
Examples
x ∈ ℝ^n, y ∈ ℝ^m, ∂y/∂x ∈ ℝ^(m×n)
The scalar a, the vector a and the matrix A are not functions of x; 0 and I are matrices

y        a (vector)    x    Ax    x^T A
∂y/∂x    0             I    A     A^T

y        au          Au          u + v
∂y/∂x    a ∂u/∂x     A ∂u/∂x     ∂u/∂x + ∂v/∂x
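A few of the identities above can be checked numerically. The sketch below uses `torch.autograd.functional.jacobian`; the random matrices, sizes, and variable names are assumptions for illustration, and the second matrix is called B only to give x^T B compatible shapes.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
n, m = 3, 2
A = torch.randn(m, n)  # constant matrix for y = A x
B = torch.randn(n, m)  # constant matrix for y = x^T B
x = torch.randn(n)

J_identity = jacobian(lambda v: v.clone(), x)  # y = x      -> I
J_Ax = jacobian(lambda v: A @ v, x)            # y = A x    -> A
J_xTB = jacobian(lambda v: v @ B, x)           # y = x^T B  -> B^T

print(J_identity)
print(torch.allclose(J_Ax, A), torch.allclose(J_xTB, B.T))
```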
Generalize to Matrices
                    Scalar x (1,)     Vector x (n,1)      Matrix X (n, k)
Scalar y (1,)       ∂y/∂x (1,)        ∂y/∂x (1, n)        ∂y/∂X (k, n)
Vector y (m,1)      ∂y/∂x (m,1)       ∂y/∂x (m, n)        ∂y/∂X (m, k, n)
Matrix Y (m, l)     ∂Y/∂x (m, l)      ∂Y/∂x (m, l, n)     ∂Y/∂X (m, l, k, n)
[Link]/berkeley-stat-157
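The shape bookkeeping in the table can be explored with `torch.autograd.functional.jacobian`, which returns a tensor of shape (output shape) + (input shape). Note that this keeps the input dimensions in their original order, so for a matrix input the trailing dimensions come out as (n, k) rather than the (k, n) of the numerator layout in the table. The sizes and functions below are arbitrary assumptions.

```python
import torch
from torch.autograd.functional import jacobian

m, n, k = 2, 3, 4
W = torch.randn(m, n)
x = torch.randn(n)     # vector input
X = torch.randn(n, k)  # matrix input

# Scalar output, vector input
shape_scalar_wrt_vector = tuple(jacobian(lambda v: v.sum(), x).shape)
# Vector output (m,), vector input (n,)
shape_vector_wrt_vector = tuple(jacobian(lambda v: W @ v, x).shape)
# Scalar output, matrix input (n, k)
shape_scalar_wrt_matrix = tuple(jacobian(lambda M: M.sum(), X).shape)
# Matrix output (m, k), matrix input (n, k); here l = k
shape_matrix_wrt_matrix = tuple(jacobian(lambda M: W @ M, X).shape)

print(shape_scalar_wrt_vector, shape_vector_wrt_vector)
print(shape_scalar_wrt_matrix, shape_matrix_wrt_matrix)
```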
Chain Rule
*Advanced Engineering Mathematics - Kreyszig
Chain Rule
Chain Rule
What is ?
Chain Rule for higher dimensional tensors
Jacobian-vector product example
from math import log, sin

def f(x1, x2):
    a = x1 * x2  # intermediate node
    y1 = log(a)
    y2 = sin(x2)
    return (y1, y2)

def g(y1, y2):
    return y1 * y2
Jacobian-vector product – pytorch uses chain rule
from math import log, sin

def f(x1, x2):
    a = x1 * x2
    y1 = log(a)
    y2 = sin(x2)
    return (y1, y2)

def g(y1, y2):
    return y1 * y2
Jacobian-vector product – pytorch uses chain rule
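The f and g from the example can be run through PyTorch's autograd and compared with a hand-applied chain rule. This is a sketch under assumptions: the input values are arbitrary, and `torch.log` / `torch.sin` replace the plain `log` / `sin` so the operations are recorded. In reverse mode the product that is actually formed is the row vector ∂z/∂y times the Jacobian ∂y/∂x (often called a vector-Jacobian product).

```python
import torch

def f(x1, x2):
    a = x1 * x2
    y1 = torch.log(a)
    y2 = torch.sin(x2)
    return y1, y2

def g(y1, y2):
    return y1 * y2

x1 = torch.tensor(2.0, requires_grad=True)
x2 = torch.tensor(3.0, requires_grad=True)

y1, y2 = f(x1, x2)
z = g(y1, y2)
z.backward()  # autograd applies the chain rule through the recorded graph

# Manual chain rule: dz/dx = (dz/dy) @ (dy/dx)
with torch.no_grad():
    dz_dy = torch.stack([y2, y1])  # [dz/dy1, dz/dy2] for z = y1 * y2
    dy_dx = torch.stack([
        torch.stack([1 / x1, 1 / x2]),                  # row for y1 = log(x1 * x2)
        torch.stack([torch.zeros(()), torch.cos(x2)]),  # row for y2 = sin(x2)
    ])
    manual = dz_dy @ dy_dx

autograd_result = torch.stack([x1.grad, x2.grad])
print(manual, autograd_result)
```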
Thank you