
UFMFEV-30-M Core AI

Dr. Wenhao Zhang
Senior Lecturer in Machine Vision

Week 8: Machine Learning

22/03/2021
What is machine learning?

Algorithms and statistical models that can perform a task without using explicit instructions, and that can improve their performance using training data.

The history of machine learning is beyond the scope of this lecture.

The traditional five stages of computer vision

Scene constraints (light) → Image acquisition → Pre-processing → Segmentation → Feature extraction → Classification → Interaction/interpretation
What is machine learning

[Figure: classifying images of bananas and images of oranges, two ways]

Heuristics based: human experts design rules, e.g. elongated? → bananas; spherical? → oranges; otherwise → others.
Learning based: the labelled images are used to train a model; the model then classifies a query image (e.g. "Orange!").
Why machine learning
• Machine learning is often considered when it is very challenging for human experts to derive explicit instructions.
Why machine learning
• Examples:
o Face recognition (note that this is different from face detection)
̶ What features to use?
o Email spam and malware filtering
̶ A large list of rules
o Disease diagnosis
̶ Lung cancer, for example (widened mediastinum? Reduced vascularity? …)
Types of machine learning
• Supervised learning: trained with human-labelled data
• Unsupervised learning: trained without human-labelled data
• Semi-supervised learning
• Reinforcement learning (keywords: agent, policy, reward, penalty…)

Supervised vs unsupervised learning
• Training data and labels are provided – supervised learning
• Only training data (but no labels) are provided – unsupervised learning

[Figure: two scatter plots over Feature 1 (e.g. colour) and Feature 2 (e.g. shape). Unsupervised learning: all samples are unknown. Supervised learning: samples are labelled Class 1 and Class 2.]
Supervised learning
• K Nearest Neighbours (KNN)
• Artificial Neural Network (Multilayer Perceptron)

Specific unsupervised learning techniques are not introduced in this lecture


KNN
o A non-parametric method used for classification and regression
o Only the classification scenario is discussed in this lecture
o A test sample is assigned to the class most common among its K nearest neighbours, where K is a positive integer.

[Figure: scatter plot over Feature 1 and Feature 2 with samples labelled Class 1 and Class 2, and an unknown query sample to be classified]
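The rule above can be sketched directly: measure the distance from the query to every training sample, keep the K closest, and take a majority vote. A minimal sketch with hypothetical 2D data (two features, two classes):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Assign x_query to the class most common among its k nearest neighbours."""
    # Euclidean distance from the query to every training sample
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest samples
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: Feature 1 and Feature 2 for two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 0
                    [3.0, 3.0], [3.2, 2.8], [2.9, 3.1]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # 0
print(knn_predict(X_train, y_train, np.array([3.1, 3.0]), k=3))  # 1
```

Note there is no training step as such: KNN simply stores the training set and defers all computation to query time, which is also why large datasets limit its efficiency.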
Limitations of KNN
• Large datasets and/or high dimensionality limit efficiency
• "Curse of dimensionality"
• Highly dependent on data quality/features
o Outliers
o Different scales
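The "different scales" point is easy to demonstrate: when one feature spans a much larger numeric range, it dominates the Euclidean distance and can flip which neighbour looks closest. A small sketch with made-up values, using standardisation as one common remedy:

```python
import numpy as np

# Two features on very different scales, e.g. width in metres vs weight in grams
X = np.array([[0.10, 150.0],
              [0.12, 152.0],
              [0.30, 151.0]])

# Raw distances from sample 0: the gram-scale column dominates, so sample 2
# (very different width, similar weight) appears closer than sample 1
d_raw = np.linalg.norm(X - X[0], axis=1)

# Standardise each feature to zero mean and unit variance, then re-measure
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
d_scaled = np.linalg.norm(Xs - Xs[0], axis=1)

print(d_raw)     # ranking driven almost entirely by the weight column
print(d_scaled)  # both features now contribute comparably, and the ranking flips
```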
MLP
• A type of feedforward Artificial Neural Network (ANN)
• "Vanilla" neural networks
• Consists of an input layer and an output layer, with one or more hidden layers
• For simplicity, there is only one hidden layer in this example
• An MLP is fully connected

[Figure: an input image is flattened into the input layer; weights W_i,j connect the input layer to the hidden layer, and weights W_j,k connect the hidden layer to the output layer]
MLP
• An example – forward path

[Figure: inputs x1…x6 feed hidden units a1, a2, a3. Each hidden activation is a weighted sum of the inputs (here a1 = 24); the output is in turn a weighted sum of the hidden activations: 24×0.2 + a2×2.0 + a3×0.5]
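The forward path is just two matrix-vector products. In this sketch the input and the hidden weights are illustrative (the first row of W1 is chosen so that a1 comes out as 24); only the output weights (0.2, 2.0, 0.5) are taken from the slide's final sum:

```python
import numpy as np

# Forward path through a one-hidden-layer MLP (no nonlinearity yet)
x = np.array([4.0, 8.0, 0.0, 1.0, 0.0, 5.0])     # flattened input x1..x6

W1 = np.array([[1.0, 2.0, 0.0, 0.0, 0.0, 0.8],   # input -> hidden weights W_i,j
               [0.0, 0.5, 1.0, 1.0, 0.0, 0.0],
               [0.2, 0.0, 0.0, 0.0, 1.0, 1.0]])
W2 = np.array([0.2, 2.0, 0.5])                   # hidden -> output weights W_j,k

a = W1 @ x   # hidden activations: a1 = 24.0, a2 = 5.0, a3 = 5.8
y = W2 @ a   # output: 24*0.2 + 5.0*2.0 + 5.8*0.5 = 17.7
print(a, y)
```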
MLP
• What happens next?
o Update the weights until the difference between the output and the label (i.e. the ground truth) is zero (or small enough), or no longer decreases.
o A loss function is used to measure this difference. Training a model is the process of minimising this loss function (gradient descent, backpropagation).

[Figure: images and their labels together form the INPUT to training]
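Minimising a loss by gradient descent can be sketched for the smallest possible case, a single linear neuron y = w·x with a squared-error loss. The sample, label and learning rate below are made up; a real MLP backpropagates this same update through every layer's weights:

```python
# Gradient descent on a squared-error loss for one linear neuron y = w*x
x, label = 2.0, 10.0   # one training sample and its ground-truth label
w = 0.0                # initial weight
lr = 0.1               # learning rate

for step in range(100):
    y = w * x                    # forward path
    loss = (y - label) ** 2      # loss: squared difference between output and label
    grad = 2 * (y - label) * x   # dloss/dw
    w -= lr * grad               # step the weight against the gradient

print(w, loss)  # w approaches 5.0, so the output w*x approaches the label 10.0
```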
MLP
• What happens next?
o When the weights (the model's parameters) are fixed, the model is trained
o For each query image, use these weights to perform the forward path calculation and determine the output (i.e. the prediction)
Adding nonlinearity
• Linear models may struggle to represent complex problems
• Adding nonlinearities: apply a nonlinear activation function f to each weighted sum

[Figure: the MLP from before, with an activation function f inserted at the hidden layer between the weights W_i,j and W_j,k]
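Concretely, f is applied element-wise to the hidden layer's weighted sums before they reach the output layer. The inputs and weights below are illustrative; ReLU and the sigmoid are two common choices of f:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)        # f(z) = max(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # f(z) = 1 / (1 + e^-z)

x = np.array([1.0, 2.0, -0.5])
W1 = np.array([[1.0, 1.0, 0.0],
               [0.0, -1.0, 2.0]])
W2 = np.array([0.5, 0.5])

z = W1 @ x        # hidden weighted sums: [3.0, -3.0]
hidden = relu(z)  # the nonlinearity clips the negative unit: [3.0, 0.0]
y = W2 @ hidden   # output: 0.5*3.0 + 0.5*0.0 = 1.5
print(y)
```

Without f, the composition W2 @ (W1 @ x) would collapse into a single linear map, which is why purely linear stacks cannot represent complex problems.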
Question
• What does it mean when the weights represented by red lines have zero values?

[Figure: the MLP with all weights from input x5 to the hidden units a1, a2, a3 (the red lines) set to zero]

→ x5 does not contribute to the results

Convolutional neural networks (CNN)
• Convolution

  0   1   3       1 1 1
  2  200  5   ⊗   1 1 1   →   26
  7  10   4       1 1 1

  local image data   kernel   modified image data

  (with the all-ones kernel normalised by 1/9, i.e. a 3×3 mean filter, the output is ≈26)

• e.g. the Sobel operator
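One convolution step from the example above can be computed directly: multiply the 3×3 neighbourhood element-wise with the kernel and sum, giving one pixel of the modified image:

```python
import numpy as np

# The slide's 3x3 neighbourhood and a 3x3 mean kernel (all ones / 9)
patch = np.array([[0,   1, 3],
                  [2, 200, 5],
                  [7,  10, 4]], dtype=float)
kernel = np.ones((3, 3)) / 9.0   # mean filter

out = np.sum(patch * kernel)     # (0 + 1 + 3 + ... + 4) / 9 = 232 / 9
print(round(out))  # 26
```

Sliding the same kernel over every neighbourhood in the image produces the full modified image; with a symmetric kernel like this one, correlation and convolution coincide.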

Convolutional neural networks (CNN)
• Convolution: the same operation, but the outputs are now treated as features

  0   1   3       1 1 1
  2  200  5   ⊗   1 1 1   →   26
  7  10   4       1 1 1

  local image data   kernel   features

• Learning of weights: the kernel values are not handcrafted (as in the Sobel operator) but learned

  0   1   3       ? ? ?
  2  200  5   ⊗   ? ? ?   →   …
  7  10   4       ? ? ?

  local image data   kernel
Layers in a CNN
1. Convolutional layers
2. Nonlinear layers/Activation layers

[Figure: a×a input → CONV (4 filters) → nonlinear activation, e.g. ReLU → b×b×4 output]

a ≠ b? a = b? Refer to the convolution lecture (hint: use of padding)
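Whether a = b or a ≠ b follows from the standard output-size formula for a convolution, out = (in − kernel + 2·padding) / stride + 1. A quick sketch (the sizes below are illustrative):

```python
# Spatial output size of a convolutional layer (square input and kernel assumed)
def conv_output_size(a, kernel, padding=0, stride=1):
    return (a - kernel + 2 * padding) // stride + 1

a = 32
b_no_pad = conv_output_size(a, kernel=3)             # 30: no padding, so b != a
b_padded = conv_output_size(a, kernel=3, padding=1)  # 32: "same" padding, so b == a
print(b_no_pad, b_padded)
```

Padding the input with a border of zeros (here one pixel for a 3×3 kernel) is what keeps the output the same size as the input.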
Layers in a CNN

[Figure: the b×b×4 output passes through another set of 6 filters and a nonlinear activation, giving a c×c×6 output]

Note that the depth of each of the 6 filters is 4, matching the depth of its input, and that each filter's spatial size n < b
Feature extraction in a CNN
1. Convolutional layers
2. Nonlinear layers/Activation layers
Layers in a CNN
3. Pooling layers: progressively reduce the spatial size of the representation to reduce the number of parameters and the computation in the network

Commonly used after convolutional layers

  1 3 2 7
  6 2 6 5       6 7
  1 5 7 2   →   8 9
  4 8 5 9

e.g. max pooling with a 2×2 window and a stride of 2
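The example above can be reproduced in a few lines: each non-overlapping 2×2 window is replaced by its maximum, halving both spatial dimensions:

```python
import numpy as np

# The slide's 4x4 input
x = np.array([[1, 3, 2, 7],
              [6, 2, 6, 5],
              [1, 5, 7, 2],
              [4, 8, 5, 9]])

def max_pool(x, size=2, stride=2):
    """Max pooling with a square window; keeps the max of each window."""
    h, w = x.shape
    out = np.empty((h // stride, w // stride), dtype=x.dtype)
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = x[i:i + size, j:j + size].max()
    return out

print(max_pool(x))  # [[6 7]
                    #  [8 9]]
```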
Layers in a CNN
3. Pooling layers
• smaller number of parameters
• does not affect depth

[Figure: a×a input → b×b×4 → c×c×6 → pooling → d×d×6; flattening then gives a vector of length 6·d²]
Layers in a CNN
4. Fully connected layers: a high parameter count

[Figure: the pooled d×d×6 feature maps are flattened and fed into fully connected layers]
Feature extraction and classification in a CNN

[Figure: a×a input → convolutional, activation and pooling layers (FEATURE EXTRACTION) → fully connected layers (CLASSIFICATION)]
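The whole pipeline can be sketched end to end on a single tiny "image". All weights here are random placeholders rather than learned values, and there is only one filter and one layer of each type; the point is only the shape of the data as it flows through CONV → ReLU → pooling → flatten → fully connected:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))   # a hypothetical 8x8 single-channel image

def conv2d(x, k):
    """Valid cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s x s max pooling."""
    h, w = x.shape
    return x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).max(axis=(1, 3))

kernel = rng.random((3, 3)) - 0.5          # one "learnable" 3x3 filter
feat = np.maximum(0, conv2d(img, kernel))  # CONV + ReLU: 8x8 -> 6x6
feat = max_pool(feat)                      # pooling: 6x6 -> 3x3
vec = feat.reshape(-1)                     # flatten: length 9

W = rng.random((2, 9)) - 0.5               # fully connected layer, 2 classes
scores = W @ vec                           # classification scores
print(scores.shape)  # (2,)
```

Everything up to `vec` is the feature-extraction half; the final matrix product is the classification half, exactly as in the figure.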


Feature extraction + classification
Previously, with an MLP
Features:
• Raw pixel values
• More commonly seen: handcrafted features
(weights updating…)

Now, with a CNN
Features:
• Automatically extracted by filters
• End-to-end
e.g. the VGG architecture
Applications

Deep learning
Eye centre localisation
• Eye tracking
• Human-computer interaction
• Psychology studies and medical applications
Directed advertising
• Eye centre localisation
• Gaze gesture recognition
• Attention monitoring
• Gender group classification
• Age group classification
• 3D face reconstruction

[Video: a (2D+3D) face is automatically classified as a YOUNG MALE; real-time eye tracking lets the user issue gaze gestures to interact with the system; personalised advertisements are displayed and are manipulated by gaze gestures]
Eye morphology
• Gradient-based voting

Zhang, W., Smith, M.L., Smith, L.N. and Farooq, A., 2016. Gender and gaze gesture recognition for
human-computer interaction. Computer Vision and Image Understanding, 149, pp.32-50.
Eye saccade analysis for dementia diagnosis

[Figure: Input (image data) → CNN → Output (predicted eye centre coordinates c_xl, c_yl, c_xr, c_yr)]
Zhang, W. and Smith, M., 2019, July. Eye Centre Localisation with Convolutional Neural Network Based Regression.
In 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC) (pp. 88-94). IEEE.
Visualising features/filters
• Not all pixels make an equal contribution
• Each filter learns something different

Handcrafted features vs. automatically extracted features

handcrafted features:
 Can have physical meanings (e.g. face landmarks)
 Can be explicitly modelled
 Often do not require a large training set
 Intuitive to visualise and analyse
 Dimensionality is often lower compared to automatically extracted features

automatically extracted features:
 Data driven and learning based, therefore tailored to a specific problem
 Do not require human experts to extract a given set of carefully chosen characteristics
 Generate multiple levels of representation (e.g. high-level and low-level) at the same time
 Often generated by a single model rather than following a multi-step process
