
COMPUTER VISION

LECTURE I
INTRODUCTION AND OVERVIEW

DR. GEORGE KARRAZ, Ph. D.


This lecture is an overview of the ideas and
techniques to be covered during the course.



Outline
Computer Vision
• Definition
• Branches
• Computer Vision among Related Fields
• Computer Vision Prerequisites
• Applications
Course Topics
Examples of Completed Works
References
Computer Vision Definition
Computer vision is a field of artificial intelligence that
trains computers to interpret and understand the visual
world. Using digital images from cameras and videos
and deep learning models, machines can accurately
identify and classify objects and then react to what
they “see”.

(Figure: the relationship between computer vision, artificial intelligence, and machine learning.)
Computer Vision Branches
Sub-domains of computer vision include:
• Scene Reconstruction.
• Object Detection.
• Event Detection.
• Video Tracking.
• Object Recognition.
• Motion Estimation.
• 3D Scene Modeling.
• Image Restoration.



Computer Vision among Related Fields

(Figure: computer vision positioned among its related fields.)
Computer Vision Prerequisites

Good understanding of:

• Linear Algebra.
• Probability Theory and Statistics.
• Digital Image Processing (OpenCV or MATLAB).
• Digital Signal Processing (MATLAB).
• A programming language (Python, C++, …).



Computer Vision Applications
• Self-driving cars (autonomous vehicles).
• Automatic analysis and classification of medical images (X-ray,
CT scan, MRI, PET, etc.), e.g. cancer detection.
• Mineral and oil exploration.
• Space science.
• Surveillance cameras.
• Pedestrian detection.
• Parking occupancy detection.
• Traffic flow analysis and road condition monitoring.
• ….



Course Topics
1. Introduction
2. Digital Signals Processing and Analysis
3. Digital Images
4. Digital Image Processing and Analysis
5. Edge & Structure Extraction
6. Local Image Features
7. Geometric Transformations
8. Face Detection and Recognition
9. Videos
10. Motion Tracking & Estimation
Examples of Completed Works
As an introduction to computer vision, this lecture presents a set of
recent related research examples completed in the last few years, arranged
from newest to oldest (follow the link of each paper for more information):
1- George Karraz, “An Intelligent System to Analyze the Functional
Magnetic Resonance Imaging fMRI”, International Journal of Innovative
Science and Research Technology, 2023, 8 (1):1672-1681.
DOI: https://zenodo.org/record/7814289
2- Rawan Abo Zidan and George Karraz, “Gaussian Pyramid for Nonlinear
Support Vector Machine”, Applied Computational Intelligence and Soft
Computing, Hindawi, 2022, Article ID 5255346, pp. 1-9.
DOI: https://www.hindawi.com/journals/acisc/2022/5255346/
Examples of Completed Works, cont..
3- George Karraz, “Effect of Adaptive Line Enhancement Filters on Noise
Cancellation in ECG Signals”, Serbian Journal of Electrical Engineering,
2021, 18(3):291-302.
DOI: http://www.doiserbia.nb.rs/Article.aspx?ID=1451-48692103291K
4- Haneen Shhadeh, Wesam Bachir and George Karraz, “A Sensitive Fibre
Optic Probe for Autofluorescence Spectroscopy of Oral Tongue Cancer:
Monte Carlo Simulation Study”, BioMed Research International,
Hindawi, 2020, Article ID 1936570, pp. 1-11.
DOI: https://doi.org/10.1155/2020/1936570
5- Kawthar M. K. Alghourani, Wesam Bachir and George Karraz, “Effect of
Absorption and Scattering on Fluorescence of Buried Tumours”, Journal
of Spectroscopy, Hindawi, 2020, Article ID 8730471, pp. 1-7.
DOI: https://doi.org/10.1155/2020/8730471



Examples of Completed Works, cont..
6- George Karraz, “A Novel Technique to Predict and Detect Lung Cancer
in the Computerized Tomography Images”, Al-Baath University Journal,
Syria, 2019, 41(26):135-155.
DOI: https://zenodo.org/record/7814289
7- George Karraz, Sonia Jalgha and Ali Aji, “Estimation of Porosity using
Artificial Neural Networks in Sazaba Oil Fields”, Journal of Basic
Science, Damascus University, Syria, 2019, 61(2):422-436.
DOI: https://zenodo.org/record/7698234
8- George Karraz, “Intelligent System to Reduce Size and Time of Video
Display”, Tishreen University Journal for Research and Scientific Studies -
Engineering Sciences Series, Syria, 2018, 40(4):31-47.
DOI: https://zenodo.org/record/7698443



References

Textbooks
D. Forsyth, J. Ponce
Computer Vision – A Modern Approach
Prentice Hall, 2002

R. Hartley, A. Zisserman
Multiple View Geometry in Computer Vision
2nd Ed., Cambridge Univ. Press, 2004



Through this course:
We focus on general computer vision techniques and
methodologies that have proven useful in many applications.

THANK YOU!

NEXT: DIGITAL SIGNALS PROCESSING & ANALYSIS

COMPUTER VISION
LECTURE II
DIGITAL SIGNALS PROCESSING & ANALYSIS

DR. GEORGE KARRAZ, Ph. D.


Contents:
1. Waveforms and Sampling Theorem.
2. Digital Signal (Audio) Processing.

Waveforms and Sampling Theorem:
• Frequency is the number of cycles per second and is measured in Hertz (Hz).
• Wavelength is inversely proportional to frequency, i.e. wavelength
varies as 1/frequency. (Figure: simple waveforms.)
• The general form of a sampled sine wave is:
y[n] = A sin(2π n Fw / Fs)
where Fs is the sample frequency, n is the sample index, and Fw is the
frequency of the waveform.
• Fs must be ≥ 2 · max(Fw) (Nyquist sampling theorem).
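As a minimal MATLAB sketch of this formula (the 440 Hz tone and the 8 kHz
sample rate are example values, not from the slides):

% Synthesize one second of a 440 Hz sine sampled at 8 kHz.
Fs = 8000;                  % sample frequency (Hz); must satisfy Fs >= 2*Fw
Fw = 440;                   % waveform frequency (Hz)
A = 1;                      % amplitude
n = 0:Fs-1;                 % sample indices for one second
y = A * sin(2*pi*n*Fw/Fs);
plot(n(1:100), y(1:100));   % plot the first 100 samples
% sound(y, Fs);             % uncomment to play the tone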



Digital Signal Processing:

• The Decibel (dB): when referring to measurements of power or
intensity, we express these in decibels (dB):
• XdB = 10 log10(X/X0)
• X is the actual value of the quantity being measured.
X0 is a specified or implied reference level.
XdB is the quantity expressed in units of decibels, relative to X0.
• X and X0 must have the same dimensions; they must measure the
same type of quantity in the same units.
• The reference level itself is always at 0 dB, as shown by setting X =
X0 (note: log10(1) = 0).
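A quick MATLAB check of this definition (the reference level and measured
values are arbitrary example numbers):

X0 = 1e-3;                  % reference level (example value)
X = [1e-3 2e-3 0.5e-3];     % measured values
XdB = 10 * log10(X / X0)    % -> [0  3.0103  -3.0103] dB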



Digital Signal Processing:

• Why Use Decibel Scales?
1- When there is a large range in frequency or magnitude, logarithmic
units are often used.
2- If X is greater than X0, then XdB is positive (power increase).
3- If X is less than X0, then XdB is negative (power decrease).
4- Power magnitude = |X(i)|², so (with respect to the reference level)
XdB = 10 log10(|X(i)|²) = 20 log10(|X(i)|), which is an expression of dB
we often come across.



Digital Signal Processing:
• Why Use Decibel Scales?
5- dB is commonly used to quantify sound levels relative to some 0 dB reference.
6- The reference level is typically set at the threshold of human perception.
7- The human ear is capable of detecting a very large range of sound pressures.
8- The ratio of the sound pressure that causes permanent damage from short
exposure to the limit that (undamaged) ears can hear is above a million, so
120 dB is the quoted threshold of pain for humans.
9- Maximum human sensitivity occurs at frequencies between 2 and 4 kHz (speech).



Digital Signal Processing:

• Signal to Noise: the signal-to-noise ratio is the power ratio between
a signal (meaningful information) and the background noise:

SNR = P_signal / P_noise = (A_signal / A_noise)²

SNR_dB = 20 log10(A_signal / A_noise)
Both signal and noise power (or amplitude) must be measured at the same
or equivalent points in a system, and within the same system bandwidth.
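A small MATLAB sketch of this definition (the sine signal and the noise
level are synthetic, chosen only for illustration):

Fs = 8000; n = 0:Fs-1;
signal = sin(2*pi*440*n/Fs);                % synthetic sine signal
noise = 0.1 * randn(size(signal));          % synthetic Gaussian noise
SNR = mean(signal.^2) / mean(noise.^2);     % power ratio
SNR_dB = 10 * log10(SNR)                    % = 20*log10 of the amplitude
                                            %   ratio; ~17 dB here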



Digital Signal Processing:

• Algorithms and Signal Flow Graphs: it is common to represent digital
signal processing routines as visual signal flow graphs, together with a
simple difference equation that describes the algorithm.
Three Basic Building Blocks
We will need to consider three processes (combined in the sketch below):
1- Delay
2- Multiplication
3- Summation
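As a hedged sketch of how these building blocks combine, consider the
difference equation y[n] = a·x[n] + b·x[n-1]: one delay, two
multiplications, and one summation (coefficients and input are arbitrary
example values):

% y[n] = a*x[n] + b*x[n-1]: delay, multiply, and sum.
a = 0.5; b = 0.5;               % example coefficients
x = [1 2 3 4 5];                % example input sequence
y = zeros(size(x));
xprev = 0;                      % the delayed sample x[n-1], initially zero
for n = 1:length(x)
    y(n) = a*x(n) + b*xprev;    % multiplications and summation
    xprev = x(n);               % update the delay element
end
y                               % -> [0.5 1.5 2.5 3.5 4.5]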



Digital Signal Processing:

• Signal Flow Graphs (Delay): (figure)

• Signal Flow Graphs (Multiplication): (figure)

• Signal Flow Graphs (Summation): (figure)


Digital Signal Processing:

• Signal Flow Graphs: we can combine all of the above building blocks
to build up more complex algorithms. (figures)



THANK YOU!

NEXT: IMAGES

COMPUTER VISION
LECTURE III
IMAGES

DR. GEORGE KARRAZ, Ph. D.


Outline

• Still Images
• Vector Drawing
• Bitmaps
• Popular File Formats



Still Images

 Still images are generated by the computer in two ways :


 bitmaps (paint graphics, raster images).
 Vector-drawn graphics.
 Bitmaps
 photo-realistic images, complex drawings, fine detail
 Vector-drawn objects
 lines, boxes, circles, polygons, and other graphic
shapes,
 mathematically expressed in angles, coordinates, and distances.



Vector Drawing
• Vector-drawn objects → lines, rectangles, ovals, polygons,
complex drawings created from those objects, and text.
• Computer-aided design (CAD) programs (like AutoCAD, …) use
vector-drawn object systems to create the highly complex and
geometric renderings needed by architects and engineers.
• Graphic artists designing for print media use vector-drawn objects.
• Programs for 3-D animation also use vector-drawn graphics.
• Mathematical mapping is used for rotation, translation, …



How Vector Drawing Works
• A vector is a line that is described by the location of its two
endpoints.
• Vector drawing uses Cartesian coordinates (x, y, z).
• <object, principal attributes, options>
• <line x1="0" y1="0" x2="200" y2="100"/>
• <rect x="0" y="0" width="200" height="100" fill="#FFFFFF"/>
• <circle cx="50" cy="50" r="10" fill="none" />

• Scalable Vector Graphics (.svg) files:
• SVG files can be saved in a small amount of memory, and they are
scalable without distortion.
• Vector drawing tools use Bézier curves or paths to mathematically
represent a curve → a curve with handles (points on the path).
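A minimal, complete .svg file built from the three elements above (the
canvas size and the added stroke attributes are illustrative assumptions):

<svg xmlns="http://www.w3.org/2000/svg" width="220" height="120">
  <rect x="0" y="0" width="200" height="100" fill="#FFFFFF"/>
  <line x1="0" y1="0" x2="200" y2="100" stroke="black"/>
  <circle cx="50" cy="50" r="10" fill="none" stroke="black"/>
</svg>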



Vector Drawing

• Low memory footprint
• Faster download
• Same quality at different resolutions (no pixelation)
• Refresh time grows with the number of drawn objects



Bitmap vs. Vector



Bitmap in Brief
• Images are broken up into a grid and recorded as a sequence of
individual pixels
• Works well for complex variations
• Not flexible
• Might need a lot of memory
• Problems when scaled
• Formats: BMP, GIF, JPEG, TIFF



Bitmap Images as Matrices
• Still pictures are (when uncompressed) represented as a bitmap (a grid
of pixels).



Bitmaps
• Bitmap: The two-dimensional array of pixel values that represents the
graphics/image data.

• A bit is the simplest element in the digital world: 0,1 … on, off … true, false.
• A map is a two-dimensional matrix of these bits.
• A bitmap, then, is a simple matrix of the tiny dots that form an image,
displayed on a computer screen or printed.
• A one-dimensional matrix (1-bit depth) is used to display monochrome
images:
• a bitmap where each bit is most commonly set to black or white.
• Picture elements (pixels):
• 1-bit bitmap,
• N-bit bitmap for varying shades of color.



Bitmaps Types

• Binary images: 1-bit Images

• Gray-level Images: 8-bit per pixel

• Color Images: The most common data types for graphics and image
file formats
• 24-bit true color and;
• 8-bit pseudo color.



Bitmaps

(Figure: bitmaps at different color depths.)


Bitmaps
• These images show the color depth of bitmaps as described
in the last Figure.
• Note that Images 4 and 6 require the same memory (same
file size), but the gray-scale image is superior. If file size
(download time) is important, you can dither GIF bitmap files
to the lowest color depth that will still provide an acceptable
image.



1-Bit Images
• Each pixel is stored as a single bit (0 or 1), so also referred to as
binary image.
• Such an image is also called a 1-bit monochrome image since it
contains no color.
• Next Fig. shows a 1-bit monochrome image.

Monochrome 1-bit Lena image


8-bit Gray-level Images
• Each pixel has a gray-value between 0 and 255. Each pixel is represented by a
single byte; e.g., a dark pixel might have a value of 10, and a bright one might be
230.
• Bitmap: The two-dimensional array of pixel values that represents the
graphics/image data.
• Image resolution refers to the number of pixels in a digital image (higher
resolution usually yields better quality).
• Fairly high resolution for such an image might be 1,600 x 1,200, whereas
lower resolution might be 640 x 480.
• Each pixel is usually stored as a byte (a value between 0 and 255), so a 640 x
480 grayscale image requires 300 kB of storage (640 x 480 = 307,200 bytes).
• Next Fig. shows grayscale image.

Grayscale image



24-bit Color Images
• In a color 24-bit image, each pixel is represented by three bytes, usually
representing RGB.
• This format supports 256 x 256 x 256 possible combined colors, or a total of
16,777,216 possible colors.
• However such flexibility does result in a storage penalty: A 640 x 480 24-bit
color image would require 921.6 kB of storage without any compression.

• An important point: many 24-bit color images are actually stored as 32-
bit images, with the extra byte of data for each pixel used to store an
alpha value representing special effect information (e.g., transparency).

• Next Figure shows the image forestfire.bmp, a 24-bit image in Microsoft


Windows BMP format. Also shown are the grayscale images for just the
Red, Green, and Blue channels, for this image.
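A short MATLAB sketch of this channel separation (any 24-bit RGB image
will do; forestfire.bmp is the file named above, if available):

im = imread('forestfire.bmp');               % a 24-bit RGB image
R = im(:,:,1); G = im(:,:,2); B = im(:,:,3); % one byte per channel
subplot(2,2,1); imshow(im); title('original');
subplot(2,2,2); imshow(R); title('R channel');
subplot(2,2,3); imshow(G); title('G channel');
subplot(2,2,4); imshow(B); title('B channel');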



Fig. 3.5: High-resolution color and separate R, G, B color channel images. (a):
Example of 24-bit color image “forestfire.bmp”. (b, c, d): R, G, and B color
channels for this image.



8-bit Color Images

• Many systems can make use of 8 bits of color information (the so-
called “256 colors”) in producing a screen image.

• Such image files use the concept of a lookup table to store color
information.
• Basically, the image stores not color, but instead just a set of bytes, each of
which is actually an index into a table with 3-byte values that specify the
color for a pixel with that lookup table index.



Color Look-up Tables (LUTs)
• The idea used in 8-bit color images is to store only the index, or
code value, for each pixel. Then, e.g., if a pixel stores the value 25,
the meaning is to go to row 25 in a color look-up table (LUT).

Color LUT for 8-bit color images.
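A minimal MATLAB sketch of LUT lookup using an indexed image (the tiny
image and palette are invented for illustration):

ind = uint8([0 1; 2 3]);                % 2x2 image of LUT indices
map = [0 0 0; 1 0 0; 0 1 1; 1 1 1];     % LUT rows: black, red, cyan, white
rgb = ind2rgb(ind, map);                % expand each index to its RGB triple
imshow(rgb, 'InitialMagnification', 'fit');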



Color Look-up Tables
• A Color-picker consists of an array of fairly large blocks of color (or a
semi-continuous range of colors) such that a mouse-click will select
the color indicated.

• In reality, a color-picker displays the palette colors associated with
index values from 0 to 255.
• Next Fig. displays the concept of a color-picker: if the user selects the color
block with index value 2, then the color meant is cyan, with RGB values (0,
255, 255).

• A very simple animation process is possible via simply changing the


color table: this is called color cycling or palette animation.



Color Look-up Tables

• Color-picker for 8-bit color: each block of the color-picker corresponds
to one row of the color LUT.



• (a) shows a 24-bit color image of “Lena”,
• (b) shows the same image reduced to only 5 bits via dithering.
• A detail of the left eye is shown in (c).




Popular File Formats

• 8-bit GIF : one of the most important formats because of its historical
connection to the WWW and HTML markup language as the first
image type recognized by net browsers.

• JPEG: currently the most important common file format.



GIF
• GIF standard: (We examine GIF standard because it is so simple! yet
contains many common elements.)
• Limited to 8-bit (256) color images only, which, while producing
acceptable color images, is best suited for images with few distinctive
colors (e.g., graphics or drawing).
• GIF standard supports interlacing — successive display of pixels in
widely-spaced rows by a 4-pass display process.
• GIF actually comes in two flavors:
• 1. GIF87a: The original specification.
• 2. GIF89a: The later version. Supports simple animation via a Graphics
Control Extension block in the data, provides simple control over delay time,
a transparency index, etc.



JPEG

• JPEG: the most important current standard for image compression.
• The human vision system has some specific limitations and
JPEG takes advantage of these to achieve high rates of
compression.
• JPEG allows the user to set a desired level of quality, or
compression ratio (input divided by output).



PNG

• PNG format: standing for Portable Network Graphics — meant to
supersede the GIF standard, and extends it in important ways.
• Special features of PNG files include:
1. Support for up to 48 bits of color information — a large increase.
2. Files may contain gamma-correction information for correct display of
color images, as well as alpha-channel information for such uses as
control of transparency.
3. The display progressively displays pixels in a 2-dimensional fashion by
showing a few pixels at a time over seven passes through each 8 x 8
block of an image.



TIFF

• TIFF: stands for Tagged Image File Format.


• The support for attachment of additional information (referred to as
“tags”) provides a great deal of flexibility.
1. The most important tag is a format signifier: what type of compression
etc. is in use in the stored image.
2. TIFF can store many different types of image: 1-bit, grayscale, 8-bit color,
24-bit RGB, etc.
3. TIFF was originally a lossless format but now a new JPEG tag allows one
to opt for JPEG compression.
4. The TIFF format was developed by the Aldus Corporation in the 1980s
and was later supported by Microsoft.



THANK YOU!
NEXT: DIGITAL IMAGE PROCESSING & ANALYSIS



COMPUTER VISION
LECTURE IV
DIGITAL IMAGE PROCESSING AND ANALYSIS

DR. GEORGE KARRAZ, Ph. D.


Topics of This Lecture
 Common Types of Noise
 Linear filters
 What are they? How are they applied?
 Application: smoothing
 Gaussian filter
 What does it mean to filter an image?
 Nonlinear Filters
 Median filter

 Multi-Scale representations
 How to properly rescale an image?
 Image derivatives
 How to compute gradients robustly?



Common Types of Noise
 Salt & pepper noise
 Random occurrences of
black and white pixels

 Impulse noise
 Random occurrences of
white pixels

 Gaussian noise
 Variations in intensity drawn
from a Gaussian (“Normal”)
distribution.

 Basic Assumption
 Noise is i.i.d. (independent &
identically distributed)



Gaussian Noise

>> sigma = 0.1;                     % noise standard deviation (example value)
>> im = im2double(im);              % convert to double so noise can be added
>> noise = randn(size(im)).*sigma;
>> output = im + noise;
First Attempt at a Solution
 Assumptions:
 Expect pixels to be like their neighbors
 Expect noise processes to be independent from pixel to pixel
(“i.i.d. = independent, identically distributed”)

 Let’s try to replace each pixel with an average of all the values in
its neighborhood…



Moving Average in 2D

Slide a 3 x 3 window over the input image F and write the average of the
window into the output G (shown for the 8 x 8 interior):

Input F (10 x 10):               Output G:

0  0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0     0  10 20 30 30 30 20 10
0  0  0 90 90 90 90 90  0  0     0  20 40 60 60 60 40 20
0  0  0 90 90 90 90 90  0  0     0  30 60 90 90 90 60 30
0  0  0 90 90 90 90 90  0  0     0  30 50 80 80 90 60 30
0  0  0 90  0 90 90 90  0  0     0  30 50 80 80 90 60 30
0  0  0 90 90 90 90 90  0  0     0  20 30 50 50 60 40 20
0  0  0  0  0  0  0  0  0  0     10 20 30 30 30 30 20 10
0  0 90  0  0  0  0  0  0  0     10 10 10  0  0  0  0  0
0  0  0  0  0  0  0  0  0  0


Correlation Filtering
 Say the averaging window size is 2k+1 x 2k+1:

G[i,j] = 1/(2k+1)² · Σ(u=-k..k) Σ(v=-k..k) F[i+u, j+v]

The factor 1/(2k+1)² attributes uniform weight to each pixel; the double
sum loops over all pixels in the neighborhood around image pixel F[i,j].

 Now generalize to allow different weights depending on a neighboring
pixel’s relative position:

G[i,j] = Σ(u=-k..k) Σ(v=-k..k) H[u,v] F[i+u, j+v]

with non-uniform weights H[u,v].
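A minimal MATLAB sketch of the correlation formula above (a 3x3 box
filter, k = 1; the random test image is arbitrary):

F = 90 * double(rand(10) > 0.7);       % example 10x10 image
k = 1; H = ones(2*k+1) / (2*k+1)^2;    % uniform box-filter weights
G = zeros(size(F));
Fp = padarray(F, [k k], 0);            % zero-pad the borders
for i = 1:size(F,1)
    for j = 1:size(F,2)
        patch = Fp(i:i+2*k, j:j+2*k);  % neighborhood around F(i,j)
        G(i,j) = sum(sum(H .* patch)); % weighted combination
    end
end
% Equivalent built-in call: G = imfilter(F, H);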
Correlation Filtering

 This is called cross-correlation, denoted G = H ⊗ F.

 Filtering an image
 Replace each pixel by a weighted combination of its neighbors.
 The filter “kernel” or “mask” H is the prescription for the weights
in the linear combination.

(Figure: kernel H swept over image F, from (0,0) to (N,N).)


Convolution
 Convolution:
 Flip the filter in both dimensions (bottom to top, right to left)
 Then apply cross-correlation

G[i,j] = Σ(u,v) H[u,v] F[i−u, j−v],   denoted G = H ∗ F

(Figure: flipped kernel H applied to image F.)


Convolution vs. Correlation
 Correlation: G[i,j] = Σ(u,v) H[u,v] F[i+u, j+v]
 Convolution: G[i,j] = Σ(u,v) H[u,v] F[i−u, j−v]   (note the difference!)

 Note
 If H[−u,−v] = H[u,v] (a symmetric kernel), then correlation = convolution.


Shift Invariant Linear System

 Shift invariant:
 Operator behaves the same everywhere, i.e. the value of the
output depends on the pattern in the image neighborhood,
not the position of the neighborhood.
 Linear:
 Superposition: h * (f1 + f2) = (h * f1) + (h * f2)
 Scaling: h * (k f ) = k (h * f)



Properties of Convolution
 Linear & shift invariant
 Commutative: f ∗ g = g ∗ f
 Associative: (f ∗ g) ∗ h = f ∗ (g ∗ h)
 Often apply several filters in sequence: ((a ∗ b1) ∗ b2) ∗ b3
 This is equivalent to applying one filter: a ∗ (b1 ∗ b2 ∗ b3)
 Identity: f ∗ e = f
 for the unit impulse e = […, 0, 0, 1, 0, 0, …]
 Differentiation: ∂(f ∗ g)/∂x = (∂f/∂x) ∗ g


Averaging Filter
 What values belong in the kernel H[u,v] for the moving average example?
 Uniform weights over the 3 x 3 window:

H = 1/9 ·  1 1 1
           1 1 1
           1 1 1      (“box filter”)

G = H ⊗ F


Smoothing by Averaging

(Figure: the kernel depicted as a box filter: white = high value, black =
low value. Original vs. filtered image; note the “ringing” artifacts!)
Smoothing with a Gaussian

(Figure: original vs. Gaussian-filtered image.)


Gaussian Smoothing

 Gaussian kernel: G_σ(x, y) = 1/(2πσ²) · exp( −(x² + y²) / (2σ²) )

 Rotationally symmetric
 Weights nearby pixels more than distant ones
 This makes sense as ‘probabilistic’ inference about the signal

 A Gaussian gives a good model of a fuzzy blob



Gaussian Smoothing
 What parameters matter here?
 Variance σ² of the Gaussian
 Determines the extent of smoothing

(Figure: σ = 2 with a 30×30 kernel vs. σ = 5 with a 30×30 kernel.)


Gaussian Smoothing
 What parameters matter here?
 Size of the kernel or mask
 The Gaussian function has infinite support, but discrete filters use
finite kernels

(Figure: σ = 5 with a 10×10 kernel vs. σ = 5 with a 30×30 kernel.)

 Rule of thumb: set the filter half-width to about 3σ


Gaussian Smoothing in Matlab

>> hsize = 10;
>> sigma = 5;
>> h = fspecial('gaussian', hsize, sigma);
>> mesh(h);                  % view the kernel as a surface
>> imagesc(h);               % view the kernel as an image
>> outim = imfilter(im, h);  % apply the filter to image im
>> imshow(outim);


Topics of This Lecture
 Linear filters
 What are they? How are they applied?
 Application: smoothing
 Gaussian filter
 What does it mean to filter an image?

 Nonlinear Filters
 Median filter

 Multi-Scale representations
 How to properly rescale an image?
 Image derivatives
 How to compute gradients robustly?



Why Does This Work?
 A small excursion into the Fourier transform to talk
about spatial frequencies…

3 cos(x)
+ 1 cos(3x)
+ 0.8 cos(5x)
+ 0.4 cos(7x)
+ …
The Fourier Transform in Pictures

 A small excursion into the Fourier transform to talk
about spatial frequencies…

(Figure: frequency spectrum, running “high” / “low” / “high”.)

3 cos(x)
+ 1 cos(3x)
+ 0.8 cos(5x)
+ 0.4 cos(7x)
+ …
Fourier Transforms of Important Functions
 Sine and cosine transform to… ?

(Figure: sine and cosine waveforms and their unknown transforms.)


Fourier Transforms of Important Functions
 Sine and cosine transform to “frequency spikes”

(Figure: a sinusoid transforms to a pair of frequency spikes.)

 A Gaussian transforms to… ?


Fourier Transforms of Important Functions
 Sine and cosine transform to “frequency spikes”
 A Gaussian transforms to a Gaussian
 A box filter transforms to… ?
Fourier Transforms of Important Functions
 Sine and cosine transform to “frequency spikes”
 A Gaussian transforms to a Gaussian
 A box filter transforms to a sinc:   sinc(x) = sin(x) / x
 All of this is symmetric!
Effect of Convolution
 Convolving two functions in the image domain corresponds to taking
the product of their transformed versions in the frequency domain:

f ∗ g  ↔  F · G

 This gives us a tool to manipulate image spectra.
 A filter attenuates or enhances certain frequencies through this effect.


Low-Pass vs. High-Pass

(Figure: original image with its low-pass filtered and high-pass filtered
versions. Image source: S. Chenney)
Quiz: What Effect Does This Filter Have?



Sharpening Filter

(Figure: original vs. sharpened image.)

Sharpening filter: accentuates differences with the local average.
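A minimal MATLAB sketch of such a filter (implemented as unsharp masking;
the image and the strength alpha are example choices):

im = im2double(imread('cameraman.tif'));
alpha = 1.0;                                   % sharpening strength
smoothed = imfilter(im, fspecial('average', 3));
out = im + alpha * (im - smoothed);            % accentuate local differences
imshow([im out]);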


Application: High Frequency Emphasis

(Figures: original; high-pass filter result; high-frequency emphasis;
high-frequency emphasis + histogram equalization.)
Topics of This Lecture
 Linear filters
 What are they? How are they applied?
 Application: smoothing
 Gaussian filter
 What does it mean to filter an image?
 Nonlinear Filters
 Median filter

 Multi-Scale representations
 How to properly rescale an image?
 Image derivatives
 How to compute gradients robustly?



Non-Linear Filters: Median Filter
 Basic idea
 Replace each pixel by the
median of its neighbors.

 Properties
 Doesn’t introduce new pixel
values
 Removes spikes: good for
impulse, salt & pepper
noise
 Linear?
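A quick MATLAB comparison on salt & pepper noise (using the toolbox
functions imnoise and medfilt2; the noise density is an example value):

im = im2double(imread('cameraman.tif'));
noisy = imnoise(im, 'salt & pepper', 0.05);  % corrupt 5% of the pixels
med = medfilt2(noisy, [3 3]);                % 3x3 median filter
imshow([noisy med]);                         % spikes removed, edges kept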



Median Filter

(Figure: salt-and-pepper noise vs. the median-filtered result, with plots
of one image row before and after filtering.)
Median Filter

 The Median filter is edge preserving.

Median vs. Gaussian Filtering

(Figure: Gaussian vs. median filtering with 3x3, 5x5, and 7x7 kernels.)
Topics of This Lecture
 Linear filters
 What are they? How are they applied?
 Application: smoothing
 Gaussian filter
 What does it mean to filter an image?

 Nonlinear Filters
 Median filter

 Multi-Scale representations
 How to properly rescale an image?

 Image derivatives
 How to compute gradients robustly?



Motivation: Fast Search Across Scales



Image Pyramid

(Figure: image pyramid, from high resolution at the base to low
resolution at the top.)
How Should We Go About Resampling?

 Let’s resample the checkerboard by taking one sample at each circle.

 In the top-left board, the new representation is reasonable; the top
right also yields a reasonable representation.

 The bottom left is all black (dubious) and the bottom right has checks
that are too big.


Fourier Interpretation: Discrete Sampling
 Sampling in the spatial domain is like multiplying with a
spike function.

 Sampling in the frequency domain is like...



Source: S. Chenney
Fourier Interpretation: Discrete Sampling
 Sampling in the spatial domain is like multiplying with a
spike function.

 Sampling in the frequency domain is like convolving with a


spike function.



Sampling and Aliasing



Sampling and Aliasing

“Nyquist limit”

 Nyquist theorem:
 In order to recover a certain frequency f, we need to sample with at least 2f.
 This corresponds to the point at which the transformed frequency spectra start
to overlap.



Sampling and Aliasing

(Figure: spectra overlapping beyond the “Nyquist limit”.)


Aliasing in Graphics



Resampling with Prior Smoothing

 Note: we cannot recover the high frequencies, but we can avoid
artifacts by smoothing before resampling.


The Gaussian Pyramid

Low resolution
G4 = (G3 * gaussian) ↓2
G3 = (G2 * gaussian) ↓2        (at each level: blur, then subsample by 2)
G2 = (G1 * gaussian) ↓2
G1 = (G0 * gaussian) ↓2
G0 = Image
High resolution
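A minimal MATLAB sketch of this construction (impyramid performs the
Gaussian blur-and-subsample step; the image and level count are examples):

G = cell(1,5);
G{1} = im2double(imread('cameraman.tif'));  % G0 = image
for k = 2:5
    G{k} = impyramid(G{k-1}, 'reduce');     % blur with a Gaussian, subsample by 2
end
montage(G);                                 % view the pyramid levels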
Gaussian Pyramid – Stored Information
All the extra
levels add very
little overhead
for memory or
computation!



Summary: Gaussian Pyramid
 Construction: create each level from the previous one
 Smooth and sample

 Smooth with Gaussians, in part because
 Gaussian ∗ Gaussian = another Gaussian
 G(σ1) ∗ G(σ2) = G(√(σ1² + σ2²))

 Gaussians are low-pass filters, so the representation is redundant
once smoothing has been performed.
 There is no need to store smoothed images at the full original
resolution.


The Laplacian Pyramid

Construction from the Gaussian pyramid G0 … Gn:
Li = Gi − expand(Gi+1),   Ln = Gn
Reconstruction:  Gi = Li + expand(Gi+1)

(Figure: each Laplacian level is the difference between a Gaussian level
and the expanded next-coarser level.)

Why is this useful?
Laplacian ≈ Difference of Gaussians (DoG): a cheap approximation, with
no derivatives needed.


Topics of This Lecture
 Linear filters
 What are they? How are they applied?
 Application: smoothing
 Gaussian filter
 What does it mean to filter an image?

 Nonlinear Filters
 Median filter

 Multi-Scale representations
 How to properly rescale an image?
 Image derivatives
 How to compute gradients robustly?



Edges and Derivatives…

(Figure: an edge profile with its 1st derivative and 2nd derivative.)


Differentiation and Convolution
 For the 2D function f(x,y), the partial derivative is:

∂f(x,y)/∂x = lim(ε→0) [ f(x+ε, y) − f(x, y) ] / ε

 For discrete data, we can approximate this using finite differences:

∂f(x,y)/∂x ≈ [ f(x+1, y) − f(x, y) ] / 1

 To implement the above as convolution, what would be the
associated filter?


Partial Derivatives of an Image

(Figure: ∂f/∂x and ∂f/∂y derivative images.)

Candidate filters: [-1 1] or its vertical counterpart [-1; 1].
Which result shows changes with respect to x?
Assorted Finite Difference Filters

>> My = fspecial('sobel');
>> outim = imfilter(double(im), My);
>> imagesc(outim);
>> colormap gray;


Image Gradient
 The gradient of an image: ∇f = [ ∂f/∂x , ∂f/∂y ]

 The gradient points in the direction of most rapid intensity change.

 The gradient direction (orientation of the edge normal) is given by:
θ = tan⁻¹( (∂f/∂y) / (∂f/∂x) )

 The edge strength is given by the gradient magnitude:
‖∇f‖ = √( (∂f/∂x)² + (∂f/∂y)² )
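A short MATLAB sketch computing these quantities (imgradientxy wraps the
finite-difference filters discussed above; the image is an example):

im = im2double(imread('cameraman.tif'));
[gx, gy] = imgradientxy(im, 'sobel');  % partial derivatives
mag = sqrt(gx.^2 + gy.^2);             % edge strength (gradient magnitude)
theta = atan2(gy, gx);                 % gradient direction in radians
imshow(mag, []);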



Effect of Noise

 Consider a single row or column of the image.
 Plotting intensity as a function of position gives a signal.

(Figure: a noisy 1D intensity signal. Where is the edge?)
Solution: Smooth First

(Figure: signal f, Gaussian h, smoothed signal h ∗ f, and its derivative
d/dx (h ∗ f). Where is the edge? Look for peaks in d/dx (h ∗ f).)
Derivative Theorem of Convolution

 Differentiation property of convolution: d/dx (h ∗ f) = (dh/dx) ∗ f
 This saves one operation: smooth and differentiate with a single filter.


Derivative of Gaussian Filter

(I ⊗ g) ⊗ h = I ⊗ (g ⊗ h)

g (5×5 Gaussian):                            h = [1  −1]

0.0030 0.0133 0.0219 0.0133 0.0030
0.0133 0.0596 0.0983 0.0596 0.0133
0.0219 0.0983 0.1621 0.0983 0.0219
0.0133 0.0596 0.0983 0.0596 0.0133
0.0030 0.0133 0.0219 0.0133 0.0030

Why is this preferable?


Derivative of Gaussian Filters

(Figure: derivative-of-Gaussian kernels for the x-direction and y-direction.)


Source: Svetlana Lazebnik
Laplacian of Gaussian (LoG)

 Consider the second derivative d²/dx² (h ∗ f).

(Figure: signal, LoG kernel, and response. Where is the edge?
At the zero-crossings of the bottom graph.)
Summary: 2D Edge Detection Filters

(Figure: Gaussian; derivative of Gaussian; Laplacian of Gaussian.)

∇² is the Laplacian operator:  ∇²f = ∂²f/∂x² + ∂²f/∂y²


Note: Filters are Templates
 Applying a filter at some point can be seen as taking a dot product
between the image and some vector.
 Filtering the image is a set of dot products.

 Insight
 Filters look like the effects they are intended to find.
 Filters find effects they look like.


Where’s Waldo?

(Figures: the Waldo template; the scene; the detected template and the
correlation map.)


Correlation as Template Matching
 Think of filters as a dot product of the filter vector with the
image region.
 Now measure the angle between the vectors:

a · b = |a| |b| cos θ,   so   cos θ = (a · b) / (|a| |b|)

 The angle (similarity) between vectors can be measured by normalizing
the length of each vector to 1.

(Figure: template a and image region b interpreted as vectors.)
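MATLAB’s normxcorr2 computes exactly this normalized dot product at every
position; a short sketch (the template crop coordinates are arbitrary):

im = im2double(imread('cameraman.tif'));
T = im(60:100, 80:120);           % crop an arbitrary patch as the template
C = normxcorr2(T, im);            % normalized cross-correlation map
[y, x] = find(C == max(C(:)));    % the peak marks the best match
imshow(C, []);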


Summary: Mask Properties
 Smoothing
 Values positive
 Sum to 1 → constant regions stay the same as the input
 Amount of smoothing proportional to mask size
 Removes “high-frequency” components; “low-pass” filter

 Derivatives
 Opposite signs used to get a high response in regions of high contrast
 Sum to 0 → no response in constant regions
 High absolute value at points of high contrast

 Filters act as templates
• Highest response for regions that “look the most like the filter”
• Dot product as correlation


Summary: Linear Filters

• Linear filtering:
➢ Form a new image whose pixels are a weighted sum of the
original pixel values.
• Properties
➢ The output is a shift-invariant function of the input (same at
each image location).

• Examples:
• Smoothing with a box filter
• Smoothing with a Gaussian
• Finding a derivative
• Searching for a template

• Pyramid representations
• Important for describing and searching an image at all scales


THANK YOU!

NEXT: EDGE & STRUCTURE EXTRACTION

COMPUTER VISION
LECTURE V
EDGE & STRUCTURE EXTRACTION

DR. GEORGE KARRAZ, Ph. D.
Course Outline
• Image Processing Basics
➢ Image Formation
➢ Binary Image Processing
➢ Linear Filters
➢ Edge & Structure Extraction
➢ Color

• Segmentation
• Local Features & Matching
• Object Recognition and Categorization
• 3D Reconstruction
• Motion and Tracking
Recap: Gaussian Smoothing
• Gaussian kernel: G_σ(x, y) = 1/(2πσ²) · exp( −(x² + y²) / (2σ²) )

• Rotationally symmetric
• Weights nearby pixels more than distant ones
➢ This makes sense as ‘probabilistic’ inference about the signal

• A Gaussian gives a good model of a fuzzy blob
Smoothing with a Gaussian
Parameter σ is the “scale” / “width” / “spread” of the
Gaussian kernel, and controls the amount of smoothing.

fsize = 25;   % kernel size in pixels (example value; not set on the slide)
for sigma = 1:3:10
    h = fspecial('gaussian', fsize, sigma);
    out = imfilter(im, h);
    imshow(out);
    pause;
end
Recap: Derivatives and Edges…

(Figure: an edge profile with its 1st derivative and 2nd derivative.)
Recap: 2D Edge Detection Filters

(Figure: Gaussian; derivative of Gaussian; Laplacian of Gaussian.)

• ∇² is the Laplacian operator:  ∇²f = ∂²f/∂x² + ∂²f/∂y²
Topics of This Lecture
• Edge detection
➢ Recap: Gradients, scale influence
➢ Canny edge detector

• Fitting as template matching


➢ Distance transform
➢ Chamfer matching
➢ Application: traffic sign detection

• Fitting as parametric search


➢ Line detection
➢ Hough transform
➢ Extension to circles
➢ Generalized Hough transform
Edge Detection
• Goal: map image from 2D array of pixels to a set of
curves or line segments or contours.
• Why?

• Main idea: look for strong gradients, post-process

What Can Cause an Edge?

• Reflectance change: appearance information, texture
• Depth discontinuity: object boundary
• Cast shadows
• Change in surface orientation: shape
Contrast and Invariance

Recall: Images as Functions

Edges look like steep cliffs


Gradients → Edges

Primary edge detection steps:
1. Smoothing: suppress noise
2. Edge enhancement: filter for contrast
3. Edge localization
➢ Determine which local maxima from the filter output are actually
edges vs. noise
➢ Thresholding, thinning
Effect of  on Derivatives

σ = 1 pixel σ = 3 pixels

• The apparent structures differ depending on Gaussian’s


scale parameter.

 Larger values: larger scale edges detected


 Smaller values: finer features detected

14
DR. GEORGE KARRAZ, Ph. D.
So, What Scale to Choose?
• It depends on what we’re looking for…

• Too fine a scale… can’t see the forest for the trees.
• Too coarse a scale… can’t tell the maple from the cherry.
Recall: Thresholding
• Choose a threshold t.
• Set any pixels less than t to zero (off).
• Set any pixels greater than or equal to t to one (on).

FT[i,j] = { 1, if F[i,j] ≥ t
          { 0, otherwise


(Figures: original image; gradient magnitude image; thresholding with a
lower threshold; thresholding with a higher threshold.)
Designing an Edge Detector
• Criteria for an “optimal” edge detector:
➢ Good detection: the optimal detector must minimize the
probability of false positives (detecting spurious edges caused by
noise), as well as that of false negatives (missing real edges)
➢ Good localization: the edges detected must be as close as
possible to the true edges
➢ Single response: the detector must return one point only for
each true edge point; that is, minimize the number of local
maxima around the true edge

Canny Edge Detector
• This is probably the most widely used edge detector in
computer vision
• Theoretical model: step-edges corrupted by additive
Gaussian noise
• Canny has shown that the first derivative of the
Gaussian closely approximates the operator that
optimizes the product of signal-to-noise ratio and
localization

Canny Edge Detector
• Filter image with derivative of Gaussian
• Find magnitude and orientation of gradient
• Non-maximum suppression:
➢ Thin multi-pixel wide “ridges” down to single pixel width
• Linking and thresholding (hysteresis):
➢ Define two thresholds: low and high
➢ Use the high threshold to start edge curves and the low
threshold to continue them

• MATLAB:
>> E = edge(im, 'canny');   % im: a grayscale image
>> help edge
The Canny Edge Detector

(Figures: original image (Lena); norm of the gradient; after thresholding;
the gradient regions are still thick. How do we turn these thick regions
of the gradient into curves?)
Non-Maximum Suppression

• Check whether a pixel is a local maximum along its gradient direction,
selecting the single maximum across the width of the edge
➢ requires checking the interpolated pixels p and r
The Canny Edge Detector

(Figure: result after thinning (non-maximum suppression). Problem:
pixels along this edge didn’t survive the thresholding.)
Hysteresis Thresholding
• Hysteresis: a lag or momentum factor
• Idea: maintain two thresholds k_high and k_low
➢ Use k_high to find strong edges that start an edge chain
➢ Use k_low to find weak edges that continue an edge chain

• The typical ratio of thresholds is roughly k_high / k_low = 2
Hysteresis Thresholding

(Figures, courtesy of G. Loy: original image; high threshold (strong
edges); low threshold (weak edges); hysteresis threshold.)
Object Boundaries vs. Edges

(Figures: spurious edges caused by background, texture, and shadows.)
Edge Detection is Just the Beginning…

(Figure: image; human segmentation; gradient magnitude.)
Fitting
• We want to associate a model with observed features.

(Figure: for example, the model could be a line, a circle, or an
arbitrary shape.)
Topics of This Lecture
• Edge detection
➢ Recap: Gradients, scale influence
➢ Canny edge detector

• Fitting as template matching


➢ Distance transform
➢ Chamfer matching
➢ Application: traffic sign detection

• Fitting as parametric search


➢ Line detection
➢ Hough transform
➢ Extension to circles
➢ Generalized Hough transform
Fitting as Template Matching
• We’ve already seen that correlation filtering can be used for
template matching in an image.

• Let’s try this idea with “edge templates”.
➢ Example: traffic sign detection in (gray-value) video.

(Figure: sign templates.)
How Can This Be Made Efficient?
• Fast edge-based template matching
➢ Distance transform of the edge image

(Figure: original image; gradient; edges; distance transform.)

The value at (x,y) tells how far that position is from the nearest edge
point (or other binary image structure).
>> help bwdist
Distance Transform
• An image reflecting the distance to the nearest point in a point set
(e.g., edge pixels, or foreground pixels).

(Figure: distance transforms under 4-connected vs. 8-connected adjacency.)
Distance Transform Algorithm (1D)
• Two-pass O(n) algorithm for the 1D L1 norm
1. Initialize: for all j
➢ D[j] ← 0 if j is in P, ∞ otherwise

2. Forward: for j from 1 up to n-1
➢ D[j] ← min( D[j], D[j-1]+1 )

3. Backward: for j from n-2 down to 0
➢ D[j] ← min( D[j], D[j+1]+1 )
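A direct MATLAB transcription of the two-pass algorithm (1-based indexing;
the point set P is an arbitrary example):

P = logical([0 0 1 0 0 0 0 1 0 0]);   % example point set
n = numel(P);
D = inf(1, n); D(P) = 0;              % initialize
for j = 2:n                           % forward pass
    D(j) = min(D(j), D(j-1) + 1);
end
for j = n-1:-1:1                      % backward pass
    D(j) = min(D(j), D(j+1) + 1);
end
D                                     % -> [2 1 0 1 2 2 1 0 1 2]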


Distance Transform Algorithm (2D)
• 2D case analogous to 1D
➢ Initialization
➢ Forward and backward pass
– Fwd pass finds closest above and to the left
– Bwd pass finds closest below and to the right

Chamfer Matching
• Chamfer distance
➢ Average distance to the nearest feature

➢ This can be computed efficiently by correlating the edge template
with the distance-transformed image

(Figure: edge image and distance transform image.)
Chamfer Matching
• Efficient implementation
➢ Instead of correlation, sample a fixed number of points on the
template contour.
➢ The chamfer score then boils down to a series of DT lookups.
➢ The computational effort is independent of scale.

(Figure: edge image and distance transform image.)
Chamfer Matching Results

(Figure: matches found on the edge image / distance transform image.)
Chamfer Matching for Pedestrian Detection
• Organize templates in tree structure for fast matching

Summary: Chamfer Matching
• Pros
➢ Fast and simple method for matching edge-based templates.
➢ Works well for matching upright shapes with little intra-class
variation.
➢ Good method for finding candidate matches in a longer
recognition pipeline.

• Cons
➢ The chamfer score averages over the entire contour, so it is not
very discriminative in practice.
→ Further verification is needed.
➢ Low matching cost in cluttered regions with many edges.
→ Many false positive detections.
➢ To detect rotated & rescaled shapes, we need to match with
rotated & rescaled templates → can get very expensive.
Topics of This Lecture
• Edge detection
➢ Recap: Gradients, scale influence
➢ Canny edge detector

• Fitting as template matching


➢ Distance transform
➢ Chamfer matching
➢ Application: traffic sign detection

• Fitting as parametric search


➢ Line detection
➢ Hough transform
➢ Extension to circles
➢ Generalized Hough transform
Fitting as Search in Parametric Space
• Choose a parametric model to represent a set of
features
• Membership criterion is not local
➢ Can’t tell whether a point belongs to a given model just by
looking at that point.
• Three main questions:
➢ What model represents this set of features best?
➢ Which of several model instances gets which feature?
➢ How many model instances are there?
• Computational complexity is important
➢ It is infeasible to examine every possible set of parameters and
every possible combination of features

Example: Line Fitting
• Why fit lines?
Many objects characterized by presence of straight lines

• Wait, why aren’t we done just by running edge detection?


Difficulty of Line Fitting

• Extra edge points (clutter), multiple models:
➢ Which points go with which line, if any?

• Only some parts of each line are detected, and some parts are
missing:
➢ How to find a line that bridges the missing evidence?

• Noise in measured edge points and orientations:
➢ How to detect the true underlying parameters?
Voting
• It’s not feasible to check all combinations of features by
fitting a model to each possible subset.
• Voting is a general technique where we let the features
vote for all models that are compatible with it.
➢ Cycle through features, cast votes for model parameters.
➢ Look for model parameters that receive a lot of votes.
• Noise & clutter features will cast votes too, but typically
their votes should be inconsistent with the majority of
“good” features.
• Ok if some features not observed, as model can span
multiple fragments.

Fitting Lines
• Given points that belong to a line, what is the line?
• How many lines are there?
• Which points belong to which lines?

• The Hough Transform is a voting technique that can be used to
answer all of these questions.
• Main idea:
1. Record all possible lines on which each edge point lies.
2. Look for lines that get many votes.
Finding Lines in an Image: Hough Space

(Figure: a line in image space (x,y) corresponds to the point (m0, b0) in
Hough parameter space (m,b).)

• Connection between image (x,y) and Hough (m,b) spaces
➢ A line in the image corresponds to a point in Hough space.
➢ To go from image space to Hough space:
– Given a set of points (x,y), find all (m,b) such that y = mx + b
Finding Lines in an Image: Hough Space

(Figure: a point (x0, y0) in image space maps to a line in Hough space.)

➢ What does a point (x0, y0) in the image space map to?
– Answer: the solutions of b = −x0·m + y0
– This is a line in Hough space.
Finding Lines in an Image: Hough Space

(Figure: points (x0, y0) and (x1, y1) with their Hough-space lines
b = −x0·m + y0 and b = −x1·m + y1.)

• What are the line parameters for the line that contains both (x0, y0)
and (x1, y1)?
➢ It is the intersection of the lines b = −x0·m + y0 and b = −x1·m + y1.
Finding Lines in an Image: Hough Space

• How can we use this to find the most likely parameters (m,b) for the
most prominent line in image space?
➢ Let each edge point in image space vote for a set of possible
parameters in Hough space.
➢ Accumulate the votes in a discrete set of bins; the parameters with
the most votes indicate the line in image space.
Polar Representation for Lines
• Issues with the usual (m,b) parameter space: the parameters can take
on infinite values, and vertical lines are undefined.

d : perpendicular distance from the line to the origin [0,0]
θ : angle the perpendicular makes with the x-axis

x cos θ − y sin θ = d

• A point in image space maps to a sinusoid segment in Hough space.
Hough Transform Algorithm
Using the polar parameterization x cos θ − y sin θ = d, with
H: accumulator array (votes).

Basic Hough transform algorithm:
1. Initialize H[d,θ] = 0.
2. For each edge point (x,y) in the image
       for θ = 0 to 180        // some quantization
           d = x cos θ − y sin θ
           H[d,θ] += 1
3. Find the value(s) of (d,θ) where H[d,θ] is maximum.
4. The detected line in the image is given by d = x cos θ − y sin θ.

• Time complexity (in terms of number of votes)?
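MATLAB’s hough / houghpeaks / houghlines functions implement this
pipeline; a short sketch (circuit.tif ships with the Image Processing
Toolbox):

im = imread('circuit.tif');
E = edge(im, 'canny');
[H, theta, rho] = hough(E);         % accumulator array of votes
peaks = houghpeaks(H, 5);           % the 5 strongest (rho, theta) bins
lines = houghlines(E, theta, rho, peaks);
imshow(im); hold on;
for k = 1:numel(lines)
    xy = [lines(k).point1; lines(k).point2];
    plot(xy(:,1), xy(:,2), 'LineWidth', 2);
end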
Example: HT for Straight Lines

(Figure: edge coordinates in image space and the votes in (d,θ) space.
Bright value = high vote count; black = no votes.)
Example: HT for Straight Lines

(Figure: a square and its Hough votes.)
Real-World Examples

(Figure: detected lines, showing the longest segments found.)
Impact of Noise on the Hough Transform

(Figure: noisy edge coordinates in image space and the smeared votes in
(d,θ) space.)

What difficulty does this present for an implementation?
Impact of Noise on the Hough Transform

(Figure: random edge coordinates and their votes.)

Here, everything appears to be “noise”, or random edge points, but we
still see peaks in the vote space.
Extensions
Extension 1: Use the image gradient
1. same
2. For each edge point I[x,y] in the image
       θ = gradient orientation at (x,y)
       d = x cos θ − y sin θ
       H[d,θ] += 1
3. same
4. same
(Reduces the degrees of freedom)
Extensions
Extension 1: Use the image gradient
1. same
2. For each edge point I[x,y] in the image
       compute a unique (d,θ) based on the image gradient at (x,y)
       H[d,θ] += 1
3. same
4. same
(Reduces the degrees of freedom)

Extension 2
➢ Give more votes to stronger edges (use the magnitude of the gradient).
Extension 3
➢ Change the sampling of (d,θ) to give more/less resolution.
Extension 4
➢ The same procedure can be used with circles, squares, or any other
shape…
Extension: Cascaded Hough Transform
• Let’s go back to the original (m,b) parametrization
• A line in the image maps to a pencil of lines in the
Hough space
• What do we get with parallel lines or a pencil of lines?
➢ Collinear peaks in the Hough space!
• So we can apply a Hough transform to the output of the
first Hough transform to find vanishing points

Finding Vanishing Points

(Figure: vanishing points found from collinear Hough peaks.)
Cascaded Hough Transform
• Issue: Dealing with the unbounded parameter space

Hough Transform for Circles
• Circle: center (a,b) and radius r
(xi − a)² + (yi − b)² = r²

• For a fixed radius r, unknown gradient direction:

(Figure: each image point votes for a circle of possible centers in (a,b)
Hough space.)
Hough Transform for Circles
• For a fixed radius r, unknown gradient direction:

(Figure: the vote circles intersect; most votes for the center occur at
the intersection.)
Hough Transform for Circles
• For an unknown radius r, unknown gradient direction:

(Figure: each image point votes on a cone in (a,b,r) Hough space.)
Hough Transform for Circles
• For an unknown radius r, known gradient direction:

(Figure: votes restricted to a line in (a,b,r) space along the gradient
direction.)
Hough Transform for Circles
For every edge pixel (x,y):
    For each possible radius value r:
        For each possible gradient direction θ:   // or use the estimated gradient
            a = x − r cos(θ)
            b = y + r sin(θ)
            H[a,b,r] += 1
        end
    end
end
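MATLAB’s imfindcircles implements a circular Hough transform variant; a
quick sketch (the radius range is an example; coins.png ships with the
toolbox):

im = imread('coins.png');
[centers, radii] = imfindcircles(im, [15 30]);  % search radii of 15-30 px
imshow(im); viscircles(centers, radii);         % overlay the detections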
Example: Detecting Circles with Hough

(Figure: the crosshair indicates the result of the Hough transform; the
bounding box was found via motion differencing.)
Example: Detecting Circles with Hough

(Figure: original coin image; edges; votes for the penny radius.)

Note: a different Hough transform (with separate accumulators) was used
for each circle radius (quarters vs. penny).
Example: Detecting Circles with Hough

(Figure: original; edges; votes for the quarter radius; combined
detections.)
Voting: Practical Tips
• Minimize irrelevant tokens first (take edge points with
significant gradient magnitude)
• Choose a good grid / discretization
➢ Too coarse: large votes obtained when too many different lines
correspond to a single bucket
➢ Too fine: miss lines because some points that are not exactly
collinear cast votes for different buckets
• Vote for neighbors, also (smoothing in accumulator
array)
• Utilize direction of edge to reduce free parameters by 1
• To read back which points voted for “winning” peaks,
keep tags on the votes.

Hough Transform: Pros and Cons
Pros
• All points are processed independently, so can cope with
occlusion
• Some robustness to noise: noise points unlikely to
contribute consistently to any single bin
• Can detect multiple instances of a model in a single pass
Cons
• Complexity of search time increases exponentially with
the number of model parameters
• Non-target shapes can produce spurious peaks in
parameter space
• Quantization: hard to pick a good grid size
Generalized Hough Transform
• What if we want to detect arbitrary shapes defined by boundary
points and a reference point?

At each boundary point pi, compute the displacement vector r = a − pi
to the reference point a. For a given model shape, store these vectors
in a table indexed by the gradient orientation θ.

(Figure: model shape with boundary points p1, p2 at orientation θ and
reference point a.)

[Dana H. Ballard, Generalizing the Hough Transform to Detect Arbitrary Shapes, 1980]
Generalized Hough Transform
To detect the model shape in a new image:
• For each edge point
➢ Index into table with its gradient orientation θ

➢ Use retrieved r vectors to vote for position of reference point

• Peak in this Hough space is the reference point with most
supporting edges

Assuming translation is the only transformation here,
i.e., orientation and scale are fixed.
83
DR. GEORGE KARRAZ, Ph. D.
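A minimal sketch of the R-table construction and translation-only voting just described (assuming integer coordinates and gradient orientations in radians; names are illustrative):

import numpy as np
from collections import defaultdict

def build_r_table(boundary_pts, orientations, ref_point, n_bins=36):
    # store displacement vectors r = a - p_i, indexed by quantized gradient orientation
    table = defaultdict(list)
    for (px, py), theta in zip(boundary_pts, orientations):
        b = int(n_bins * (theta % (2 * np.pi)) / (2 * np.pi))
        table[b].append((ref_point[0] - px, ref_point[1] - py))
    return table

def ght_vote(edge_pts, orientations, table, shape, n_bins=36):
    H = np.zeros(shape, dtype=np.int32)
    for (x, y), theta in zip(edge_pts, orientations):
        b = int(n_bins * (theta % (2 * np.pi)) / (2 * np.pi))
        for dx, dy in table.get(b, ()):
            ax, ay = x + dx, y + dy                         # candidate reference point
            if 0 <= ax < shape[0] and 0 <= ay < shape[1]:
                H[ax, ay] += 1
    return H  # the peak is the reference point with most supporting edges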
Example: Generalized Hough Transform
Say we’ve already
stored a table of
displacement vectors
as a function of edge
orientation for this
model shape.

Model shape
DR. GEORGE KARRAZ, Ph. D.
84
Example: Generalized Hough Transform
Now we want to look
at some edge points
detected in a new
image, and vote on
the position of that
shape.

DR. GEORGE KARRAZ, Ph. D. Displacement vectors for model points 85


Example: Generalized Hough Transform

DR. GEORGE KARRAZ, Ph. D. Range of voting locations for test point 86
Example: Generalized Hough Transform

DR. GEORGE KARRAZ, Ph. D. Range of voting locations for test point 87
Example: Generalized Hough Transform

DR. GEORGE KARRAZ, Ph. D.


Votes for points with θ = 88
Example: Generalized Hough Transform

DR. GEORGE KARRAZ, Ph. D. Displacement vectors for model points 89


Example: Generalized Hough Transform

DR. GEORGE KARRAZ, Ph. D. Range of voting locations for test point 90
Example: Generalized Hough Transform

DR. GEORGE KARRAZ, Ph. D. Votes for points with θ = 91


Application in Recognition
• Instead of indexing displacements by gradient
orientation, index by “visual codeword”.

Visual codeword with


displacement vectors
Training image

92
DR. GEORGE KARRAZ, Ph. D.
Application in Recognition
• Instead of indexing displacements by gradient
orientation, index by “visual codeword”.

Test image

• We’ll hear more about this method in lecture 14…


93
DR. GEORGE KARRAZ, Ph. D.
THANK YOU!

NEXT: LOCAL IMAGE FEATURES

DR. GEORGE KARRAZ, Ph. D.

94
COMPUTER VISION
LECTURE VI
LOCAL IMAGE FEATURES

DR. GEORGE KARRAZ, Ph. D.


Contents
• Overview of Keypoint Matching
• Harris corner detector
• Features in Computer Vision
• SIFT Features
→ Scale Invariant Feature Transform

DR. GEORGE KARRAZ, Ph. D. 2


This section: correspondence and alignment
• Correspondence: matching points, patches,
edges, or regions across images

DR. GEORGE KARRAZ, Ph. D. 3


Overview of Keypoint Matching
1. Find a set of
distinctive key-
points
A1
2. Define a region
around each
A2 A3 keypoint

3. Extract and
normalize the
region content
fA fB
4. Compute a local
descriptor from the
normalized region
d(f_A, f_B) < T
5. Match local
descriptors
DR. GEORGE KARRAZ, Ph. D. 4
Harris corner detector
• Approximate distinctiveness by the local
auto-correlation E(u, v).
• Approximate local auto-correlation by
second moment matrix
• Quantify distinctiveness (or cornerness)
as function of the eigenvalues of the
second moment matrix.
• But we don’t actually need to
compute the eigenvalues: we can use
the determinant and trace of the
second moment matrix instead.
(Figure: ellipse axes scale as (λmax)^(−1/2) and (λmin)^(−1/2).)
DR. GEORGE KARRAZ, Ph. D. 5
DR. GEORGE KARRAZ, Ph. D. 6
Harris Detector [Harris88]
• Second moment matrix
μ(σ_I, σ_D) = g(σ_I) ∗ [ I_x²(σ_D)      I_x·I_y(σ_D) ]
                       [ I_x·I_y(σ_D)   I_y²(σ_D)    ]
1. Image derivatives I_x, I_y
(optionally, blur first)
2. Squares of derivatives: I_x², I_y², I_x·I_y
3. Gaussian filter g(σ_I): g(I_x²), g(I_y²), g(I_x·I_y)
det M = λ₁·λ₂
trace M = λ₁ + λ₂
4. Cornerness function – both eigenvalues are strong:
har = det[μ(σ_I, σ_D)] − α·[trace(μ(σ_I, σ_D))]²
    = g(I_x²)·g(I_y²) − [g(I_x·I_y)]² − α·[g(I_x²) + g(I_y²)]²
5. Non-maxima suppression on har
(a NumPy sketch of these steps follows below)
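A minimal NumPy/SciPy sketch of the five steps above (the σ values and the constant α = 0.04 are typical choices, not mandated by the slides):

import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma_d=1.0, sigma_i=2.0, alpha=0.04):
    img = img.astype(float)
    Ix = sobel(gaussian_filter(img, sigma_d), axis=1)       # 1. image derivatives
    Iy = sobel(gaussian_filter(img, sigma_d), axis=0)
    Sxx = gaussian_filter(Ix * Ix, sigma_i)                 # 2.+3. squares, then g(sigma_I)
    Syy = gaussian_filter(Iy * Iy, sigma_i)
    Sxy = gaussian_filter(Ix * Iy, sigma_i)
    det = Sxx * Syy - Sxy ** 2                              # lambda1 * lambda2
    trace = Sxx + Syy                                       # lambda1 + lambda2
    return det - alpha * trace ** 2                         # 4. cornerness (5. NMS still needed)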
Automatic Scale Selection

f ( I i1im ( x,  )) = f ( I i1im ( x,  ))

How to find corresponding patch sizes?

DR. GEORGE KARRAZ, Ph. D. 7


Automatic Scale Selection
• Function responses for increasing scale (scale signature)

f ( I i1im ( x,  )) f ( I i1im ( x,  ))


DR. GEORGE KARRAZ, Ph. D. 8
Features in Computer Vision
• What is a feature?
– Location of sudden change
• Why use features?
– Information content high
– Invariant to change of view point, illumination
– Reduces computational burden

DR. GEORGE KARRAZ, Ph. D. 14


(One Type of) Computer Vision
Image 1

Feature 1
Feature 2 Computer
: Vision
Feature N Algorithm

Image 2

Feature 1
Feature 2
:
Feature N

DR. GEORGE KARRAZ, Ph. D. 15


Where Features Are Used

• Calibration
• Image Segmentation
• Correspondence in multiple images (stereo, structure
from motion)
• Object detection, classification
DR. GEORGE KARRAZ, Ph. D. 16
What Makes For Good Features?

• Invariance
– View point (scale, orientation, translation)
– Lighting condition
– Object deformations
– Partial occlusion
• Other Characteristics
– Fast to compute
– Uniqueness
– Sufficiently many
– Tuned to the task
DR. GEORGE KARRAZ, Ph. D. 17
Advanced Features: Topic
SIFT Features
→ Scale Invariant Feature Transform
Want to find … in here

18
DR. GEORGE KARRAZ, Ph. D.
SIFT Features
• Invariances:
– Scaling
– Rotation
– Illumination
– Translation
• Provides
– Good localization

DR. GEORGE KARRAZ, Ph. D. 19


SIFT
• SIFT features are first extracted from a set of
reference images and stored in a database.
• A new image is matched by individually
comparing each feature from the new image to
this previous database and finding candidate
matching features based on Euclidean distance
of their feature vectors.

DR. GEORGE KARRAZ, Ph. D. 20


Invariant Local Features
• Image content is transformed into local feature coordinates that are
invariant to translation, rotation, scale, and other imaging parameters

SIFT Features
DR. GEORGE KARRAZ, Ph. D. 21
Advantages of invariant local features
• Locality: features are local, so robust to occlusion and clutter (no prior
segmentation)
• Distinctiveness: individual features can be matched to a large
database of objects
• Quantity: many features can be generated for even small objects
• Efficiency: close to real-time performance
• Extensibility: can easily be extended to wide range of differing feature
types, with each adding robustness

DR. GEORGE KARRAZ, Ph. D. 22


SIFT Algorithm
Scale-space extrema detection

Keypoint localization

Interpolation of nearby data for accurate position

Discarding low-contrast keypoints

Eliminating edge responses

Orientation assignment

Keypoint descriptor

DR. GEORGE KARRAZ, Ph. D. 23


Scale-space extrema detection

The image is convolved with Gaussian filters at different scales, and then
the difference of successive Gaussian-blurred images are taken. Keypoints
are then taken as maxima/minima of the Difference of Gaussians (DoG)
that occur at multiple scales.

DR. GEORGE KARRAZ, Ph. D. 24


SIFT On-A-Slide
1. Enforce invariance to scale: Compute Gaussian difference max, for many
different scales; non-maximum suppression, find local maxima: keypoint
candidates
2. Localizable corner: For each maximum fit quadratic function. Compute
center with sub-pixel accuracy by setting first derivative to zero.
3. Eliminate edges: Compute ratio of eigenvalues, drop key points for
which this ratio is larger than a threshold.
4. Enforce invariance to orientation: Compute orientation, to achieve
rotation invariance, by finding the strongest second derivative direction
in the smoothed image (possibly multiple orientations). Rotate patch so
that orientation points up.
5. Compute feature signature: Compute a "gradient histogram" of the
local image region in a 4x4 pixel region. Do this for 4x4 regions of that
size. Orient so that largest gradient points up (possibly multiple
solutions). Result: feature vector with 128 values (4x4 = 16 histograms
with 8 gradient bins each).
6. Enforce invariance to illumination change and camera saturation:
Normalize to unit length to increase invariance to illumination. Then
threshold all gradients, to become invariant to camera saturation.
DR. GEORGE KARRAZ, Ph. D. 25
Finding “Key points” (Corners)
Idea: Find Corners, but scale invariance

Approach:
• Run linear filter (diff of Gaussians)
• Do this at different resolutions of image
pyramid

DR. GEORGE KARRAZ, Ph. D. 26


DR. GEORGE KARRAZ, Ph. D. 27
Difference of Gaussians

Equals

Minus

DR. GEORGE KARRAZ, Ph. D. 28


DiffOfGauss
• Difference of Gaussian Pyramid
• Difference of each successive image in each
octave

DR. GEORGE KARRAZ, Ph. D. 29
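One octave of such a pyramid can be sketched in a few lines (s and sigma0 follow common SIFT settings; this is an illustration, not the full multi-octave implementation):

import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(img, s=3, sigma0=1.6):
    k = 2.0 ** (1.0 / s)                                    # scale step within the octave
    blurred = [gaussian_filter(img.astype(float), sigma0 * k ** i) for i in range(s + 3)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]  # differences of successive blurs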


Gaussian Kernel Size i = 1 … 10
(Figure sequence, slides 30–39: the image convolved with Gaussian
kernels of increasing size, i = 1, 2, …, 10.)
DR. GEORGE KARRAZ, Ph. D. 30–39


• Detect maxima and
minima of difference-
of-Gaussian in scale
space (the pyramid
idea)

DR. GEORGE KARRAZ, Ph. D. 40
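The maxima/minima test compares each DoG sample with its 26 neighbors (8 in its own image, 9 in the scale above and 9 in the scale below); a minimal sketch for an interior sample (ties ignored):

import numpy as np

def is_scale_space_extremum(dog, s, y, x):
    # dog: list of DoG images in one octave; (s, y, x) must be interior
    cube = np.stack([dog[s + ds][y - 1:y + 2, x - 1:x + 2] for ds in (-1, 0, 1)])
    v = dog[s][y, x]
    return v == cube.max() or v == cube.min()               # extremum among the 27 samples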


Example of key points detection

(a) 233x189 image


(b) 832 DOG extrema
(c) 729 above threshold

Difference of Gaussian Pyramid DoG

DR. GEORGE KARRAZ, Ph. D. 41


Example of key points detection
Threshold on value at DoG peak and on ratio of principal curvatures
(Harris approach)

(c) 729 left after peak value threshold (from 832)


(d) 536 left after testing ratio of principal curvatures

DR. GEORGE KARRAZ, Ph. D. 42


Select canonical orientation
• Create histogram of local
gradient directions
computed at selected scale
• Assign canonical
orientation at peak of
smoothed histogram
• Each key specifies stable 2D
coordinates (x, y, scale,
orientation)

(orientation histogram over 0 … 2π)
DR. GEORGE KARRAZ, Ph. D. 43
SIFT vector formation
• Thresholded image gradients are sampled over
16x16 array of locations in scale space
• Create array of orientation histograms
• 8 orientations x 4x4 histogram array = 128
dimensions

DR. GEORGE KARRAZ, Ph. D. 44
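A hedged sketch of the descriptor assembly for one normalized 16x16 patch (mag/ori are precomputed gradient magnitude and orientation arrays; the 0.2 clamp is the usual saturation threshold):

import numpy as np

def sift_descriptor(mag, ori, n_cells=4, n_bins=8):
    cell = mag.shape[0] // n_cells                          # 4 pixels per cell for a 16x16 patch
    desc = []
    for i in range(n_cells):
        for j in range(n_cells):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            o = ori[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            b = ((o % (2*np.pi)) / (2*np.pi) * n_bins).astype(int) % n_bins
            desc.append(np.bincount(b, weights=m, minlength=n_bins))
    d = np.concatenate(desc)                                # 4x4x8 = 128 dimensions
    d /= np.linalg.norm(d) + 1e-12                          # normalize: illumination invariance
    d = np.minimum(d, 0.2)                                  # threshold: camera saturation
    return d / (np.linalg.norm(d) + 1e-12)                  # renormalize to unit length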


Nearest-neighbor matching to feature
database
• Ideal search: nearest neighbor (difficult in high-dim
spaces)
• Hypotheses are generated by approximate nearest
neighbor matching of each feature to vectors in the
database
– SIFT use best-bin-first (Beis & Lowe, 97) modification
to k-d tree algorithm
– Use heap data structure to identify bins in order by
their distance from query point

• Result: Can give speedup by factor of 1000 while finding


nearest neighbor (of interest) 95% of the time
45
DR. GEORGE KARRAZ, Ph. D.
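SIFT proper uses the best-bin-first approximation; as a stand-in, an exact k-d tree plus Lowe's ratio test can be sketched with SciPy (the 0.8 ratio is a common choice, not prescribed by the slides):

import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(query, database, ratio=0.8):
    tree = cKDTree(database)
    d, idx = tree.query(query, k=2)                         # two nearest neighbors each
    good = d[:, 0] < ratio * d[:, 1]                        # keep distinctive matches only
    return np.flatnonzero(good), idx[good, 0]               # query index -> database index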
• Extract outlines with
background
subtraction

DR. GEORGE KARRAZ, Ph. D. 46


3D Object Recognition
• Only 3 keys are needed
for recognition, so extra
keys provide robustness
• Affine model is no
longer as accurate

DR. GEORGE KARRAZ, Ph. D. 47


Recognition under occlusion

DR. GEORGE KARRAZ, Ph. D. 48


Test of illumination invariance
• Same image under differing illumination

273 keys verified in final match

49
DR. GEORGE KARRAZ, Ph. D.
THANK YOU!

NEXT: INTRODUCTION TO FACE DETECTION


& RECOGNITION

DR. GEORGE KARRAZ, Ph. D.

50
COMPUTER VISION
LECTURE VII
INTRODUCTION TO FACE
RECOGNITION & DETECTION
DR. GEORGE KARRAZ, Ph. D.
Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions

• Face detection
• Appearance-based and learning based approaches
• Neural networks methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

2 DR. GEORGE KARRAZ, Ph. D.


Face Recognition by Humans
• Performed routinely and effortlessly by humans
• Enormous interest in automatic processing of digital images and videos
due to wide availability of powerful and low-cost desktop embedded
computing
• Applications:
• biometric authentication,
• surveillance,
• human-computer interaction
• multimedia management

3 DR. GEORGE KARRAZ, Ph. D.


Face recognition
Advantages over other biometric technologies:
• Natural
• Non intrusive
• Easy to use

Among the six biometric attributes considered by Hietmeyer, facial


features scored the highest compatibility in a Machine Readable Travel
Documents (MRTD) system based on:
• Enrollment
• Renewal
• Machine requirements
• Public perception

4 DR. GEORGE KARRAZ, Ph. D.


Classification
A face recognition system is expected to identify faces present in images
and videos automatically. It can operate in either or both of two
modes:
Face verification (or authentication): involves a one-to-one match that
compares a query face image against a template face image whose identity is
being claimed.

Face identification (or recognition): involves one-to-many matches that


compares a query face image against all the template images in the database to
determine the identity of the query face.

First automatic face recognition system was developed by Kanade 1973.

5 DR. GEORGE KARRAZ, Ph. D.


Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions

• Face detection
• Appearance-based and learning based approaches
• Preprocessing
• Neural networks and kernel-based methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

6 DR. GEORGE KARRAZ, Ph. D.


Face recognition processing
Face recognition is a visual pattern recognition problem.
A face, a three-dimensional object subject to varying illumination, pose, and
expression, is to be identified based on its two-dimensional image (or
three-dimensional images).

A face recognition system generally consists of 4 modules - detection,


alignment, feature extraction, and matching.
Localization and normalization (face detection and alignment) are
processing steps before face recognition (facial feature extraction and
matching) is performed.

7 DR. GEORGE KARRAZ, Ph. D.


Face recognition processing
• Face detection segments the face areas from the background.
• In the case of video, the detected faces may need to be tracked
using a face tracking component.
• Face alignment is aimed at achieving more accurate localization
and at normalizing faces, whereas face detection provides coarse
estimates of the location and scale of each face.
• Facial components and facial outline are located; based on the
location points,
• The input face image is normalized in respect to geometrical
properties, such as size and pose, using geometrical transforms
or morphing,
• The face is further normalized with respect to photometrical
properties such as illumination and gray scale.

8 DR. GEORGE KARRAZ, Ph. D.


Face recognition processing

After a face is normalized, feature extraction is performed to


provide effective information that is useful for
distinguishing between faces of different persons and
stable with respect to the geometrical and photometrical
variations.

For face matching, the extracted feature vector of the input


face is matched against those of enrolled faces in the
database; it outputs the identity of the face when a match
is found with sufficient confidence or indicates an
unknown face otherwise.

9 DR. GEORGE KARRAZ, Ph. D.


Face recognition processing

Face recognition processing flow.

10 DR. GEORGE KARRAZ, Ph. D.


Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions

• Face detection
• Appearance-based and learning based approaches
• Preprocessing
• Neural networks and kernel-based methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

11 DR. GEORGE KARRAZ, Ph. D.


Analysis in face subspaces
Subspace analysis techniques for face recognition are based on the fact
that a class of patterns of interest, such as the face, resides in a subspace
of the input image space:

A small image of 64 × 64 having 4096 pixels can express a large number of pattern
classes, such as trees, houses and faces.

Among the 256^4096 > 10^9864 possible “configurations”, only a few correspond to
faces. Therefore, the original image representation is highly redundant, and the
dimensionality of this representation could be greatly reduced.

12 DR. GEORGE KARRAZ, Ph. D.


Analysis in face subspaces

With the eigenface or PCA approach, a small number (40 or lower) of


eigenfaces are derived from a set of training face images by using the
Karhunen-Loeve transform or PCA.

A face image is efficiently represented as a feature vector (i.e. a vector of


weights) of low dimensionality.

The features in such subspace provide more salient and richer information for
recognition than the raw image.

13 DR. GEORGE KARRAZ, Ph. D.


Analysis in face subspaces
The manifold (i.e. distribution) of all faces accounts for variation in face
appearance whereas the nonface manifold (distribution) accounts for everything else.

If we look into facial manifolds in the image space, we find them highly
nonlinear and nonconvex.

The figure (a) illustrates face versus nonface manifolds and (b) illustrates the
manifolds of two individuals in the entire face manifold.

Face detection is a task of distinguishing between the face and nonface manifolds
in the image (sub window) space and face recognition between those of
individuals in the face manifolds.

(a) Face versus nonface manifolds. (b) Face manifolds of different individuals.
14
DR. GEORGE KARRAZ, Ph. D.
Handwritten manifolds
• Two dimensional embedding of handwritten digits ("0"-"9") by Laplacian
Eigenmap, Locally Preserving Projection, and PCA
• Colors correspond to the same individual handwriting

15 DR. GEORGE KARRAZ, Ph. D.


Examples
• The Eigenfaces, Fisher faces and Laplacian faces calculated from the face
images in the Yale database.

Eigenfaces

Fisherfaces

Laplacianfaces

16 DR. GEORGE KARRAZ, Ph. D.


Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions

• Face detection
• Appearance-based and learning based approaches
• Neural networks methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

17 DR. GEORGE KARRAZ, Ph. D.


Technical Challenges
The performance of many state-of-the-art face recognition methods
deteriorates with changes in lighting, pose and other factors. The key
technical challenges are:
• Large Variability in Facial Appearance: Whereas shape and reflectance are
intrinsic properties of a face object, the appearance (i.e. texture) is subject
to several other factors, including the facial pose, illumination, facial
expression.

Intrasubject variations in pose, illumination, expression, occlusion,


accessories (e.g. glasses), color and brightness.

18 DR. GEORGE KARRAZ, Ph. D.


Technical Challenges
• Highly Complex Nonlinear Manifolds: The entire face manifold (distribution) is highly
nonconvex and so is the face manifold of any individual under various changes. Linear
methods such as PCA, independent component analysis (ICA) and linear discriminant
analysis (LDA) project the data linearly from a high-dimensional space (e.g. the image
space) to a low-dimensional subspace. As such, they are unable to preserve the
nonconvex variations of face manifolds necessary to differentiate among individuals.
• In a linear subspace, Euclidean distance and Mahalanobis distance do not perform well
for classifying between face and nonface manifolds and between manifolds of
individuals. This limits the power of the linear methods to achieve highly accurate face
detection and recognition.

19 DR. GEORGE KARRAZ, Ph. D.


Technical Challenges
• High Dimensionality and Small Sample Size: Another challenge is the ability to
generalize as illustrated in figure. A canonical face image of 112 × 92 resides in a
10,304-dimensional feature space. Nevertheless, the number of examples per
person (typically fewer than 10) available for learning the manifold is usually
much smaller than the dimensionality of the image space; a system trained on so
few examples may not generalize well to unseen instances of the face.

20 DR. GEORGE KARRAZ, Ph. D.


Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions:

• Statistical (learning-based)
• Geometry-based and appearance-based
• Non-linear kernel techniques
• Taxonomy

• Face detection
• Appearance-based and learning-based approaches
• Non-linear and Neural networks methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

21 DR. GEORGE KARRAZ, Ph. D.


Technical Solutions
• Feature extraction: construct a “good” feature space in which the face
manifolds become simpler i.e. less nonlinear and nonconvex than those in
the other spaces. This includes two levels of processing:

Normalize face images geometrically and photometrically, such as using


morphing and histogram equalization
Extract features in the normalized images which are stable with respect to such
variations, such as based on Gabor wavelets.

• Pattern classification: construct classification engines able to solve difficult


nonlinear classification and regression problems in the feature space and
to generalize better.

22 DR. GEORGE KARRAZ, Ph. D.


Technical Solutions
Learning-based approach - statistical learning
• Learns from training data to extract good features and construct classification
engines.

• During the learning, both prior knowledge about face(s) and variations seen in
the training data are taken into consideration.

• The appearance-based approach such as PCA and LDA based methods, has
significantly advanced face recognition techniques.

• They operate directly on an image-based representation (i.e. an array of pixel


intensities) and extracts features in a subspace derived from training images.

23 DR. GEORGE KARRAZ, Ph. D.


Technical Solutions
Appearance-based approach utilizing
geometric features
- Detects facial features such as eyes, nose, mouth and chin.
- Properties of and relations (e.g. areas, distances, angles)
between the features are used as descriptors for face recognition.

Advantages:
• economy and efficiency when achieving data reduction and insensitivity
to variations in illumination and viewpoint
Limitations:
• facial feature detection and measurement techniques are not reliable
enough if recognition is based on geometric features alone
• the rich information contained in the facial texture or appearance is
discarded; that information is still utilized in the appearance-based approach.

24 DR. GEORGE KARRAZ, Ph. D.


Technical Solutions
Nonlinear kernel techniques
Linear methods can be extended using nonlinear
kernel techniques (kernel PCA and kernel LDA) to deal
with nonlinearity in face recognition.

• A non-linear projection (dimension reduction) from the image space to


a feature space is performed; the manifolds in the resulting feature
space become simple, yet with subtleties preserved.

• A local appearance-based feature space uses appropriate image filters,


so the distributions of faces are less affected by various changes.
Examples:
• Local feature analysis (LFA)
• Gabor wavelet-based features such as elastic graph bunch matching (EGBM)
• Local binary pattern (LBP)

25 DR. GEORGE KARRAZ, Ph. D.


Taxonomy of face recognition algorithms

Taxonomy of face recognition algorithms based on pose-dependency,


face representation, and features used in matching.
26 DR. GEORGE KARRAZ, Ph. D.
Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions

• Face detection
• Appearance-based and learning based approaches
• Preprocessing
• Neural networks and kernel-based methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

27 DR. GEORGE KARRAZ, Ph. D.


Face detection
Face detection is the first step in automated face recognition.

Face detection can be performed based on several cues:


• skin color
• motion
• facial/head shape
• facial appearance or
• a combination of these parameters.
Most successful face detection algorithms are appearance-based
without using other cues.

28 DR. GEORGE KARRAZ, Ph. D.


Face detection

The processing is done as follows:


• An input image is scanned at all possible locations and scales by a
subwindow.
• Face detection is posed as classifying the pattern in the subwindow as
either face or nonface.
• The face/nonface classifier is learned from face and nonface training
examples using statistical learning methods

• Note: The ability to deal with nonfrontal faces is important for many real
applications because approximately 75% of the faces in home photos are
nonfrontal.

29 DR. GEORGE KARRAZ, Ph. D.


Appearance-based and learning based approaches
• Face detection is treated as a problem of classifying each scanned
sub window as one of two classes (i.e. face and nonface).

• Appearance-based methods avoid difficulties in modeling 3D


structures of faces by considering possible face appearances
under various conditions.

• A face/nonface classifier may be learned from a training set


composed of face examples taken under possible conditions as
would be seen in the running stage and nonface examples as well.

• Disadvantage: large variations brought about by changes in facial


appearance, lighting and expression make the face manifold or
face/non-face boundaries highly complex.

30 DR. GEORGE KARRAZ, Ph. D.


Appearance-based and learning based approaches
• Principal component analysis (PCA) or eigenface representation is
created by Turk and Pentland; only likelihood in the PCA subspace is
considered.

• Moghaddam and Pentland consider the likelihood in the orthogonal


complement subspace modeling the product of the two likelihood
estimates.

• Schneiderman and Kanade use multiresolution information for different


levels of wavelet transform.

• A nonlinear face and nonface classifier is constructed using statistics of


products of histograms computed from face and nonface examples
using AdaBoost learning. Viola and Jones built a fast, robust face
detection system in which AdaBoost learning is used to construct
nonlinear classifier.

31 DR. GEORGE KARRAZ, Ph. D.


Appearance-based and learning based approaches

• Liu presents a Bayesian Discriminating Features (BDF) method. The input image,
its one-dimensional Haar wavelet representation, and its amplitude projections
are concatenated into an expanded vector input of 768 dimensions. Assuming
that these vectors follow a (single) multivariate normal distribution for face,
linear dimension reduction is performed to obtain the PCA modes.
• Li et al. present a multi view face detection system. A new boosting algorithm,
called Float Boost, is proposed to incorporate Floating Search into AdaBoost. The
backtrack mechanism in the algorithm allows deletions of weak classifiers that
are ineffective in terms of error rate, leading to a strong classifier consisting of
only a small number of weak classifiers.
• Lienhart et al. use an extended set of rotated Haar features for dealing with in-
plane rotation and train a face detector using Gentle Adaboost with trees as base
classifiers. The results show that this combination outperforms that of Discrete
Adaboost.

32 DR. GEORGE KARRAZ, Ph. D.


Neural Networks and Kernel Based Methods
Nonlinear classification for face detection may be performed using neural
networks or kernel-based methods.

Neural methods: a classifier may be trained directly using preprocessed


and normalized face and nonface training subwindows.
• The input to the system of Sung and Poggio is derived from the six face and
six nonface clusters. More specifically, it is a vector of 2 × 6 = 12 distances in
the PCA subspaces and 2 × 6 = 12 distances from the PCA subspaces.
• The 24 dimensional feature vector provides a good representation for
classifying face and nonface patterns.
• In both systems, the neural networks are trained by back-propagation
algorithms.

Kernel SVM classifiers perform nonlinear classification for face detection


using face and nonface examples.
• Although such methods are able to learn nonlinear boundaries, a large
number of support vectors may be needed to capture a highly nonlinear
boundary. For this reason, fast realtime performance has so far been a
difficulty with SVM classifiers thus trained.
33 DR. GEORGE KARRAZ, Ph. D.
AdaBoost-based Methods

34 DR. GEORGE KARRAZ, Ph. D.


AdaBoost-based Methods
The AdaBoost learning procedure is aimed at learning a sequence of best
weak classifiers hm(x) and the best combining weights αm.

A set of N labeled training examples {(x1, y1), …, (xN, yN)} is assumed
available, where yi ∈ {+1, −1} is the class label for the example xi ∈ Rⁿ. A
distribution [w1, …, wN] of the training examples, where wi is associated
with a training example (xi, yi), is computed and updated during the
learning to represent the distribution of the training examples.

After iteration m, harder-to-classify examples (xi, yi) are given larger


weights wi(m), so that at iteration m + 1, more emphasis is placed on
these examples.

AdaBoost assumes that a procedure is available for learning a weak


classifier hm(x) from the training examples, given the distribution [wi(m)].

35 DR. GEORGE KARRAZ, Ph. D.
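A minimal discrete-AdaBoost sketch with decision stumps as the weak learners (brute-force threshold search; an illustration of the procedure above, not the Viola-Jones implementation):

import numpy as np

def adaboost_train(X, y, n_rounds=50):
    # X: (N, d) feature matrix, y: (N,) labels in {+1, -1}
    N, d = X.shape
    w = np.full(N, 1.0 / N)                                 # example distribution [w_1..w_N]
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for f in range(d):
            for t in np.unique(X[:, f]):
                for p in (+1, -1):                          # stump polarity
                    pred = p * np.where(X[:, f] > t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, p)
        err, f, t, p = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)               # combining weight alpha_m
        pred = p * np.where(X[:, f] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)                      # harder examples get larger weights
        w /= w.sum()
        ensemble.append((alpha, f, t, p))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * p * np.where(X[:, f] > t, 1, -1) for a, f, t, p in ensemble)
    return np.sign(score)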


AdaBoost-based Methods
Haar-like features
Viola and Jones propose four basic types of scalar features for face detection as shown in
figure. Such a block feature is located in a subregion of a subwindow and varies in shape
(aspect ratio), size and location inside the subwindow.

For a subwindow of size 20 × 20, there can be tens of thousands of such features for varying
shapes, sizes and locations. Feature k, taking a scalar value zk(x) ∈ R, can be considered a
transform from the n-dimensional space to the real line. These scalar numbers form an
overcomplete feature set for the intrinsically low-dimensional face pattern.

Recently, extended sets of such features have been proposed for dealing with out-of-plane
head rotation and for in-plane head rotation.

These Haar-like features are interesting for two reasons:


powerful face/nonface classifiers can be constructed based on these features
they can be computed efficiently using the summed-area table or integral image
technique.

Four types of rectangular Haar wavelet-like


features. A feature is a scalar calculated by
summing up the pixels in the white region and
subtracting those in the dark region.

36 DR. GEORGE KARRAZ, Ph. D.


AdaBoost-based Methods
Constructing weak classifiers
The AdaBoost learning procedure is aimed at learning a sequence of best
weak classifiers to combine hm(x) and the combining weights αm. It
solves the following three fundamental problems:

Learning effective features from a large feature set

Constructing weak classifiers, each of which is based on one of the


selected features

Boosting the weak classifiers to construct a strong classifier

37 DR. GEORGE KARRAZ, Ph. D.


AdaBoost-based Methods
Constructing weak classifiers (cont’d)
AdaBoost assumes that a “weak learner” procedure is available.

The task of the procedure is to select the most significant feature from a set of
candidate features, given the current strong classifier learned thus far, and then
construct the best weak classifier and combine it into the existing strong
classifier.

In the case of discrete AdaBoost, the simplest type of weak classifiers is a “stump”.
A stump is a single-node decision tree. When the feature is real-valued, a stump
may be constructed by thresholding the value of the selected feature at a certain
threshold value; when the feature is discrete-valued, it may be obtained
according to the discrete label of the feature.

A more general decision tree (with more than one node) composed of several
stumps leads to a more sophisticated weak classifier.

38 DR. GEORGE KARRAZ, Ph. D.


AdaBoost-based Methods
Boosted strong classifier
• AdaBoost learns a sequence of weak classifiers hm and boosts them into a
strong one HM effectively by minimizing the upper bound on classification
error achieved by HM. The bound can be derived as the following
exponential loss function:
J(HM) = Σᵢ exp(−yᵢ HM(xᵢ))
where i is the index for training examples.

39 DR. GEORGE KARRAZ, Ph. D.


AdaBoost learning algorithm

AdaBoost learning algorithm

40 DR. GEORGE KARRAZ, Ph. D.


AdaBoost-based Methods
FloatBoost Learning
AdaBoost attempts to boost the accuracy of an ensemble of weak classifiers. The
AdaBoost algorithm solves many of the practical difficulties of earlier boosting
algorithms. Each weak classifier is trained stage-wise to minimize the empirical error
for a given distribution reweighted according to the classification errors of the
previously trained classifiers. It is shown that AdaBoost is a sequential forward search
procedure using the greedy selection strategy to minimize a certain margin on the
training set.

A crucial heuristic assumption used in such a sequential forward search procedure is the
monotonicity (i.e. that addition of a new weak classifier to the current set does not
decrease the value of the performance criterion). The premise offered by the
sequential procedure in AdaBoost breaks down when this assumption is violated.

Floating Search is a sequential feature selection procedure with backtracking, aimed to


deal with nonmonotonic criterion functions for feature selection. A straight sequential
selection method such as sequential forward search or sequential backward search
adds or deletes one feature at a time. To make this work well, the monotonicity
property has to be satisfied by the performance criterion function. Feature selection
with a nonmonotonic criterion may be dealt with using a more sophisticated
technique, called plus-L-minus-r, which adds or deletes L features and then backtracks
r steps.

41 DR. GEORGE KARRAZ, Ph. D.


FloatBoost Algorithm
DR. GEORGE KARRAZ, Ph. D.

The Float Boost Learning procedure is


composed of several parts:
• the training input,
• initialization,
• forward inclusion,
• conditional exclusion and
• output.
In forward inclusion, the currently most
significant weak classifiers are added one
at a time, which is the same as in
AdaBoost.
In conditional exclusion, FloatBoost
removes the least significant weak
classifier from the set HM of current weak
classifiers, subject to the condition that the
removal leads to a lower cost than J^min_(M−1).
Supposing that the weak classifier
removed was the m’-th in HM, then
hm’, …, hM−1 and the αm’s must be
relearned. These steps are repeated until
no more removals can be done.

42 FloatBoost algorithm
AdaBoost-based Methods
• Cascade of Strong Classifiers: A boosted strong classifier effectively
eliminates a large portion of nonface subwindows while
maintaining a high detection rate. Nonetheless, a single strong
classifier may not meet the requirement of an extremely low false
alarm rate (e.g. 10⁻⁶ or even lower). A solution is to arbitrate
between several detectors (strong classifiers), for example, using
the “AND” operation.

A cascade of n strong classifiers (SC). The input is a subwindow x. It is sent to
the next SC for further classification only if it has passed all the previous SCs
as the face (F) pattern; otherwise it exits as nonface (N). x is finally
considered to be a face when it passes all the n SCs.

43 DR. GEORGE KARRAZ, Ph. D.


Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions

• Face detection
• Appearance-based and learning based approaches
• Neural networks and kernel-based methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

44 DR. GEORGE KARRAZ, Ph. D.


Dealing with Head Rotations
Multiview face detection should be able to detect non frontal faces. There
are three types of head rotation:

out-of-plane rotation (look to the left – to the right)


in-plane rotation (tilted toward shoulders)
up-and-down nodding rotation (up-down)

Adopting a coarse-to-fine view-partition strategy, the detector-pyramid


architecture consists of several levels from the coarse top level to the fine
Bottom level.

Rowley et al. propose to use two neural network classifiers for detection of
frontal faces subject to in-plane rotation.
• The first is the router network, trained to estimate the orientation of an
assumed face in the sub window, though the window may contain a nonface
pattern. The inputs to the network are the intensity values in a preprocessed 20
× 20 sub window. The angle of rotation is represented by an array of 36 output
units, in which each unit represents an angular range.
• The second neural network is a normal frontal, upright face detector.

45 DR. GEORGE KARRAZ, Ph. D.


Dealing with Head Rotations
Coarse-to-fine: The partitions of the out-of-plane rotation for the three-
level detector-pyramid is illustrated in figure.

Out-of-plane view partition. Out-of-plane head rotation (row 1), the


facial view labels (row 2), and the coarse-to-fine view partitions at the
three levels of the detector-pyramid (rows 3 to 5).

46 DR. GEORGE KARRAZ, Ph. D.


Dealing with Head Rotations
Simple-to-complex: A large number of sub windows result from
the scan of the input image. For example, there can be tens to
hundreds of thousands of them for an image of size 320 × 240, the
actual number depending on how the image is scanned.

Merging from different channels. From left to right: Outputs of frontal, left and
right view channels and the final result after the merge.

47 DR. GEORGE KARRAZ, Ph. D.


Outline
• Face recognition
• Face recognition processing
• Analysis in face subspaces
• Technical challenges
• Technical solutions

• Face detection
• Appearance-based and learning based approaches
• Neural networks and kernel-based methods
• AdaBoost-based methods
• Dealing with head rotations
• Performance evaluation

48 DR. GEORGE KARRAZ, Ph. D.


Performance Evaluation
The result of face detection from an image is affected
by the two basic components:
• The face/nonface classifier: the test set consists of face icons of a fixed
size (as are used for training). This process aims to evaluate
the performance of the face/nonface classifier
(preprocessing included), without being affected by
merging.

• The postprocessing (merger): the test set consists of normal images. In
this case, the face detection results are affected by both the
trained classifier and merging; the overall system
performance is evaluated.

49 DR. GEORGE KARRAZ, Ph. D.


Performance Measures
• The face detection performance is primarily measured by two rates: the
correct detection rate (which is 1 minus the miss detection rate) and
the false alarm rate.

• As AdaBoost-based methods (with local Haar wavelet features) have so


far provided the best face detection solutions in terms of the statistical
rates and the speed
• There are a number of variants of boosting algorithms: DAB- discrete
Adaboost; RAB- real Adaboost; and GAB- gentle Adaboost, with
different training sets and weak classifiers.
• Three 20-stage cascade classifiers were trained with DAB, RAB and GAB
using the Haar-like feature set of Viola and Jones and stumps as the
weak classifiers. It is reported that GAB outperformed the other two
boosting algorithms; for instance, at an absolute count of 10 false alarms
on the CMU test set, RAB detected only 75.4% and DAB only 79.5% of
all frontal faces, while GAB achieved 82.7% at a rescale factor of 1.1.

50 DR. GEORGE KARRAZ, Ph. D.


Performance Measures
• Two face detection systems were trained: one with the basic Haar-like
feature set of Viola and Jones and one with the extended Haar-like
feature set in which rotated versions of the basic Haar features are
added.

• On average, the false alarm rate was about 10% lower for the extended
Haar-like feature set at comparable hit rates.

• At the same time, the computational complexity was comparable.

• This suggests that whereas the larger Haar-like feature set makes it
more complex in both time and memory in the boosting learning phase,
a gain is obtained in the detection phase.

51 DR. GEORGE KARRAZ, Ph. D.


Performance Measures
Regarding the AdaBoost approach, the following conclusions can be drawn:

• An over-complete set of Haar-like features are effective for face detection. The use of the
integral image method makes the computation of these features efficient and achieves scale
invariance. Extended Haar-like features help detect nonfrontal faces.
• Adaboost learning can select best subset from a large feature set and construct a powerful
nonlinear classifier.
• The cascade structure significantly improves the detection speed and effectively reduces false
alarms, with a little sacrifice of the detection rate.
• Float Boost effectively improves boosting learning result. It results in a classifier that needs
fewer weaker classifiers than the one obtained using AdaBoost to achieve a similar error rate, or
achieve a lower error rate with the same number of weak classifiers. This run time improvement
is obtained at the cost of longer training time.
• Less aggressive versions of Adaboost, such as Gentle Boost and Logit Boost may be preferable to
discrete and real Adaboost in dealing with training data containing outliers (distinct, unusual
cases).
• More complex weak classifiers (such as small trees) can model second-order and/or third-order
dependencies, and may be beneficial for the nonlinear task of face detection.

52 DR. GEORGE KARRAZ, Ph. D.


THANK YOU!
NEXT: V IOLA JONES FACE DETECTOR

DR. GEORGE KARRAZ, Ph. D.

53
COMPUTER VISION
LECTURE VIII
VIOLA JONES FACE & DETECTOR

DR. GEORGE KARRAZ, Ph. D.


The Viola/Jones Face Detector
(2001)

➢ A widely used method for real-time object detection.


➢ Training is slow, but detection is very fast.

DR. GEORGE KARRAZ, Ph. D.


Classifier is Learned from Labeled Data

• Training Data
– 5000 faces
• All frontal
– 300 million non faces
• 9400 non-face images
– Faces are normalized
• Scale, translation
• Many variations
– Across individuals
– Illumination
– Pose (rotation both in plane and out)
3 DR. GEORGE KARRAZ, Ph. D.
Key Properties of Face Detection
• Each image contains 10–50 thousand locations/scales
• Faces are rare: 0–50 per image
– 1000 times as many non-faces as faces
• Extremely small number of false positives required: 10⁻⁶

4 DR. GEORGE KARRAZ, Ph. D.


AdaBoost
• Given a set of weak classifiers
originally: h_j(x) ∈ {+1, −1}
– None much better than random
• Iteratively combine classifiers
– Form a linear combination:
C(x) = sign( Σ_t α_t·h_t(x) + b )
– Training error converges to 0 quickly
– Test error is related to training margin

5 DR. GEORGE KARRAZ, Ph. D.


AdaBoost (Freund & Schapire)
(Figure: weak classifier 1; weights of misclassified
examples increased; weak classifier 2; weak classifier 3;
the final classifier is a linear combination of the weak
classifiers.)
6 DR. GEORGE KARRAZ, Ph. D.
AdaBoost:
Super Efficient Feature Selector

• Features = Weak Classifiers


• Each round selects the optimal feature
given:
– Previous selected features
– Exponential Loss

7 DR. GEORGE KARRAZ, Ph. D.


Boosted Face Detection: Image Features

“Rectangle filters”
Similar to Haar wavelets
(Papageorgiou, et al.)

h_t(x_i) = α_t  if f_t(x_i) > θ_t
           β_t  otherwise

C(x) = sign( Σ_t h_t(x) + b )

60,000 features to choose from

8 DR. GEORGE KARRAZ, Ph. D.


The Integral Image

• The integral image


computes a value at each
pixel (x,y) that is the sum
of the pixel values above (x,y)
and to the left of (x,y),
inclusive.
• This can quickly be
computed in one pass
through the image

9 DR. GEORGE KARRAZ, Ph. D.


Computing Sum within a Rectangle

• Let A, B, C, D be the values of
the integral image at the
corners of a rectangle
(D top-left, B top-right, C bottom-left, A bottom-right)
• Then the sum of original
image values within the
rectangle can be computed:
sum = A – B – C + D
• Only 3 additions are required
for any size of rectangle!
– This is now used in many areas
of computer vision

10 DR. GEORGE KARRAZ, Ph. D.
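A short sketch of both operations (cumulative sums for the integral image, corner arithmetic for the rectangle sum; inclusive pixel coordinates, as on the slide):

import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[0..y, 0..x], inclusive
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y0, x0, y1, x1):
    # sum of img[y0..y1, x0..x1] = A - B - C + D, with border guards
    total = ii[y1, x1]                                      # A (bottom-right)
    if y0 > 0:
        total -= ii[y0 - 1, x1]                             # B (top-right)
    if x0 > 0:
        total -= ii[y1, x0 - 1]                             # C (bottom-left)
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]                         # D (top-left)
    return total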


Feature Selection

• For each round of boosting:


– Evaluate each rectangle filter on each example
– Sort examples by filter values
– Select best threshold for each filter (min Z)
– Select best filter/threshold (= Feature)
– Reweight examples
• M filters, T thresholds, N examples, L learning time
– O( MT L(MTN) ) Naïve Wrapper Method
– O( MN ) Adaboost feature selector

11 DR. GEORGE KARRAZ, Ph. D.


Example Classifier for Face Detection

A classifier with 200 rectangle features was learned using AdaBoost

95% correct detection on test set with 1 in 14084


false positives.

Not quite competitive...

ROC curve for 200 feature classifier


12 DR. GEORGE KARRAZ, Ph. D.
DR. GEORGE KARRAZ, Ph. D.
Building Fast Classifiers

• Given a nested set of classifier hypothesis classes
• The trade-off between % detection and % false positives is set
per stage (ROC curve; figure)
• Computational Risk Minimization

IMAGE SUB-WINDOW → Classifier 1 →T→ Classifier 2 →T→ Classifier 3 →T→ FACE
(an F output at any stage exits to NON-FACE)

13
Cascaded Classifier

IMAGE SUB-WINDOW → [1 Feature] →T(50%)→ [5 Features] →T(20%)→ [20 Features] →T(2%)→ FACE
(an F output at any stage exits to NON-FACE)

• A 1 feature classifier achieves 100% detection rate


and about 50% false positive rate.
• A 5 feature classifier achieves 100% detection rate
and 40% false positive rate (20% cumulative)
– using data from previous stage.
• A 20 feature classifier achieves 100% detection
rate with 10% false positive rate (2% cumulative)

14 DR. GEORGE KARRAZ, Ph. D.
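The control flow of the cascade is tiny; what matters is that cheap early stages reject most subwindows, so the later, more expensive stages rarely run. A sketch with an illustrative stage representation (not the actual library API):

def cascade_classify(window, stages):
    # stages: list of (score_fn, threshold) pairs, cheapest first
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False                                    # rejected: NON-FACE, stop early
    return True                                             # passed every stage: FACE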


DR. GEORGE KARRAZ, Ph. D.

Output of Face Detector on Test Images

15
DR. GEORGE KARRAZ, Ph. D.
Solving other “Face” Tasks

Facial Feature Localization


Profile Detection

Demographic
Analysis

16
Feature Localization Features DR. GEORGE KARRAZ, Ph. D.

• Learned features reflect the task

17
DR. GEORGE KARRAZ, Ph. D.
Profile Detection

18
Profile Features

19 DR. GEORGE KARRAZ, Ph. D.


Review: Classifiers

• Bayes risk, loss functions


• Histogram-based classifiers
• Kernel density estimation
• Nearest-neighbor classifiers
• Neural networks

Viola/Jones face detector


• Integral image
• Cascaded classifier

20 DR. GEORGE KARRAZ, Ph. D.


THANK YOU!
NEXT: G EOMETRIC TRANSORMATIONS

DR. GEORGE KARRAZ, Ph. D.

21
COMPUTER VISION
LECTURE IX
GEOMETRIC TRANSFORMATIONS

DR. GEORGE KARRAZ, Ph. D.


Geometric transformations

Review some basics of linear algebra and


geometric transformations

DR. GEORGE KARRAZ, Ph. D. 2


Outline

• Representation
• Basics of linear algebra
• Homogeneous Coordinates
• Geometrical transformations

DR. GEORGE KARRAZ, Ph. D. 3


Representation
• Digital Pictures are 2D arrays (matrices) of numbers
• Each pixel is a measure of the brightness (intensity of light)
– that falls on an area of an sensor (typically a CCD chip)

DR. GEORGE KARRAZ, Ph. D. 4


Picture as a Vector in Dimension N
(Figure: an image unrolled into a vector of dimension N,
x = (X1, …, XN), representing its appearance.)

DR. GEORGE KARRAZ, Ph. D. 5


Vectors in Rⁿ
• We can think of vectors as points in a multidimensional space with
respect to some coordinate system
• Ordered set of numbers
• Example in two dimensions

DR. GEORGE KARRAZ, Ph. D. 6


Vectors in Rⁿ

• Notation:

DR. GEORGE KARRAZ, Ph. D. 7


Scalar Product

• A product of two vectors


• Amounts to projection of one vector onto the other
• Example in 2D:

The shown segment has length <x, y>, if x and y are unit vectors.

DR. GEORGE KARRAZ, Ph. D. 8


Scalar Product

• Various notations:

• Other names: dot product, inner product

DR. GEORGE KARRAZ, Ph. D. 9


Scalar Product in Rⁿ

• Definition: ⟨x, y⟩ = Σᵢ xᵢ·yᵢ = xᵀy

• In terms of angles: ⟨x, y⟩ = ‖x‖·‖y‖·cos(θ)

• Other properties: commutative, distributive, associative with
scalar multiplication

DR. GEORGE KARRAZ, Ph. D. 10


Basis
• A basis is a linearly independent set of vectors that spans the “whole
space”. I. e., we can write every vector in our space as linear
combination of vectors in that set.
• Every set of n linearly independent vectors in Rⁿ is a basis of Rⁿ
• Orthogonality: Two non-zero vectors x and y are orthogonal if x.y = 0

• A basis is called
– orthogonal, if every basis vector is orthogonal to all other basis
vectors
– orthonormal, if additionally all basis vectors have length 1.

DR. GEORGE KARRAZ, Ph. D. 11


Bases

• Orthonormal basis:

DR. GEORGE KARRAZ, Ph. D. 12


Overview
2D Transformations
• Basic 2D transformations
• Matrix representation
• Matrix composition
3D Transformations
• Basic 3D transformations
• Same as 2D
2D Modeling Transformations
Modeling Coordinates → World Coordinates
Initial location at (0, 0) with x- and y-axes aligned; then apply:
Scale .3, .3
Rotate -90
Translate 5, 3
(Figure sequence: the object after each successive transformation,
from modeling coordinates to world coordinates.)
Scaling
Scaling a coordinate means multiplying each of its
components by a scalar.
Uniform scaling means this scalar is the same for
all components (e.g. ×2).
Scaling
Non-uniform scaling: different scalars per
component:

X  2,
Y  0.5

How can we represent this in matrix form?


Scaling

Scaling operation:
x′ = a·x
y′ = b·y

Or, in matrix form:
[x′]   [a  0] [x]
[y′] = [0  b] [y]
       scaling matrix
2-D Rotation
(x, y) → (x′, y′): rotate by angle θ about the origin
x = r cos(φ)
y = r sin(φ)
x′ = r cos(φ + θ)
y′ = r sin(φ + θ)
Trig identity…
x′ = r cos(φ) cos(θ) − r sin(φ) sin(θ)
y′ = r sin(φ) cos(θ) + r cos(φ) sin(θ)
Substitute…
x′ = x cos(θ) − y sin(θ)
y′ = x sin(θ) + y cos(θ)
Geometric Transformations
Rotation Equations:

26
2-D Rotation
This is easy to capture in matrix form:

[x′]   [cos(θ)  −sin(θ)] [x]
[y′] = [sin(θ)   cos(θ)] [y]

Even though sin(θ) and cos(θ) are nonlinear
functions of θ,
• x′ is a linear combination of x and y
• y′ is a linear combination of x and y
Geometric Transformations
2D Translation:

28
Geometric Transformations
2D Translation Equation:
Basic 2D Transformations
Translation:
• x’ = x + tx
• y’ = y + ty
Scale:
• x’ = x * sx
• y’ = y * sy
Shear:
• x’ = x + hx*y
• y’ = y + hy*x
Rotation:
• x’ = x*cos(θ) − y*sin(θ)
• y’ = x*sin(θ) + y*cos(θ)
Transformations can be combined (with simple algebra).
Applying scale, then rotation, then translation to (x, y):
x’ = ((x*sx)*cos(θ) − (y*sy)*sin(θ)) + tx
y’ = ((x*sx)*sin(θ) + (y*sy)*cos(θ)) + ty
Outline
2D Transformations
• Basic 2D transformations
• Matrix representation
• Matrix composition
3D Transformations
• Basic 3D transformations
• Same as 2D
Matrix Representation
Represent 2D transformation by a matrix

[a  b]
[c  d]

Multiply matrix by column vector
⇔ apply transformation to point:

[x′]   [a  b] [x]      x′ = ax + by
[y′] = [c  d] [y]      y′ = cx + dy
Matrix Representation
Transformations combined by multiplication:

[x′]   [a  b] [e  f] [i  j] [x]
[y′] = [c  d] [g  h] [k  l] [y]

Matrices are a convenient and efficient way
to represent a sequence of transformations!
2x2 Matrices
What types of transformations can be
represented with a 2x2 matrix?
2D Identity?
x′ = x      [x′]   [1  0] [x]
y′ = y      [y′] = [0  1] [y]

2D Scale around (0,0)?
x′ = sx * x      [x′]   [sx  0 ] [x]
y′ = sy * y      [y′] = [0   sy] [y]
2x2 Matrices
What types of transformations can be
represented with a 2x2 matrix?
2D Rotate around (0,0)?
x′ = cos(θ) * x − sin(θ) * y      [x′]   [cos(θ)  −sin(θ)] [x]
y′ = sin(θ) * x + cos(θ) * y      [y′] = [sin(θ)   cos(θ)] [y]

2D Shear?
x′ = x + shx * y      [x′]   [1    shx] [x]
y′ = shy * x + y      [y′] = [shy  1  ] [y]
2x2 Matrices
What types of transformations can be
represented with a 2x2 matrix?
2D Mirror about Y axis?
x′ = −x      [x′]   [−1  0] [x]
y′ = y       [y′] = [ 0  1] [y]

2D Mirror over (0,0)?
x′ = −x      [x′]   [−1   0] [x]
y′ = −y      [y′] = [ 0  −1] [y]
2x2 Matrices
What types of transformations can be
represented with a 2x2 matrix?
2D Translation?
x′ = x + tx
y′ = y + ty      NO!

Only linear 2D transformations
can be represented with a 2x2 matrix.
Linear Transformations
Linear transformations are combinations of …
• Scale,
• Rotation,                 [x′]   [a  b] [x]
• Shear, and                [y′] = [c  d] [y]
• Mirror
Properties of linear transformations:
• Satisfies: T(s₁·p₁ + s₂·p₂) = s₁·T(p₁) + s₂·T(p₂)
• Origin maps to origin
• Lines map to lines
• Parallel lines remain parallel
• Ratios are preserved
• Closed under composition
Homogeneous Coordinates
Q: How can we represent translation as a 3x3
matrix?
x' = x + t x
y' = y + t y
Homogeneous Coordinates
Homogeneous coordinates
• represent coordinates in 2
dimensions with a 3-vector:
(x, y)  →  (x, y, 1)   (homogeneous coords)

Homogeneous coordinates seem unintuitive,
but they make graphics operations much
easier.
Homogeneous Coordinates
Q: How can we represent translation as a 3x3
matrix? x ' = x + t
x

y' = y + t y

A: Using the rightmost column:

              [1  0  tx]
Translation = [0  1  ty]
              [0  0  1 ]
(Figure slides: Homogeneous Coordinates; back to Cartesian
coordinates; 2D Translation using Homogeneous Coordinates.)
Homogeneous Coordinates
Example of translation:

[x′]   [1  0  tx] [x]   [x + tx]
[y′] = [0  1  ty] [y] = [y + ty]
[1 ]   [0  0  1 ] [1]   [  1   ]

tx = 2
ty = 1
Scaling Equation (figure)
Homogeneous Coordinates
Add a 3rd coordinate to every 2D point:
• (x, y, w) represents a point at location (x/w, y/w)
• (x, y, 0) represents a point at infinity
• (0, 0, 0) is not allowed
Example: (2,1,1), (4,2,2) and (6,3,3) all represent the point (2, 1).
A convenient coordinate system to represent many
useful transformations.
Basic 2D Transformations
Basic 2D transformations as 3x3 matrices:

            [1  0  tx]           [sx  0   0]
Translate = [0  1  ty]   Scale = [0   sy  0]
            [0  0  1 ]           [0   0   1]

         [cos(θ)  −sin(θ)  0]           [1    shx  0]
Rotate = [sin(θ)   cos(θ)  0]   Shear = [shy  1    0]
         [0        0       1]           [0    0    1]

(each applied as [x′, y′, 1]ᵀ = M · [x, y, 1]ᵀ)
Affine Transformations
Affine transformations are combinations of …
• Linear transformations, and      [x′]   [a  b  c] [x]
• Translations                     [y′] = [d  e  f] [y]
                                   [w′]   [0  0  1] [w]
Properties of affine transformations:
• Origin does not necessarily map to origin
• Lines map to lines
• Parallel lines remain parallel
• Ratios are preserved
• Closed under composition
Outline
2D Transformations
• Basic 2D transformations
• Matrix representation
• Matrix composition
3D Transformations
• Basic 3D transformations
• Same as 2D
Matrix Composition
Transformations can be combined by
matrix multiplication:

[x′]   [1  0  tx] [cos(θ)  −sin(θ)  0] [sx  0   0] [x]
[y′] = [0  1  ty] [sin(θ)   cos(θ)  0] [0   sy  0] [y]
[w′]   [0  0  1 ] [0        0       1] [0   0   1] [w]

p’ = T(tx, ty) R(θ) S(sx, sy) p
Matrix Composition
Matrices are a convenient and efficient way
to represent a sequence of transformations
• General purpose representation
• Hardware matrix multiply

p’ = (T * (R * (S * p)))
p’ = (T * R * S) * p
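As a concrete illustration of the composition above, a minimal NumPy sketch (the angle, scale, and translation values are arbitrary examples, not from the slides) that builds the three 3x3 matrices and checks that applying the pre-multiplied product T·R·S to a point matches applying S, R, T one at a time:

    import numpy as np

    theta, sx, sy, tx, ty = np.deg2rad(45), 2.0, 2.0, 3.0, 1.0  # arbitrary values

    # Basic 2D transformations as 3x3 homogeneous matrices
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)
    R = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0, 0, 1]])
    S = np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

    p = np.array([1.0, 0.0, 1.0])      # the point (1, 0) in homogeneous coordinates

    M = T @ R @ S                      # compose once: scale, then rotate, then translate
    assert np.allclose(M @ p, T @ (R @ (S @ p)))   # same result either way
    print((M @ p)[:2])                 # Cartesian coordinates of the transformed point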
Matrix Composition
Be aware: order of transformations matters
– Matrix multiplication is not commutative

p’ = T * R * S * p
     “Global”  …  “Local”
(the leftmost matrix acts in global coordinates; the rightmost is applied first, closest to the point)
Matrix Composition
What if we want to rotate and translate?
• Ex: Rotate line segment by 45 degrees about
endpoint a, and lengthen

[Figure: the segment before and after the transformation, anchored at endpoint a]

Multiplication Order – Wrong Way
Our line is defined by two endpoints
• Applying a rotation of 45 degrees, R(45), affects both points
• We could try to translate both endpoints to return endpoint a to
its original position, but by how much?

[Figure: Wrong: R(45) alone.  Correct: 1. T(-3)  2. R(45)  3. T(3)]
Multiplication Order - Correct
Isolate endpoint a from rotation effects:
• First translate line so a is at origin: T(-3)
• Then rotate line 45 degrees: R(45)
• Then translate back so a is where it was: T(3)
Matrix Composition
Will this sequence of operations work?

$$\begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos 45^\circ & -\sin 45^\circ & 0 \\ \sin 45^\circ & \cos 45^\circ & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_x \\ a_y \\ 1 \end{bmatrix} = \begin{bmatrix} a'_x \\ a'_y \\ 1 \end{bmatrix}$$
Matrix Composition
After correctly ordering the matrices:
• Multiply the matrices together
• What results is one matrix – store it (on a stack)!
• Multiply this matrix by the vector of each vertex
• All vertices are easily transformed with one matrix
multiply
Overview
2D Transformations
• Basic 2D transformations
• Matrix representation
• Matrix composition
3D Transformations
• Basic 3D transformations
• Same as 2D
3D Transformations
Same idea as 2D transformations
• Homogeneous coordinates: (x, y, z, w)
• 4x4 transformation matrices

$$\begin{bmatrix} x' \\ y' \\ z' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$
Basic 3D Transformations

Identity:
$$\begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$

Scale:
$$\begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$

Translation:
$$\begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$

Mirror about Y/Z plane:
$$\begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$
Geometric Transformations
3D Translation of Points:
Basic 3D Transformations

Rotate around Z axis:
$$\begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$

Rotate around Y axis:
$$\begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$

Rotate around X axis:
$$\begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$$
Geometric Transformations
3D Rotation of Points:
Geometric Transformations

DR. GEORGE KARRAZ, Ph. D. 70
Geometric Transformations
• Scaling & Translating equations

DR. GEORGE KARRAZ, Ph. D. 71
Geometric Transformations

DR. GEORGE KARRAZ, Ph. D. 72
THANK YOU!
NEXT: VIDEO MPEG

DR. GEORGE KARRAZ, Ph. D.
COMPUTER VISION
LECTURE X
VIDEO MPEG
DR. GEORGE KARRAZ, Ph. D.
Video Compression
• We need to compress video (more so than audio/images) in practice
since:
• 1. Uncompressed video (and audio) data are huge.
• In HDTV, the bit rate easily exceeds 1 Gbps: a big problem for storage
and network communications. E.g. HDTV: 1920 x 1080 at 30 frames
per second, 8 bits per (PAL) channel = 1.5 Gbps.
• 2. Lossy methods have to be employed, since the compression ratio of
lossless methods (e.g. Huffman, Arithmetic, LZW) is not high enough for
image and video compression.

2 DR. GEORGE KARRAZ, Ph. D.
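To make the HDTV arithmetic above concrete, a quick back-of-the-envelope check in Python:

    # Uncompressed HDTV: 1920 x 1080 pixels, 30 frames/s, 3 channels x 8 bits
    bits_per_second = 1920 * 1080 * 30 * 3 * 8
    print(f"{bits_per_second / 1e9:.2f} Gbps")   # ~1.49 Gbps, i.e. roughly 1.5 Gbps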


Video Compression: MPEG
• Not the complete picture studied here!
• Much more to MPEG: plenty of other tricks employed.
• We only concentrate on some basic principles of video compression:
• Earlier H.261 and MPEG 1 and 2 standards, with a brief introduction of
ideas used in newer standards such as H.264 (MPEG-4 Advanced Video
Coding).
• Image, video, and audio compression standards have been specified and
released by two main groups since 1985:
• ISO International Standards Organization: JPEG, MPEG.
• ITU International Telecommunications Union: H.261-264.
3 DR. GEORGE KARRAZ, Ph. D.


Compression Standards
• Whilst in many cases the groups have specified separate standards, there is
some crossover between the groups. E.g.:
• JPEG issued by ISO in 1989 (but adopted by ITU as ITU T.81); MPEG 1 released by
ISO in 1991; H.261 released by ITU in 1993 (based on a CCITT 1990 draft).
• CCITT stands for Comité Consultatif International Téléphonique et Télégraphique,
whose parent organization is the ITU.
• H.262 (better known as MPEG 2) released in 1994.
• H.263 released in 1996, extended as H.263+, H.263++.
• MPEG 4 released in 1998.
• H.264 released in 2002 to lower the bit rates with comparable quality video and
support a wide range of bit rates; it is now part of
• MPEG 4 (Part 10, or AVC - Advanced Video Coding).
4 DR. GEORGE KARRAZ, Ph. D.


How to Compress Video?
• Basic Idea of Video Compression:
• Exploit the fact that adjacent frames are similar.
• Spatial redundancy removal: intra-frame coding (JPEG).
• NOT ENOUGH BY ITSELF!
• Temporal: greater compression by noting the temporal
coherence/incoherence over frames. Essentially we note the difference
between frames.
• Spatial and temporal redundancy removal: intra-frame and inter-frame
coding (H.261, MPEG).
• Things are much more complex in practice, of course.
5 DR. GEORGE KARRAZ, Ph. D.


How to Compress Video?

“It has been customary in the past to transmit successive complete
images of the transmitted picture.” … “In accordance with this
invention, this difficulty is avoided by transmitting only the difference
between successive images of the object.”
6 DR. GEORGE KARRAZ, Ph. D.


Simple Motion Example
• Consider a simple image of a moving circle.
• Let's just consider the difference between 2 frames.
• It is simple to encode/decode:
7 DR. GEORGE KARRAZ, Ph. D.
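A minimal NumPy sketch of the encode/decode step (the frame sizes and random contents are stand-ins for real video frames): the encoder sends frame 1 plus the signed difference, and the decoder reconstructs frame 2 exactly. For a moving circle the difference image is mostly zeros, which compresses well.

    import numpy as np

    # Two consecutive grayscale frames (random stand-ins for real frames)
    frame1 = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
    frame2 = np.random.randint(0, 256, (240, 320), dtype=np.uint8)

    # Encode: transmit frame1 plus the signed frame difference
    diff = frame2.astype(np.int16) - frame1.astype(np.int16)

    # Decode: reconstruct frame2 from frame1 and the difference
    reconstructed = (frame1.astype(np.int16) + diff).astype(np.uint8)
    assert np.array_equal(reconstructed, frame2)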


Estimating Motion of Blocks
• We will examine methods of estimating motion vectors
in due course.

8 DR. GEORGE KARRAZ, Ph. D.


Decoding Motion of Blocks

9 DR. GEORGE KARRAZ, Ph. D.


Motion Estimation Example

10 DR. GEORGE KARRAZ, Ph. D.


How is Motion Compensation Used?

• Block Matching:
• MPEG-1/H.261 relies on block matching techniques.
• For a certain area (block) of pixels in a picture: find a good estimate of
this area in a previous (or in a future!) frame, within a specified search
area.
• Motion compensation: uses the motion vectors to compensate the
picture. Parts of a previous (or future) picture can be reused in a
subsequent picture.
• Individual parts are spatially compressed: JPEG-type compression.

11 DR. GEORGE KARRAZ, Ph. D.


Any Overheads?
• Motion estimation/compensation techniques reduce the
video bitrate significantly, but introduce extra computational
complexity. The decoder needs to buffer reference pictures for backward
and forward referencing.
• Delay.
• Let's see how such ideas are used in practice.

12 DR. GEORGE KARRAZ, Ph. D.


Overview of H.261
• Developed by CCITT in 1988-1990 for video telecommunication applications.
• Meant for videoconferencing and video telephone applications over ISDN
telephone lines.
• Baseline ISDN is 64 kbits/sec, and integral multiples (p x 64).
• Frame sizes are CCIR 601 CIF (Common Intermediate Format) (352x288) and
QCIF (176x144) images with 4:2:0 subsampling.
• Two frame types: Intra-frames (I-frames) and Inter-frames (P-frames). I-frames
use basically JPEG, but with YUV (YCrCb), larger DCT windows, and different
quantisation.
• I-frames provide us with refresh access points: key frames.
• P-frames use pseudo-differences from the previous frame (predicted), so frames
depend on each other.

13 DR. GEORGE KARRAZ, Ph. D.


H.261 Group of Pictures
• We typically have a group of pictures (GOP): one I-frame followed by
several P-frames.
• The number of P-frames following each I-frame determines the size of the
GOP; it can be fixed or dynamic.
• Why can't this be too large?

14 DR. GEORGE KARRAZ, Ph. D.


Intra-frame Coding
• Various lossless and lossy compression techniques are used, as in JPEG.
• Compression is contained only within the current frame.
• Simpler coding, but not enough by itself for high compression.
• Can't rely on intra-frame coding alone: not enough compression.
• A Motion JPEG (MJPEG) standard does exist, but it is not commonly used.
• So introduce the idea of inter-frame difference coding.
• However, we can't rely on inter-frame differences across a large number
of frames.
• So when errors get too large, start a new I-frame.

15 DR. GEORGE KARRAZ, Ph. D.


Intra-frame Coding (Cont.)
• Intra-frame coding is very similar to JPEG:

16 DR. GEORGE KARRAZ, Ph. D.


Intra-frame Coding (Cont.)
• A basic intra-frame coding scheme is as follows:
• Macroblocks are typically 16x16 pixel areas on the Y plane of the original image.
• A macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block (4:2:0
chroma subsampling).
• The eye is most sensitive to luminance, less sensitive to chrominance.
• We operate in a more effective color space: YUV (YCbCr) color, which we
studied earlier.
• It is typical to use 4:2:0 macroblocks: one quarter of the chrominance
information is used.
• Quantization is by a constant value for all DCT coefficients, i.e., no quantization
table as in JPEG.
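A minimal sketch of that last point using SciPy's DCT (the block contents and the quantizer step q are arbitrary illustrative choices): unlike JPEG's per-frequency quantization table, a single constant divides every coefficient.

    import numpy as np
    from scipy.fft import dctn, idctn

    block = np.random.randint(0, 256, (8, 8)).astype(float)   # one 8x8 luminance block

    coeffs = dctn(block, norm="ortho")        # forward 2D DCT
    q = 16                                    # one constant step for ALL coefficients
    quantized = np.round(coeffs / q)          # these values get entropy-coded

    decoded = idctn(quantized * q, norm="ortho")   # decoder: dequantize + inverse DCT
    print(np.abs(decoded - block).max())           # quantization error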

17 DR. GEORGE KARRAZ, Ph. D.


Inter-frame (P-frame) Coding
• Intra-frame coding is limited to a spatial basis relative to a single frame.
• Considerably more compression is possible if the inherent temporal basis is
exploited as well.
• BASIC IDEA:
• Most consecutive frames within a sequence are very similar to the frames
both before (and after) the frame of interest.
• Aim to exploit this redundancy.
• Use a technique known as block-based motion-compensated prediction.
• Need to use motion estimation.
• Coding needs extensions for inter-frames, but the encoder can also support an
intra-frame subset.

18 DR. GEORGE KARRAZ, Ph. D.


Inter-frame (P-frame) Coding (Cont.)
• P-coding can be summarized as follows:

19 DR. GEORGE KARRAZ, Ph. D.


Inter-frame (P-frame) Coding (Cont.)

20 DR. GEORGE KARRAZ, Ph. D.


Inter-frame (P-frame) Coding (Cont.)

21 DR. GEORGE KARRAZ, Ph. D.


Motion Vector Search
• So we know how to encode a P-block.
• How do we find the motion vector?

22 DR. GEORGE KARRAZ, Ph. D.


Motion Estimation
• The temporal prediction technique used in MPEG video is based on
motion estimation.
• The basic premise:
• Consecutive video frames will be similar except for changes induced
by objects moving within the frames.
• Trivial case of zero motion between frames: no other differences
except noise etc.
• Easy for the encoder to predict the current frame as a duplicate of the
prediction frame.
• When there is motion in the images, the situation is not as simple.

23 DR. GEORGE KARRAZ, Ph. D.


Example
• The problem for motion estimation to solve is:
• How to adequately represent the changes, or differences, between
these two video frames.

24 DR. GEORGE KARRAZ, Ph. D.


Solution
• A comprehensive 2-dimensional spatial search is performed for each
luminance macroblock.
• Motion estimation is not applied directly to chrominance in MPEG.
• MPEG does not define how this search should be performed.
• This is a detail that the system designer can choose to implement in one of many
possible ways.
• It is well known that a full, exhaustive search over a wide 2-D area yields the
best matching results in most cases, but at extreme computational cost to
the encoder.
• Motion estimation is usually the most computationally expensive portion
of the video encoding.

25 DR. GEORGE KARRAZ, Ph. D.


Motion Estimation Example

26 DR. GEORGE KARRAZ, Ph. D.


Motion Vectors, Matching Blocks
• The previous figure shows an example of a particular macroblock from
Frame 2 of the earlier example, relative to various macroblocks of Frame 1:
• The top frame has a bad match with the macroblock to be coded.
• The middle frame has a fair match, as there is some commonality
between the 2 macroblocks.
• The bottom frame has the best match, with only a slight error between
the 2 macroblocks.
• Because a relatively good match has been found, the encoder assigns
motion vectors to that macroblock.

27 DR. GEORGE KARRAZ, Ph. D.


Final Motion Estimation Prediction

28 DR. GEORGE KARRAZ, Ph. D.


Final Motion Estimation Prediction (Cont.)
• The predicted frame is subtracted from the desired frame,
leaving a (hopefully) less complicated residual error frame which can
then be encoded much more efficiently than before motion
estimation.

29 DR. GEORGE KARRAZ, Ph. D.


Example

30 DR. GEORGE KARRAZ, Ph. D.


Example

31 DR. GEORGE KARRAZ, Ph. D.


Example

32 DR. GEORGE KARRAZ, Ph. D.


Further Coding Efficiency
• Differential Coding of Motion Vectors
• Motion vectors tend to be highly correlated between macroblocks:
• The horizontal component is compared to the previously valid
horizontal motion vector and
• only the difference is coded.
• The same difference is calculated for the vertical component.
• Difference codes are then described with a variable-length code (e.g.
Huffman) for maximum compression efficiency.

33 DR. GEORGE KARRAZ, Ph. D.


Recap: P-Frame Coding Summary

34 DR. GEORGE KARRAZ, Ph. D.


Estimating the Motion Vectors
• So how do we find the motion?
• The basic idea is to search for the macroblock (MB)
• within an n x m pixel search window.
• Work out for each window:
• Sum of Absolute Differences (SAD) (or Mean Absolute Error (MAE)).
• Choose the window where SAD/MAE is a minimum. If the encoder decides that
no acceptable match exists, then it has the option of
• coding that particular macroblock as an intra macroblock, even though it
may be in a P frame!
• In this manner, high quality video is maintained at a slight cost to coding
efficiency.

35 DR. GEORGE KARRAZ, Ph. D.


Full Search
• Search exhaustively the whole (2R + 1) x (2R + 1) window in the
reference frame.
• A macroblock centered at each of the positions within the window is
compared to the macroblock in the target frame pixel by pixel and
their respective SAD (or MAE) is computed.
• The vector (i, j) that offers the least SAD (or MAE) is designated as the
motion vector for the macroblock in the target frame.
• Full search is very costly.
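A minimal NumPy sketch of full-search block matching with SAD (the block size B, search radius R, and the synthetic shifted frames are illustrative choices, not part of any standard):

    import numpy as np

    def full_search(target, ref, by, bx, B=16, R=8):
        """Motion vector for the BxB target block at (by, bx): exhaustive SAD
        search over the (2R+1) x (2R+1) window in the reference frame."""
        block = target[by:by+B, bx:bx+B].astype(np.int32)
        best, best_mv = None, (0, 0)
        for dy in range(-R, R + 1):
            for dx in range(-R, R + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + B > ref.shape[0] or x + B > ref.shape[1]:
                    continue                  # candidate falls outside the frame
                sad = np.abs(block - ref[y:y+B, x:x+B].astype(np.int32)).sum()
                if best is None or sad < best:
                    best, best_mv = sad, (dx, dy)
        return best_mv, best

    # Toy test: ref is target shifted by (dy=2, dx=3), so that vector should win
    target = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    ref = np.roll(target, shift=(2, 3), axis=(0, 1))
    print(full_search(target, ref, by=24, bx=24))   # expect motion vector (3, 2)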

36 DR. GEORGE KARRAZ, Ph. D.


Full Search
• Advantages:
Guaranteed to find the optimal motion vector within the search range.
• Disadvantages:
Can only search among integer-pixel candidates. What if the motion is by a
fractional number of pixels?
• High computational complexity: O((2R + 1)²) candidate positions per macroblock.
• HOW TO IMPROVE?
1. Accuracy: consider fractional translations; this requires interpolation
(e.g. bilinear in H.263).
2. Speed: try to avoid checking unlikely candidates.
37 DR. GEORGE KARRAZ, Ph. D.


Bilinear Interpolation

38 DR. GEORGE KARRAZ, Ph. D.
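The slide's figure is not reproduced here; as a stand-in, a minimal sketch of bilinear interpolation at a fractional position (x, y) of a grayscale image, the operation that fractional-pixel (e.g. half-pixel) motion search relies on:

    import numpy as np

    def bilinear(img, x, y):
        """Interpolate img at fractional coordinates (x, y) from its 4 neighbours."""
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        a, b = x - x0, y - y0                  # fractional parts
        return ((1 - a) * (1 - b) * img[y0,     x0    ] +
                a       * (1 - b) * img[y0,     x0 + 1] +
                (1 - a) * b       * img[y0 + 1, x0    ] +
                a       * b       * img[y0 + 1, x0 + 1])

    img = np.arange(16, dtype=float).reshape(4, 4)
    print(bilinear(img, 1.5, 2.5))   # 11.5: the mean of the 4 surrounding pixels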


2D Logarithmic Search
• An approach that takes several iterations, akin to a binary search.
Computationally cheaper, suboptimal but usually effective.
• Initially only nine locations in the search window are used as seeds
for a SAD-based search (marked as '1').
• After locating the one with the minimal SAD, the center of the new
search region is moved to it and the step size is reduced to half.
• In the next iteration, the nine new locations are marked as '2' and this
process repeats. If L iterations are applied, only 9L positions are
checked altogether.
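A minimal sketch of the idea (the SAD helper mirrors the full-search example earlier; the step-halving schedule is the textbook version, and details vary between implementations):

    import numpy as np

    def sad(target, ref, by, bx, y, x, B=16):
        t = target[by:by+B, bx:bx+B].astype(np.int32)
        r = ref[y:y+B, x:x+B].astype(np.int32)
        return np.abs(t - r).sum()

    def log_search(target, ref, by, bx, B=16, R=8):
        cy, cx = by, bx                    # current search centre in the reference
        step = max(R // 2, 1)
        while True:
            # evaluate the 3x3 grid of candidates spaced `step` pixels apart
            cands = [(cy + dy * step, cx + dx * step)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if 0 <= cy + dy * step <= ref.shape[0] - B
                     and 0 <= cx + dx * step <= ref.shape[1] - B]
            cy, cx = min(cands, key=lambda p: sad(target, ref, by, bx, *p))
            if step == 1:
                break
            step //= 2                     # halve the step size and repeat
        return (cx - bx, cy - by)          # motion vector (dx, dy)

    target = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    ref = np.roll(target, shift=(2, 3), axis=(0, 1))
    print(log_search(target, ref, by=24, bx=24))   # usually recovers (3, 2)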

39 DR. GEORGE KARRAZ, Ph. D.


2D Logarithmic Search (Cont.)

40 DR. GEORGE KARRAZ, Ph. D.


Hierarchical Motion Estimation

1. Form several low-resolution versions of the target and
reference pictures.
2. Find the best-match motion vector in the lowest
resolution version.
3. Modify the motion vector level by level when going
up.

41 DR. GEORGE KARRAZ, Ph. D.


Hierarchical Motion Estimation

42 DR. GEORGE KARRAZ, Ph. D.


Performance Comparison
• Operations for 720x480 at 30 fps (GOPS):

  Search Method   p = 15    p = 7
  Full Search     29.890    6.990
  Logarithmic      1.020    0.778
  Hierarchical     0.507    0.399

43 DR. GEORGE KARRAZ, Ph. D.


MPEG Compression
• MPEG stands for:
• Motion Picture Expert Group, established circa 1990 to create standards
for delivery of audio and video.
• MPEG-1 (1991). Target: VHS quality on a CD-ROM (320 x 240 + CD audio @
1.5 Mbits/sec).
• MPEG-2 (1994). Target: television broadcast.
• MPEG-3: HDTV, but subsumed into an extension of MPEG-2.
• MPEG-4 (1998): Very Low Bitrate Audio-Visual Coding; later MPEG-4 Part
10 (H.264) for a wide range of bitrates and better compression quality.
• MPEG-7 (2001): “Multimedia Content Description Interface”.
• MPEG-21 (2002): “Multimedia Framework”.

44 DR. GEORGE KARRAZ, Ph. D.


Three Parts to MPEG
• The MPEG standard has three parts:
• Video: based on H.261 and JPEG.
• Audio: based on MUSICAM (Masking pattern adapted
Universal Sub-band Integrated Coding And Multiplexing) technology.
• System: controls interleaving of streams.

49 DR. GEORGE KARRAZ, Ph. D.


MPEG Video
• MPEG compression is essentially an attempt to overcome some
shortcomings of H.261 and JPEG:

46 DR. GEORGE KARRAZ, Ph. D.


The Need for a Bidirectional Search
• The problem here is that many macroblocks need information that is
not in the reference frame.
• For example:
• Occlusion by objects affects differencing
• Difficult to track occluded objects etc.
• MPEG uses forward/backward interpolated prediction.

47 DR. GEORGE KARRAZ, Ph. D.


MPEG B-Frames
• The MPEG solution is to add a third frame type, which is a
bidirectional frame, or B-frame.
• B-frames search for macroblocks in past and future frames.
• Typical pattern is IBBPBBPBB. The actual pattern is up to the
encoder, and need not be regular.

48 DR. GEORGE KARRAZ, Ph. D.


Example: I, P, and B frames
• Consider a group of pictures that lasts for 6 frames:
• Given: I,B,P,B,P,B, I,B,P,B,P,B, …
• I frames are coded spatially only (as before in H.261).
• P frames are forward predicted based on previous I and P frames (as before
in H.261).
• B frames are coded based on a forward prediction from a previous I or P
frame, as well as a backward prediction from a succeeding I or P frame.

49 DR. GEORGE KARRAZ, Ph. D.


Bidirectional Prediction

50 DR. GEORGE KARRAZ, Ph. D.


Example: I, P, and B frames (Cont.)

• The 1st B frame is predicted from the 1st I frame
and the 1st P frame.
• The 2nd B frame is predicted from the 1st and
2nd P frames.
• The 3rd B frame is predicted from the 2nd and
3rd P frames.
• The 4th B frame is predicted from the 3rd P
frame and the 1st I frame of the next group
of pictures.

51 DR. GEORGE KARRAZ, Ph. D.


Bidirectional Prediction

52 DR. GEORGE KARRAZ, Ph. D.


Backward Prediction Implications
• Note: backward prediction requires that the future frames that
are to be used for backward prediction be
encoded and transmitted first, i.e. out of order.
• This process is summarized:

53 DR. GEORGE KARRAZ, Ph. D.


Backward Prediction Implications (Cont.)
• Also NOTE:
• There is no defined limit to the number of consecutive B frames that may be
used in a group of pictures.
• The optimal number is application dependent.
• Most broadcast quality applications, however, have tended to use 2
consecutive B frames (I,B,B,P,B,B,P,...) as the ideal trade-off
between compression efficiency and video quality.
• MPEG suggests some standard groupings.

54 DR. GEORGE KARRAZ, Ph. D.


Advantage of Using B frames
• Coding efficiency.
• Most B frames use fewer bits.
• Quality can also be improved in the case of moving objects that reveal
hidden areas within a video sequence.
• Less error propagation: since B frames are not used to predict future frames,
errors generated will not be propagated further within the sequence.
• Disadvantages:
• Frame reconstruction memory buffers within the encoder and decoder
must be doubled in size to accommodate the 2 anchor frames.
• More delay in real-time applications.

55 DR. GEORGE KARRAZ, Ph. D.


Frame Sizes

56 DR. GEORGE KARRAZ, Ph. D.


Random Access Points

57 DR. GEORGE KARRAZ, Ph. D.


MPEG-2, MPEG-3, and MPEG-4

58 DR. GEORGE KARRAZ, Ph. D.


THANK YOU!
NEXT: MOTION TRACKING

DR. GEORGE KARRAZ, Ph. D.
COMPUTER VISION
LECTURE XI
MOTION TRACKING

DR. GEORGE KARRAZ, Ph. D.


Contents:

The Problem
Goals
Approaches
The Optical Flow Method
Algorithm

DR. GEORGE KARRAZ, Ph. D. 2


The Problem

Given a set of images in time which are similar but not identical,
derive a method for identifying the motion that has occurred (in
2d) between different images.

DR. GEORGE KARRAZ, Ph. D. 3


Goals
Input:
➢ an image sequence
➢ captured with a fixed camera
➢ containing one or more moving objects of interest
Processing goals: determine the image regions where significant
motion has occurred
Output: an outline of the motion within the image sequence

DR. GEORGE KARRAZ, Ph. D. 4


Motion Detection and Estimation

Image differencing
➢ based on the thresholded difference of successive images
➢ difficult to reconstruct moving areas
Background subtraction
➢ foreground objects are obtained by calculating the difference between an image
in the sequence and the background image (previously obtained)
➢ remaining task: determine the movement of these foreground objects
between successive frames
Block motion estimation
➢ calculates the motion vector between frames for sub-blocks of the image
➢ mainly used in image compression
➢ too coarse
Optical Flow
DR. GEORGE KARRAZ, Ph. D. 5
What Is Optical Flow?

Optical flow is the displacement field for
each of the pixels in an image sequence.
For every pixel, a velocity vector $\left( \frac{dx}{dt}, \frac{dy}{dt} \right)$
is found which says:
➢ how quickly a pixel is moving across
the image
➢ the direction of its movement.

DR. GEORGE KARRAZ, Ph. D. 6


Optical Flow Examples

[Figure: example flow fields for Translation, Rotation, and Scaling]

DR. GEORGE KARRAZ, Ph. D. 7


Algorithm

Optical flow: maximum one pixel large
movements
Optical flow: larger movements
Morphological filter
Contour detection (demo purposes)

DR. GEORGE KARRAZ, Ph. D. 8


Optical Flow: maximum one pixel large
movements

The optical flow for a pixel (i, j), given 2
successive images k and k + 1, is:

$$m_k(i, j) = (x, y) \quad\text{such that}\quad \left| I_k(i, j) - I_{k+1}(i + x, j + y) \right| \tag{1}$$

is minimum for $-1 \le x \le 1,\; -1 \le y \le 1$.

DR. GEORGE KARRAZ, Ph. D. 9


[Figure: frames k and k+1 side by side]
Optical Flow: maximum one pixel large
movements (2)

More precision: consider a 3×3 window around
the pixel.

The optical flow for pixel (i, j) becomes:

$$m_k(i, j) = (x, y) \quad\text{such that}\quad \left| \sum_{u=-1}^{1}\sum_{v=-1}^{1} I_k(i+u, j+v) - \sum_{u=-1}^{1}\sum_{v=-1}^{1} I_{k+1}(i+u+x, j+v+y) \right| \tag{2}$$

is minimum for $-1 \le x \le 1,\; -1 \le y \le 1$.

DR. GEORGE KARRAZ, Ph. D. 10
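A minimal NumPy sketch of equations (1)-(2) (the frame contents and the toy shift are illustrative): for each pixel, test the 9 one-pixel displacements and keep the one minimizing the windowed difference.

    import numpy as np

    def one_pixel_flow(Ik, Ik1, i, j):
        """Optical flow (x, y) at pixel (i, j) per Eq. (2): compare 3x3 window
        sums over the 9 candidate displacements -1 <= x, y <= 1."""
        w = Ik[i-1:i+2, j-1:j+2].astype(np.int32).sum()
        best, best_m = None, (0, 0)
        for x in (-1, 0, 1):
            for y in (-1, 0, 1):
                w1 = Ik1[i+x-1:i+x+2, j+y-1:j+y+2].astype(np.int32).sum()
                d = abs(w - w1)
                if best is None or d < best:
                    best, best_m = d, (x, y)
        return best_m

    Ik = np.random.randint(0, 256, (32, 32), dtype=np.uint8)
    Ik1 = np.roll(Ik, shift=1, axis=0)      # everything moves one pixel down
    print(one_pixel_flow(Ik, Ik1, 10, 10))  # expect (1, 0)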
Optical Flow: larger movements

Reduce the size of the image
=> reduced size of the movement

Solution: multi-resolution analysis of the images

Advantages: computing efficiency, stability
DR. GEORGE KARRAZ, Ph. D. 11
Multi-resolution Analysis

Coarse-to-fine optical flow estimation:

[Figure: Gaussian pyramids of original image k and original image k+1,
with levels 32×32, 64×64, 128×128, and 256×256]


DR. GEORGE KARRAZ, Ph. D. 12
Gaussian Pyramid

Lowest level g_0: the original image.
Level g_l: the weighted average of values in g_{l-1}
in a 5×5 window:

$$g_l(i, j) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m, n)\, g_{l-1}(2i + m, 2j + n) \tag{3}$$
DR. GEORGE KARRAZ, Ph. D. 13


Gaussian Pyramid (2)

The mask G(m, n) is an approximation of the 2D
Gaussian:

$$G = \begin{bmatrix}
0.003 & 0.013 & 0.022 & 0.013 & 0.003 \\
0.013 & 0.060 & 0.098 & 0.060 & 0.013 \\
0.022 & 0.098 & 0.162 & 0.098 & 0.022 \\
0.013 & 0.060 & 0.098 & 0.060 & 0.013 \\
0.003 & 0.013 & 0.022 & 0.013 & 0.003
\end{bmatrix}$$

The mask is symmetric and separable:

$$G(m, n) = G_r(m) \cdot G_c(n) \tag{4}$$
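A minimal sketch of one pyramid reduction step implementing equations (3)-(4) (the 1D kernel [0.05, 0.25, 0.4, 0.25, 0.05] is a common choice whose outer product closely matches the 5×5 mask above):

    import numpy as np
    from scipy.ndimage import convolve

    def pyramid_reduce(g):
        """One REDUCE step: 5x5 Gaussian smoothing, then keep every 2nd pixel (Eq. 3)."""
        k1d = np.array([0.05, 0.25, 0.4, 0.25, 0.05])
        w = np.outer(k1d, k1d)           # separable: G(m, n) = G_r(m) * G_c(n)  (Eq. 4)
        smoothed = convolve(g.astype(float), w, mode="nearest")
        return smoothed[::2, ::2]        # downsample by 2 in each direction

    img = np.random.rand(256, 256)
    pyramid = [img]
    while pyramid[-1].shape[0] >= 64:    # build the 256, 128, 64, 32 levels
        pyramid.append(pyramid_reduce(pyramid[-1]))
    print([level.shape for level in pyramid])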
DR. GEORGE KARRAZ, Ph. D. 14
Optical Flow: Top-down Strategy

Algorithm (1/4 scale of resolution reduction):

Step 1: compute optical flow vectors for the highest
level of the pyramid l (smallest resolution)
Step 2: double the values of the vectors
Step 3: first approximation: the optical flow vectors for the
(2i, 2j), (2i+1, 2j), (2i, 2j+1), (2i+1, 2j+1) pixels in the l-1
level are assigned the value of the optical flow vector for
the (i, j) pixel from the l level

DR. GEORGE KARRAZ, Ph. D. 15


[Figure: one pixel at level l maps to a 2×2 block of pixels at level l-1]
Optical Flow: Top-down Strategy (2)

Step 4:
➢ adjustment of the vectors of the l-1 level in the pyramid
➢ method: detection of maximum one pixel displacements
around the initially approximated position

Step 5: smoothing of the optical flow field (Gaussian
filter)

DR. GEORGE KARRAZ, Ph. D. 16
Filtering the Size of the Detected Regions

Small isolated regions of motion detected by the
optical flow method are classified as noise and
are eliminated with the help of morphological
operations:
Step 1: Apply the opening.
Step 2: Apply the closing.
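A minimal OpenCV sketch of the two steps (the toy mask and the 5×5 elliptical kernel are arbitrary illustrative choices): opening removes small isolated blobs, and closing then fills small holes in the surviving regions.

    import cv2
    import numpy as np

    # Toy binary motion mask: one genuine moving region plus isolated noise pixels
    mask = np.zeros((100, 100), dtype=np.uint8)
    mask[30:70, 30:70] = 255
    mask[5, 5] = mask[90, 10] = 255

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)      # Step 1: opening
    cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)  # Step 2: closing
    print(cleaned[5, 5], cleaned[50, 50])   # noise removed (0), region kept (255)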

DR. GEORGE KARRAZ, Ph. D. 17


Contour Detection
For demonstration purposes, the contours of the moving regions detected
are outlined.

Method: the Sobel edge detector:

➢ Compute the intensity gradient:

$$\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (f_x, f_y) \tag{5}$$

using the Sobel masks:

$$G_x = \frac{1}{4}\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = \frac{1}{4}\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \tag{6}$$

➢ Compute the magnitude of the gradient:

$$M(x, y) = \|\nabla f(x, y)\| = \sqrt{f_x^2 + f_y^2} \tag{7}$$

➢ if M(x, y) ≥ threshold then edge pixel,
else non-edge pixel.
DR. GEORGE KARRAZ, Ph. D. 18
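A minimal OpenCV sketch of equations (5)-(7) (the input file name and the threshold value are placeholders; note cv2.Sobel omits the 1/4 normalization, which only rescales the threshold):

    import cv2
    import numpy as np

    img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file

    fx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # horizontal gradient (Eqs. 5-6)
    fy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)   # vertical gradient
    M = np.sqrt(fx**2 + fy**2)                       # gradient magnitude (Eq. 7)

    threshold = 100.0                                # arbitrary illustrative value
    edges = (M >= threshold).astype(np.uint8) * 255  # edge / non-edge decision
    cv2.imwrite("edges.png", edges)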
A Block Diagram of the System

DR. GEORGE KARRAZ, Ph. D. 19


THANK YOU!
NEXT: MOTION ESTIMATION

DR. GEORGE KARRAZ, Ph. D.
COMPUTER VISION
LECTURE XII
MOTION ESTIMATION
DR. GEORGE KARRAZ, Ph. D.
Problem definition: optical flow

How to estimate pixel motion from image H to image I?


• Solve pixel correspondence problem
– given a pixel in H, look for nearby pixels of the same color in I

Key assumptions
• color constancy: a point in H looks the same in I
– For grayscale images, this is brightness constancy
• small motion: points do not move very far
This is called the optical flow problem.

2 DR. GEORGE KARRAZ, Ph. D.
Optical flow constraints (grayscale images)

Let’s look at these constraints more closely


• brightness constancy: Q: what’s the equation?

• small motion: (u and v are less than 1 pixel)


– suppose we take the Taylor series expansion of I:

3
DR. GEORGE KARRAZ, Ph. D.
Optical flow equation
Combining these two equations
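The combined equation appears as an image in the original slides; for reference, the standard derivation it depicts:

    % Brightness constancy:  I(x + u, y + v, t + 1) = I(x, y, t)
    % First-order Taylor expansion of the left-hand side:
    %   I(x + u, y + v, t + 1) ≈ I(x, y, t) + I_x u + I_y v + I_t
    % Combining the two gives the optical flow constraint equation:
    \[
      I_x u + I_y v + I_t \approx 0
      \qquad\Longleftrightarrow\qquad
      \nabla I \cdot (u, v)^\top + I_t \approx 0
    \]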

In the limit as u and v go to zero, this becomes exact

DR. GEORGE KARRAZ, Ph. D. 4


Optical flow equation

Q: how many unknowns and equations per pixel?

Intuitively, what does this constraint mean?


• The component of the flow in the gradient direction is determined
• The component of the flow parallel to an edge is unknown

DR. GEORGE KARRAZ, Ph. D. 5


Aperture problem

DR. GEORGE KARRAZ, Ph. D. 6


Aperture problem

DR. GEORGE KARRAZ, Ph. D. 7


Solving the aperture problem
How to get more equations for a pixel?
• Basic idea: impose additional constraints
– most common is to assume that the flow field is smooth locally
– one method: pretend the pixel’s neighbors have the same (u,v)
» If we use a 5x5 window, that gives us 25 equations per pixel!

DR. GEORGE KARRAZ, Ph. D. 8


RGB version
How to get more equations for a pixel?
• Basic idea: impose additional constraints
– most common is to assume that the flow field is smooth locally
– one method: pretend the pixel’s neighbors have the same (u,v)
» If we use a 5x5 window, that gives us 25*3 equations per pixel!

DR. GEORGE KARRAZ, Ph. D. 9



Lukas-Kanade flow
Problem: we have more equations than unknowns

Solution: solve a least squares problem

• the minimum least squares solution is given by the solution (in d) of:

• The summations are over all pixels in the K x K window

• This technique was first proposed by Lukas & Kanade (1981)
– described in the Trucco & Verri reading

DR. GEORGE KARRAZ, Ph. D. 10
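The normal equations themselves appear as images in the slides; a minimal NumPy sketch of one Lucas-Kanade step at a single pixel (the window size K, the gradient computation, and the synthetic test frames are standard choices assumed here, not taken from the slides): stack the gradient equations from a K x K window and solve the least-squares system for d = (u, v).

    import numpy as np

    def lucas_kanade_pixel(H, I, y, x, K=5):
        """Estimate flow (u, v) at (y, x) from frames H -> I using a KxK window."""
        r = K // 2
        win = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))

        Iy, Ix = np.gradient(H.astype(float))     # spatial gradients
        It = I.astype(float) - H.astype(float)    # temporal derivative

        A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)   # (K*K) x 2
        b = -It[win].ravel()

        # Least-squares solution of A d = b (equivalent to the 2x2 normal equations)
        d, *_ = np.linalg.lstsq(A, b, rcond=None)
        return d    # (u, v)

    xs = np.linspace(0, 4 * np.pi, 64)
    H = np.sin(xs)[None, :] * np.sin(xs)[:, None]   # smooth synthetic frame
    I = np.roll(H, shift=1, axis=1)                  # content moves right by 1 pixel
    print(lucas_kanade_pixel(H, I, 32, 32))          # roughly (1, 0), up to linearization error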
Conditions for Solvability
• Optimal (u, v) satisfies the Lucas-Kanade equation

When is This Solvable?

• AᵀA should be invertible
• AᵀA should not be too small due to noise
– eigenvalues λ₁ and λ₂ of AᵀA should not be too small
• AᵀA should be well-conditioned
– λ₁/λ₂ should not be too large (λ₁ = larger eigenvalue)

DR. GEORGE KARRAZ, Ph. D. 11



Eigenvectors of AᵀA

Suppose (x, y) is on an edge. What is AᵀA?

• gradients along the edge all point the same direction
• gradients away from the edge have small magnitude

• the gradient direction is an eigenvector, with a large eigenvalue

• What's the other eigenvector of AᵀA?
– let N be perpendicular to the gradient direction
– N is the second eigenvector, with eigenvalue 0

The eigenvectors of AᵀA relate to edge direction and magnitude.

DR. GEORGE KARRAZ, Ph. D. 12

Edge

– large gradients, all the same
– large λ₁, small λ₂
13
Low texture region

– gradients have small magnitude
– small λ₁, small λ₂
DR. GEORGE KARRAZ, Ph. D. 14
High textured region

– gradients are different, large magnitudes
– large λ₁, large λ₂
DR. GEORGE KARRAZ, Ph. D. 15
Observation
This is a two image problem BUT
• Can measure sensitivity by just looking at one of the images!
• This tells us which pixels are easy to track, which are hard
– very useful later on when we do feature tracking...

DR. GEORGE KARRAZ, Ph. D. 16


Errors in Lukas-Kanade
What are the potential causes of errors in this procedure?
• Suppose AᵀA is easily invertible
• Suppose there is not much noise in the image
When our assumptions are violated
• Brightness constancy is not satisfied
• The motion is not small
• A point does not move like its neighbors
– window size is too large
– what is the ideal window size?

DR. GEORGE KARRAZ, Ph. D. 17


Improving accuracy
Recall our small motion assumption

This is not exact
• To do better, we need to add higher order terms back in:

This is a polynomial root-finding problem
• Can solve using Newton’s method
– Also known as the Newton-Raphson method
• The Lukas-Kanade method does one iteration of Newton’s method
– Better results are obtained via more iterations

DR. GEORGE KARRAZ, Ph. D. 18


Iterative Refinement
Iterative Lukas-Kanade Algorithm
1. Estimate velocity at each pixel by solving Lucas-Kanade equations
2. Warp H towards I using the estimated flow field
- use image warping techniques
3. Repeat until convergence

DR. GEORGE KARRAZ, Ph. D. 19


Revisiting the small motion assumption

Is this motion small enough?


• Probably not—it’s much larger than one pixel (2nd order terms dominate)
• How might we solve this problem?
DR. GEORGE KARRAZ, Ph. D. 20
Reduce the resolution!

DR. GEORGE KARRAZ, Ph. D. 21


Coarse-to-fine Optical Flow Estimation

[Figure: Gaussian pyramids of image H and image I; a motion of u = 10
pixels at full resolution becomes u = 5, u = 2.5, and u = 1.25 pixels at
successively coarser pyramid levels]

DR. GEORGE KARRAZ, Ph. D. 22

Coarse-to-fine Optical Flow Estimation

[Figure: starting at the coarsest level of the Gaussian pyramids of image H
and image I, run iterative L-K, then warp & upsample, run iterative L-K
again, and repeat down to full resolution]

DR. GEORGE KARRAZ, Ph. D. 23
Multi-resolution Lucas Kanade Algorithm

DR. GEORGE KARRAZ, Ph. D. 24
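OpenCV ships a pyramidal (multi-resolution) iterative Lucas-Kanade tracker; a minimal sketch of tracking corner points between two frames (the file names are placeholders):

    import cv2
    import numpy as np

    prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
    curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

    # Pick easy-to-track points: high-texture corners (cf. the eigenvalue discussion)
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=100, qualityLevel=0.01, minDistance=8)

    # Pyramidal iterative Lucas-Kanade: maxLevel sets the pyramid depth
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev, curr, pts, None, winSize=(15, 15), maxLevel=3)

    flow = (new_pts - pts)[status.ravel() == 1]   # displacements of tracked points
    print(flow.reshape(-1, 2)[:5])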


Optical Flow Results

DR. GEORGE KARRAZ, Ph. D. 25


Optical Flow Results

DR. GEORGE KARRAZ, Ph. D. 26


Optical Flow Results

DR. GEORGE KARRAZ, Ph. D. 27


THANK YOU!
END OF COMPUTER VISION COURSE

DR. GEORGE KARRAZ, Ph. D.