2023 - AVSA - Unit II (20230211)

Joint Master Degree (JMD) in Image Processing and Computer Vision (IPCV)
Applied Video Sequences Analysis

Unit II: Foreground Segmentation and Object Detection
José M. Martínez Sánchez
JoseM.Martinez@uam.es
Escuela Politécnica Superior Universidad Autónoma de Madrid Video Processing and Understanding Lab
Contents
• Foreground Segmentation and Object Detection
▪ Foreground segmentation
▪ Object detection and classification
© 2023 (josem.martinez@uam.es) – AVSA II 2 Foreground Segmentation and Object Detection: Contents

Contents
▪ Introduction
▪ Performance evaluation
▪ Basic algorithm
▪ Limitations of basic algorithm
▪ Background model update
▪ Hot start
▪ Shadow detection
▪ Robust background models
• Parametric models
• Non-parametric models
▪Object detection and classification
Introduction
• Goal of Foreground Segmentation

▪ To determine regions that do not belong to the
background of the scene
▪ “mobile” objects over static background

Introduction
• Preliminary stage for other video analysis algorithms
▪object detection, object classification, tracking,
event detection, …
• Segmentation algorithms based on
▪background subtraction
▪motion detection

Introduction
• Segmentation based on background subtraction

▪Assumes static camera and “static” background
(little variation)
▪Generation of a mathematical model of the appearance (gray level / color) of the background image at pixel or block level
▪Every frame compared with the background model
▪Mobile objects (foreground)
▪ Pixels or blocks of pixels with significant
differences between current frame and
background model
▪Fast algorithms
Introduction
• Segmentation based on motion detection

▪Velocity vector estimated at every pixel or block of pixels from adjacent frames
▪Dominant (majority) motion is determined (camera motion)
▪Mobile objects (foreground)
▪ Pixels or blocks of pixels with significant
differences between their motion and the
camera motion
▪Larger computational cost
▪ Velocity estimation (optical flow)

Introduction
• Video surveillance typical scenarios:

▪Majority of static cameras
▪ … but PTZ (use of “panoramic background”)
▪ … but moving cameras (3G … not so typical
currently)
▪Scenes with static background
▪ … but multimodality (use of more complex models)
▪Real time processing
▪ … but forensic systems
Typically Foreground Segmentation is based
on Background Subtraction
Introduction
Stages of Background Subtraction (Bouwmans 2014)
• Background initialization
• Background modelling
• Background maintenance
• Foreground detection
Image contributed by Marcos Escudero (marcos.escudero@uam.es)

Performance evaluation
Introduction
• Automated video-surveillance systems have developed considerably in recent years due to the growing need for security in private and public spaces
• Many approaches are available but their
effectiveness is not clear in real situations and
environments, where the “theoretical”
performance degrades significantly
▪Variability in environmental conditions: lighting
conditions, weather conditions (sunny, rainy, foggy)
▪Medium-high density, crowds, …

Introduction
• Performance evaluation should be performed for each analysis stage and for the entire system
▪ Analytical methods: based on analyzing principles and properties (e.g., detection probability ratio, processing complexity, segmentation resolution)
▪ Empirical methods: based on measuring the quality of results
• Goodness/stand-alone methods: based on quality criteria about the segmented object, trajectory, …
• Discrepancy methods: quantify the deviation of the results from reference results → need a ground-truth
Y. J. Zhang, “A survey on evaluation methods for image segmentation,” Pattern Recognition, 29(8):1335–1346, 1996.
J. S. Cardoso, L. Corte-Real, “Toward a Generic Evaluation of Image Segmentation,” IEEE Transactions on Image Processing, 14(11):1773–1782, Nov. 2005.

Introduction
• Scenario classification
▪Evaluation should consider different scenarios that appropriately represent real-world conditions; these scenarios can be classified by two main criteria:
▪ Complexity
• Static/moving cameras, indoor/outdoor, multimodal
background, …
▪ Density of (moving) objects of interest
Scenario Complexity Density
S1 Low Low
S2 High Low
S3 Low High
S4 High High

Introduction
• Scenario examples for the event recognition task:
▪ S1 (low complexity, low density): Standing
▪ S2 (high complexity, low density): UseObject
▪ S3 (low complexity, high density): AbandonedObject
▪ S4 (high complexity, high density): BagStealing

Discrepancy methods
• Discrepancy methods: Objective evaluation of
results using “rigorous evaluation protocols”
▪ Datasets: sets of sequences covering real-world conditions, large enough to be representative, with their associated ground-truths
▪ Metrics: objective measures quantifying performance by measuring the deviation of the algorithm results from the ground-truth data

Discrepancy methods: Datasets
Covered scenario
S1 S2 S3 S4
Video object segmentation
VSSN2006 X X
IPPR06 X
CVSG X X
SABS X X
CDW2012 X X
People detection
ETHZ X
TUD-Pedestrians X
DCII X
Caltech Pedestrian X
PDds X X X
Video object tracking
PETS X X X
VISOR X
EPFL X X
SOVTds X X
Event detection
CAVIAR X X
ETISEO X X X
PETS 2006 X X
PETS 2007 X X X
I-LIDS X X
VISOR X X
CANDELA X X
CANTATA X
ASODds X X
EDds X X

• ChangeDetection.NET (CDNET)
▪http://changedetection.net
▪ “.. encapsulates a rigorous and comprehensive
academic benchmarking effort for testing and
ranking existing and new algorithms for change and
motion detection. It will be revised/expanded from
time to time based on received feedback, and will
maintain a comprehensive ranking of submitted
methods for years to come.”
• 2012 dataset: baseline, dynamic background, camera jitter, intermittent object motion, shadow, thermal
• 2014 dataset: + challenging weather, low frame-rate, night, PTZ, air turbulence
• SceneBackgroundModelling.NET (SBMnet)
▪http://pione.dinf.usherbrooke.ca/
▪“… rigorous and comprehensive academic
benchmarking effort for testing and ranking existing
and new algorithms for scene background
modelling. It will maintain a comprehensive ranking
of submitted methods for years to come.”
▪ The dataset comprises 8 categories with a total of 76 video sequences
• SBMC2016: Basic, Intermittent Motion, Clutter, Jitter,
Illumination Changes, Background Motion, Very
Long, Very Short

• Background Estimation dataset (BEds)
▪http://www-vpu.eps.uam.es/DS/BEds
▪“… corpus of video sequences generated from
publicly available video-surveillance datasets to
cover several Background Estimation challenges.”
▪ The dataset comprises 4 categories of 10 video sequences each, with the associated ground-truth background images
• Baseline, Clutter, Low framerate, Static objects

Discrepancy methods: Metrics
• Foreground/background (objects) segmentation
▪ Low level: Pixel level
▪ Binary segmentation mask
▪ Recall, Precision, F-score
• P = TP/(TP+FP)
• R = TP/(TP+FN)
• F = 2 PR/(P+R)
» TP: true positive (correct detection), FP: false positive
(false detection), FN: false negative (missed detection)
▪ Higher level: Region level, Object level
▪ Centre of segmented “area”, splits&merges of
regions, …
• ChangeDetection.NET (CDNET)
▪http://changedetection.net
▪ Average ranking across categories : (rank:Baseline + rank:Dynamic Background + rank:Camera Jitter + rank:Intermittent Object Motion + rank:Shadow + rank:Thermal) / 6
▪ Average ranking : (rank:Recall + rank:Spec + rank:FPR + rank:FNR + rank:PWC +
rank:FMeasure + rank:Precision) / 7
▪ Re (Recall) : TP / (TP + FN)

▪ Sp (Specificity) : TN / (TN + FP)
▪ FPR (False Positive Rate) : FP / (FP + TN)
▪ FNR (False Negative Rate) : FN / (TP + FN)
▪ PWC (Percentage of Wrong Classifications) : 100 * (FN + FP) / (TP + FN + FP + TN)
▪ F-Measure : (2 * Precision * Recall) / (Precision + Recall)
▪ Precision : TP / (TP + FP)
▪ FPR-S : Average False positive rate in hard shadow areas
▪ TP : True Positive - FP : False Positive - FN : False Negative - TN : True Negative
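The following is a minimal NumPy sketch of the pixel-level measures listed above. It computes them from one predicted binary mask and one ground-truth mask. The function name `cdnet_metrics` is hypothetical. The sketch treats the ground truth as a plain binary mask and does not handle any special "ignore" regions the official CDNET evaluation may use. It also assumes that no denominator is zero.

```python
import numpy as np


def cdnet_metrics(pred, gt):
    """Pixel-level change-detection metrics from two binary masks (nonzero = foreground)."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = np.sum(pred & gt)      # correct detections
    fp = np.sum(pred & ~gt)     # false detections
    fn = np.sum(~pred & gt)     # missed detections
    tn = np.sum(~pred & ~gt)    # background correctly left undetected
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "Re": recall,
        "Sp": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
        "PWC": 100.0 * (fn + fp) / (tp + fn + fp + tn),
        "F": 2 * precision * recall / (precision + recall),
        "Pr": precision,
    }
```

The benchmark ranks each method per metric and per category and then averages those ranks. That ranking step is not shown here.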

• Background Estimation dataset (BEds)
▪http://www-vpu.eps.uam.es/DS/BEds
▪ Not specified.
• SceneBackgroundModelling.NET (SBMnet)
▪http://pione.dinf.usherbrooke.ca/
▪ AGE: (Average Gray-level Error)
▪ pEPs: (Percentage of Error Pixels)
▪ pCEPS: (Percentage of Clustered Error Pixels)
▪ MSSSIM: (MultiScale Structural Similarity Index)
▪ PSNR: (Peak-Signal-to-Noise-Ratio)
▪ CQM: (Color image Quality Measure)
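This is a rough sketch of four of the SBMnet measures. It compares a grayscale estimated background with its ground-truth background. Several details are assumptions; check them against the benchmark's own evaluation code before comparing numbers:

- the error threshold of 20 gray levels used for pEPs and pCEPS;
- the definition of a clustered error pixel as an error pixel whose four neighbours are also error pixels;
- the border policy for pixels at the image edge.

MS-SSIM and CQM are omitted. pEPs and pCEPS are expressed as percentages, following their names.

```python
import numpy as np


def sbm_metrics(est, gt, err_thresh=20):
    """AGE, pEPs, pCEPS and PSNR of a grayscale background estimate (sketch)."""
    est = np.asarray(est, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    absdiff = np.abs(est - gt)
    age = absdiff.mean()                          # Average Gray-level Error
    err = absdiff > err_thresh                    # error pixels (assumed threshold)
    peps = 100.0 * err.mean()                     # Percentage of Error Pixels
    # Clustered error pixel: error pixel whose 4-connected neighbours are also errors;
    # pixels outside the image are treated as non-errors (assumed border policy).
    p = np.pad(err, 1, constant_values=False)
    clustered = err & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    pceps = 100.0 * clustered.mean()              # Percentage of Clustered Error Pixels
    mse = np.mean((est - gt) ** 2)
    psnr = float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
    return {"AGE": age, "pEPs": peps, "pCEPS": pceps, "PSNR": psnr}
```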

Stand-alone methods
• New approaches trying to evaluate without
ground-truth datasets
▪Discrepancy methods are time-consuming and prone to errors/ambiguities due to human annotation
▪More importantly, no-ground-truth (NGT) evaluation can be used to adaptively optimize real-time operation
▪Well suited to integrate unsupervised tools
▪Well suited to human perception of good
performance
▪ By first defining the measure, an algorithm can be
designed to fit the measure
• Goodness measures used to design algorithms
Stand-alone methods: goodness parameters
• Pixel level
▪ Applicable as neighbourhood (fg/bg region)
analysis
• Region level
▪ Colour uniformity
▪ Entropy
▪ Intra-region uniformity
▪ Inter-region contrast
▪ Region shape
▪ Motion coherence

Stand-alone methods: goodness parameters
• Object level
▪ Most relevant:
▪ Motion coherence
• rigid vs non-rigid objects
▪ Colour contrast along the boundaries of object

Basic algorithm
Background model
• First image in the sequence (background image)
[Figure: background image used as the model]
Basic algorithm
Comparison
• Background image is subtracted from the current image pixel by pixel …
$Image_t(x,y) - BKG(x,y)$
[Figure: current image and background image]
Basic algorithm
Comparison
• … absolute differences …
$\left|Image_t(x,y) - BKG(x,y)\right|$
Basic algorithm
Comparison
• Absolute differences are compared against a predefined threshold
$\left|Image_t(x,y) - BKG(x,y)\right| > \tau$
Basic algorithm
Foreground detection
• Pixels are detected as “mobile objects” if the difference is above a given threshold
$$\left|Image_t(x,y) - BKG(x,y)\right| > \tau \Rightarrow Fore_t(x,y) = 1$$
$$\left|Image_t(x,y) - BKG(x,y)\right| \le \tau \Rightarrow Fore_t(x,y) = 0$$
[Figure: foreground binary mask (detected pixels) and background image]
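The basic algorithm can be sketched in a few lines of NumPy. The function name and the default threshold `tau=30` are illustrative assumptions. Frames are assumed to be 8-bit grayscale arrays. The differences are computed in a signed type so that the subtraction cannot wrap around.

```python
import numpy as np


def basic_foreground(frames, tau=30):
    """Basic background subtraction: the first frame is the (never updated) background.

    frames: iterable of 2-D uint8 grayscale images of equal shape.
    Returns one binary mask (1 = foreground) per frame after the first.
    """
    frames = iter(frames)
    bkg = next(frames).astype(np.int16)                 # BKG(x, y) = first image
    masks = []
    for image in frames:
        diff = np.abs(image.astype(np.int16) - bkg)     # |Image_t(x, y) - BKG(x, y)|
        masks.append((diff > tau).astype(np.uint8))     # Fore_t(x, y) = 1 if diff > tau
    return masks
```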
Limitations of basic algorithm
noise and sensitivity
• Threshold determines noise and sensitivity
▪High threshold → little noise, low sensitivity
▪Low threshold → much noise, high sensitivity
[Figure: results with a high threshold and with a low threshold]

• “Solution”: sensitivity with a high threshold can be increased by processing square windows of pixels (more accurate than morphological dilation)
$$\sum_{i=-W}^{W}\sum_{j=-W}^{W}\left|Image_t(x+i,y+j) - BKG(x+i,y+j)\right| > \tau \Rightarrow Fore_t(x,y) = 1$$
[Figure: results for W = 0, W = 1 (3x3), W = 2 (5x5) and for morphological dilation]
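This is a sketch of the window-based test for grayscale inputs. It assumes that pixels outside the image contribute zero difference; the slides do not specify a border policy. The function `window_foreground` is a hypothetical name. With `W=0` it reduces to the pixel-wise basic algorithm.

```python
import numpy as np


def window_foreground(image, bkg, tau, W=1):
    """Foreground mask from the sum of absolute differences over a (2W+1)x(2W+1) window."""
    diff = np.abs(image.astype(np.int32) - bkg.astype(np.int32))
    padded = np.pad(diff, W)                 # zero padding: outside pixels add nothing
    h, w = diff.shape
    sad = np.zeros_like(diff)
    for i in range(-W, W + 1):               # accumulate the shifted differences
        for j in range(-W, W + 1):
            sad += padded[W + i:W + i + h, W + j:W + j + w]
    return (sad > tau).astype(np.uint8)      # Fore_t(x, y) = 1 if the window sum exceeds tau
```

In this setting, a faint object whose pixels each differ by only 5 gray levels is missed pixel-wise with `tau = 30`. A 3x3 window adds up nine such differences and can exceed the threshold.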

camouflage
• Parts of foreground objects visually similar to the background are not detected → camouflage

camouflage
• “Solution”: camouflage reduced by applying the process to each RGB color channel and ORing the binary masks
[Figure: results with gray levels vs. color]

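Per-channel thresholding with an OR of the three masks can be sketched as follows. The function name is hypothetical. Using the same threshold `tau` for the R, G and B channels is an assumption made for illustration.

```python
import numpy as np


def color_foreground(image_rgb, bkg_rgb, tau):
    """Threshold |Image - BKG| in each R, G, B channel and OR the three binary masks."""
    diff = np.abs(image_rgb.astype(np.int16) - bkg_rgb.astype(np.int16))  # H x W x 3
    channel_masks = diff > tau                                # one binary mask per channel
    return np.any(channel_masks, axis=2).astype(np.uint8)     # logical OR across channels
```

Consider a background pixel (100, 100, 100) and a foreground pixel (130, 85, 85). Both have the same average gray level, so a gray-level comparison misses the foreground pixel. The R channel alone, however, exceeds `tau = 20`.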

ghosts (1)
• Small, progressive variations of the background (e.g., daylight changes) are not supported, generating false foreground objects → ghosts (1)
• Solution: progressive update of the background image → background maintenance/update

ghosts (2)
• Objects that appear in the background image are detected as foreground objects if they move → ghosts (2)
• Solution → background maintenance/update after a predetermined timeout

ghosts (3)
• The background image cannot contain moving objects, since they would become ghosts (3)
• Solution → hot start (restart)

shadows and reflections
• Shadows and reflections are detected as foreground objects → real objects are deformed, thus increasing the complexity of further analysis stages
• Solution → explicit suppression of shadows and reflections
slight variations of background
• Slight “periodic” variations of the background are detected as foreground objects
▪tree leaves, blinking lights, camera vibration, …
• Solution → robust background models
▪ Tolerant to noise and small fluctuations (automatic adjustment of parameters)
• Parametric models: standard probability distributions
• Non-parametric models: do not require standard probability distributions (pdfs)
Background model update
• Background model must be updated dynamically:
▪To accommodate progressive variations of
background (e.g., daylight changes)
▪To suppress ghosts and stationary objects

• Progressive adaptation of the background model:
[Figure: first frame]
▪Background image updated with part of the current image for pixels not belonging to the foreground mask → selective running average
$$Fore_t(x,y) = 0 \Rightarrow BKG_{t+1}(x,y) = \alpha\, Image_t(x,y) + (1-\alpha)\, BKG_t(x,y)$$
▪Parameter α determines adaptation speed (e.g., α = 0.05)
[Figure: results for α = 0.002, α = 0.003 and α = 0.004]

• Suppression of ghosts (2) and stationary objects that appear or are removed:
▪Counter per pixel
▪Counter incremented every time the pixel is detected as foreground in the current image
▪Counter reset every time the pixel is detected as background in the current image
▪Pixel “incorporated” into the background model if its counter reaches a threshold
[Figure: suppression of ghosts and stationary objects; background image with counter threshold = 25]
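The per-pixel counter can be kept alongside the background image. The class and method names in this sketch are hypothetical. Two choices are assumptions: an absorbed pixel takes the current image value, and its counter then restarts. The slides only state that the pixel is "incorporated" when the counter reaches the threshold.

```python
import numpy as np


class StationaryAbsorber:
    """Per-pixel counters that absorb long-lasting foreground into the background (sketch)."""

    def __init__(self, shape, count_thresh=25):
        self.counter = np.zeros(shape, dtype=np.int64)
        self.count_thresh = count_thresh

    def update(self, bkg, image, fore):
        """Return the updated background image given the current foreground mask."""
        # Increment where the pixel is foreground, reset where it is background.
        self.counter = np.where(fore == 1, self.counter + 1, 0)
        absorb = self.counter >= self.count_thresh
        new_bkg = np.where(absorb, image, bkg)   # copy the current value into the model
        self.counter[absorb] = 0                 # restart counting after absorption
        return new_bkg
```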

Hot start
• Background model must not contain moving objects (they would become ghosts (3))

▪Initial background model → first image
▪ all pixels unlocked
▪Counter per pixel
▪Until all pixels are locked (which may be forever ;) )
▪ Counter incremented/reset every time the pixel is detected as background/foreground in the current image
▪ Pixel background model updated (e.g., RA) while the background counter is below a threshold → locked when the threshold is reached
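The hot-start loop can be sketched under the following assumptions:

- foreground is decided by the basic threshold test;
- every unlocked pixel is updated with a running average, so an object present in the first image fades out of the model;
- a pixel is locked once it has been classified as background for `lock_thresh` consecutive frames.

The default of 125 frames mirrors the 5 * 25 threshold quoted on the slides. The class name and the other defaults are hypothetical.

```python
import numpy as np


class HotStartBackground:
    """Hot-start background initialisation (illustrative sketch)."""

    def __init__(self, first_image, tau=30, alpha=0.05, lock_thresh=125):
        self.bkg = first_image.astype(np.float64)               # initial model: first image
        self.counter = np.zeros(first_image.shape, dtype=np.int64)
        self.locked = np.zeros(first_image.shape, dtype=bool)   # all pixels unlocked
        self.tau, self.alpha, self.lock_thresh = tau, alpha, lock_thresh

    def step(self, image):
        """Process one frame and return its binary foreground mask."""
        image = image.astype(np.float64)
        fore = np.abs(image - self.bkg) > self.tau
        # Counter incremented on background detections, reset on foreground detections.
        self.counter = np.where(fore, 0, self.counter + 1)
        # Running-average update of every still-unlocked pixel (assumed blind update).
        upd = ~self.locked
        self.bkg[upd] = self.alpha * image[upd] + (1 - self.alpha) * self.bkg[upd]
        # Lock pixels whose background counter has reached the threshold.
        self.locked |= self.counter >= self.lock_thresh
        return fore.astype(np.uint8)

    def done(self):
        return bool(self.locked.all())   # may never happen in a busy scene
```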
Hot start
[Figure: hot start results; no background update vs. locked pixels, threshold = 5 * 25]

Shadow detection


• Shadow → darkening of a background region without altering its chromaticity (only under white illumination) or texture
• Reflection → lightening of a background region without altering its chromaticity (only under white illumination) or texture

Shadow detection
• Shadows are detectable provided that:
▪ They are cast by moving objects
▪ Background model is available
• Features for detecting (moving) cast shadows:
▪ Intensity
▪ Chromaticity
▪ Physical properties
▪ Geometry
▪ Texture
▪ Temporal features

Shadow detection
• Intensity
▪ Regions under shadow become darker
▪ Maximum darkness bounded by ambient
illumination
▪ Possible to predict the range of intensity reduction under shadow → rejection of non-shadow pixels
▪ Useful as a pre-processing step

Shadow detection
• Chromaticity
▪ Color constancy ≡ linear attenuation → regions under shadow become darker but retain their chromaticity (a measure of color independent of intensity) if the illumination is white light
▪ Methods use color spaces with better separation between chromaticity and intensity than RGB (e.g., HSV, YUV…)
▪ Comparisons at the pixel level → sensitive to noise
▪ Strong illumination changes and strong
shadows affect chromaticity
Shadow detection
• Physical properties
▪ Outdoor scenes have two natural illumination
sources:
▪ Sun white light (predominant)
▪ Sky blue light
▪ Chromaticity shifted towards the blue spectrum if direct sunlight is blocked → non-linear attenuation models
▪ Non-linear attenuation models can be learned
by training

Shadow detection
• Geometry
▪ Orientation, size and shape of shadows can be
predicted with proper knowledge of:
▪ Illumination source
▪ Object shape (e.g., vehicles,
standing people…)
▪ Ground surface
▪ Background model unnecessary
▪ Methods typically assume:
▪ Different orientation of objects and their shadows
▪ Single light source
▪ Flat ground
Shadow detection
• Texture
▪ Regions under shadow retain most of their
texture
▪ Texture of shadowed regions correlates with
texture of background image
▪ Texture correlation:
▪ Normalized cross-correlation
▪ Gradient correlation
▪ Gabor filtering
▪ Markov random fields
▪ Independence from color and illumination

Shadow detection
• Temporal features
▪ Shadows share the same motion pattern as
their generating objects
▪ Useful as a post-processing step → pixels over detected shadow regions must be consistent in time

Shadow detection
Chromaticity-based method
• Chromaticity-based method
R. Cucchiara, C. Grana, M. Piccardi, A. Prati, Detecting moving objects, ghosts,
and shadows in video streams, IEEE Transactions on Pattern Analysis and
Machine Intelligence 25 (10) (2003) 1337–1342. (section 3)
▪ Pixels corresponding to shadows or reflections are suppressed after detecting foreground objects (post-processing)
▪ Every pixel of the current image is compared to the background model:
▪ Shadow → chromaticity similar to the background but lower intensity
▪ Reflection → chromaticity similar to the background but higher intensity

Shadow detection
▪ Color space with separation between chromaticity and intensity (e.g., HSV):
$$Image_t(x,y) = \left(IH_t(x,y),\, IS_t(x,y),\, IV_t(x,y)\right)$$
$$BKG_t(x,y) = \left(BH_t(x,y),\, BS_t(x,y),\, BV_t(x,y)\right)$$
(H: hue, S: saturation, V: intensity)

Shadow detection
▪ Shadow detection
$$\alpha \le \frac{IV_t(x,y)}{BV_t(x,y)} \le \beta \;\wedge\; \left|IS_t(x,y) - BS_t(x,y)\right| \le \tau_S \;\wedge\; D_H \le \tau_H \;\Rightarrow\; Shadow_t(x,y) = 1$$
$$D_H = \min\left(\left|IH_t(x,y) - BH_t(x,y)\right|,\; 360 - \left|IH_t(x,y) - BH_t(x,y)\right|\right)$$
(e.g., α = 0.5, β = 0.9)
▪ Thresholds determined empirically

Shadow detection
▪ Reflection detection
$$\alpha \le \frac{BV_t(x,y)}{IV_t(x,y)} \le \beta \;\wedge\; \left|IS_t(x,y) - BS_t(x,y)\right| \le \tau_S \;\wedge\; D_H \le \tau_H \;\Rightarrow\; Reflection_t(x,y) = 1$$
$$D_H = \min\left(\left|IH_t(x,y) - BH_t(x,y)\right|,\; 360 - \left|IH_t(x,y) - BH_t(x,y)\right|\right)$$
(e.g., α = 0.5, β = 0.9)
▪ Thresholds determined empirically
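The shadow and reflection conditions translate directly into array operations. This sketch makes the following assumptions:

- HSV images with hue in degrees [0, 360) and saturation/value scaled to [0, 1];
- the example thresholds quoted in this unit as defaults;
- a small epsilon in the denominators, to avoid dividing by zero in black pixels.

In the post-processing described above, the resulting masks would only be applied to pixels already marked as foreground. The function name is hypothetical.

```python
import numpy as np


def shadow_reflection_masks(img_hsv, bkg_hsv, alpha=0.5, beta=0.9, tau_s=0.1, tau_h=36.0):
    """Chromaticity-based shadow and reflection masks (H x W x 3 HSV float arrays)."""
    ih, is_, iv = img_hsv[..., 0], img_hsv[..., 1], img_hsv[..., 2]
    bh, bs, bv = bkg_hsv[..., 0], bkg_hsv[..., 1], bkg_hsv[..., 2]
    eps = 1e-6                                   # avoids division by zero in black pixels
    dh = np.abs(ih - bh)
    dh = np.minimum(dh, 360.0 - dh)              # circular hue distance D_H
    chroma_ok = (np.abs(is_ - bs) <= tau_s) & (dh <= tau_h)
    r_shadow = iv / (bv + eps)                   # IV / BV: below 1 for darker pixels
    r_reflect = bv / (iv + eps)                  # BV / IV: below 1 for brighter pixels
    shadow = chroma_ok & (alpha <= r_shadow) & (r_shadow <= beta)
    reflection = chroma_ok & (alpha <= r_reflect) & (r_reflect <= beta)
    return shadow.astype(np.uint8), reflection.astype(np.uint8)
```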

Shadow detection
Images borrowed from “Shadow Detection Algorithms for Traffic Flow Analysis: a Comparative Study”, A.Prati, I.Mikic, C.Grana, M. M. Trivedi, ITCS 2001

Shadow detection
[Figure: shadow detection results for (α = 0.5, β = 0.9, τS = 0.1, τH = 36), (α = 0.5, β = 0.6, τS = 0.1, τH = 36) and (α = 0.5, β = 0.9, τS = 0.05, τH = 18)]

Robust background models
▪Unimodal models (Gaussian)
▪ C.R. Wren et al., “Pfinder: Real-Time Tracking of the Human
Body”, IEEE Trans. on Pattern Analysis and Machine
Intelligence, 19(7):780-785, July 1997
▪Multimodal models (Mixture of Gaussians)

▪ C. Stauffer, W.E.L. Grimson, “Adaptive background mixture
models for real-time tracking”, in Proc. of IEEE Computer
Vision and Pattern Recognition, June 1999, vol.2, pp. 246-252
• Non-parametric models (KDE)

▪ A. Elgammal, D. Harwood, L. Davis, “Non-parametric Model
for Background Subtraction”, in Proc. of 6th European
Conference on Computer Vision, June 2000, pp. 751-767
Parametric models
• Unimodal model (Gaussians)
▪Background model (Gaussian distribution for each pixel) → mean and standard deviation per pixel: $\mu_t(x,y)$, $\sigma_t(x,y)$
[Figure: per-pixel Gaussian probability density over gray level, with mean μt(x,y) and deviation σt(x,y)]

Parametric models
[Figure: gray levels farther than 3σt(x,y) from μt(x,y) → FOREGROUND; gray levels within 3σt(x,y) → BACKGROUND]
$$Fore_t(x,y) = 1 \Leftrightarrow \left|Image_t(x,y) - \mu_t(x,y)\right| > 3\,\sigma_t(x,y)$$

Parametric models
▪Initial values estimated from an initial sequence of scene images with no moving objects
▪Parameters updated for background pixels (running average): selective update
$$Fore_t(x,y) = 0 \Rightarrow \begin{cases} \mu_{t+1}(x,y) = \alpha\, Image_t(x,y) + (1-\alpha)\,\mu_t(x,y) \\ \sigma^2_{t+1}(x,y) = \alpha\,\left(Image_t(x,y) - \mu_t(x,y)\right)^2 + (1-\alpha)\,\sigma^2_t(x,y) \end{cases}$$
(e.g., α = 0.05)

Parametric models
▪Advantages
▪ Simple and efficient
▪ Low memory requirements (mean & variance per pixel)
▪ Automatic adaptation to noise (variance)
▪ Intuitive constants
▪ Adaptation to progressive changes of background
(mean & variance, RA)
▪Drawbacks
▪ Only Gaussian noise supported
▪ Background flickering not supported (leaves, blinking
lights, …)
Parametric models
• Multimodal model (Mixture of Gaussians)
▪Background model (K Gaussian distributions per pixel; K between 3 and 5) → mean, deviation and weight per Gaussian: $\mu_{i,t}(x,y)$, $\sigma_{i,t}(x,y)$, $\omega_{i,t}(x,y)$
[Figure: three weighted Gaussians (μi,t, σi,t, ωi,t for i = 1, 2, 3) over gray level]

Parametric models
▪Weight (normalized) → percentage of times the pixel has had gray/color levels within the Gaussian's interval
▪ Gaussians ordered by weight (decreasing)
• by weight/deviation ratio in the original paper
▪ Background model: group of the B “upper” Gaussians whose weights sum up to Wth [T in the original paper] (e.g., 0.8)
• high Wth ⇒ multimodal background
• low Wth ⇒ monomodal background
▪ Background pixel if “belonging” to one of the B “upper” Gaussians
Parametric models
▪Initialization
▪ one Gaussian per pixel from the first image (*)
• Initial mean → pixel value
• Large initial deviation
• Initial weight = 1 (normalized)
▪ one Gaussian (pixel from the first image) and K-1 with random means
• the Gaussian from the first image gets lower deviation and higher weight
▪ all random, without using the first image

Parametric models
▪Pixel not belonging to the background Gaussians ⇒ FOREGROUND
▪Pixel belonging to the background Gaussians (high Wth) ⇒ BACKGROUND
▪With a lower Wth, fewer Gaussians form the background, so values matching only the excluded Gaussians ⇒ FOREGROUND
[Figure: three weighted Gaussians over gray level, with the FOREGROUND and BACKGROUND ranges marked]

Parametric models
▪Selective update: foreground pixel
▪ (once the K Gaussians exist) the Gaussian with the least weight is replaced by a Gaussian centred at the pixel's value, with low weight and high deviation
• No further details …
» weight equal to or less (half?) than that of the substituted Gaussian
▪ Renormalize weights
▪Problem
• If a foreground “pixel” suddenly appears and stays fixed “forever”, it will erroneously remain foreground
▪Solution: “blind” update of the Mixture of Gaussians
Thanks to Andrija Gajic and Pablo Ramírez for raising this point!

Parametric models
▪“Blind” update: foreground pixel
▪ If the pixel does not belong to any existing Gaussian
• update as in the selective update of a foreground pixel (see previous slide) ⇒ create a new Gaussian
▪ If the pixel belongs to an existing Gaussian
• update as for a background pixel (see next slide) ⇒ running-average (RA) update of the Gaussians

Parametric models
▪Selective/blind update: background pixel
▪ Update parameters of the selected Gaussian(s) (running average)
$$\mu_{t+1}(x,y) = \alpha\, Image_t(x,y) + (1-\alpha)\,\mu_t(x,y)$$
$$\sigma^2_{t+1}(x,y) = \alpha\,\left(Image_t(x,y) - \mu_t(x,y)\right)^2 + (1-\alpha)\,\sigma^2_t(x,y)$$
▪ Increase weight of the selected Gaussian(s)
▪ Decrease weights of the other Gaussians
$$\omega_{k,t+1} = (1-\alpha)\,\omega_{k,t} + \alpha\, M_{k,t}$$
$M_{k,t} = 1$ if Gaussian k is matched, 0 otherwise
▪ Renormalize weights
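To make the selective and blind rules concrete, here is a pure-Python sketch for a single gray-level pixel. It is not the original Stauffer–Grimson implementation. The following are all assumptions:

- the match test |x − μ| ≤ 2.5σ;
- the weight and variance given to new Gaussians;
- the variance floor;
- leaving the model untouched when, under selective update, a foreground pixel matches a non-background Gaussian.

Gaussians are ordered by weight, as on the slides.

```python
import math


class PixelMoG:
    """Mixture of Gaussians for one gray-level pixel (illustrative sketch)."""

    def __init__(self, first_value, K=3, alpha=0.05, w_th=0.9, match_k=2.5,
                 init_var=900.0, init_w=0.05, min_var=4.0, update="selective"):
        # Each Gaussian is [mean, variance, weight]; start with one from the first image.
        self.g = [[float(first_value), init_var, 1.0]]
        self.K, self.alpha, self.w_th, self.match_k = K, alpha, w_th, match_k
        self.init_var, self.init_w, self.min_var = init_var, init_w, min_var
        self.update = update  # "selective" or "blind"

    def _background_set(self):
        # Order by decreasing weight; the first B Gaussians reaching w_th are background.
        order = sorted(range(len(self.g)), key=lambda i: -self.g[i][2])
        bg, acc = set(), 0.0
        for i in order:
            bg.add(i)
            acc += self.g[i][2]
            if acc >= self.w_th:
                break
        return order, bg

    def _normalize(self):
        total = sum(g[2] for g in self.g)
        for g in self.g:
            g[2] /= total

    def apply(self, x):
        """Classify gray value x (True = foreground) and update the mixture."""
        order, bg = self._background_set()
        hit = next((i for i in order
                    if abs(x - self.g[i][0]) <= self.match_k * math.sqrt(self.g[i][1])),
                   None)
        foreground = hit is None or hit not in bg
        if hit is None:
            # No Gaussian explains x: add a new one or replace the least-weighted one.
            new = [float(x), self.init_var, self.init_w]
            if len(self.g) < self.K:
                self.g.append(new)
            else:
                self.g[order[-1]] = new
            self._normalize()
        elif not foreground or self.update == "blind":
            # Running-average update of the matched Gaussian and of all weights.
            a = self.alpha
            mu, var, _ = self.g[hit]
            self.g[hit][0] = a * x + (1 - a) * mu
            self.g[hit][1] = max(a * (x - mu) ** 2 + (1 - a) * var, self.min_var)
            for i, g in enumerate(self.g):
                g[2] = (1 - a) * g[2] + a * (1.0 if i == hit else 0.0)
            self._normalize()
        # Selective update with a matched non-background Gaussian: no change.
        return foreground
```

Feeding a constant background value and then a new value repeatedly shows the difference between the two rules. With `update="selective"`, the new value stays foreground indefinitely. With `update="blind"`, its Gaussian gains weight until it joins the background set.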

Parametric models
▪ Example: selective update of one pixel (Wth = 0.9); Gaussian weights over time (figure sequence; background Gaussians labelled with the figures' colour codes G, R, B)
• t = 0 (background pixel): one Gaussian centred at Image0(x,y), ω1 = 1; background Gaussians: G
• t = 1, 2: mean and deviation of Gaussian 1 updated, ω1 = 1; background Gaussians: G
• t = 3 (foreground pixel): new Gaussian created; ω1 = 0.95, ω2 = 0.05; background Gaussians: G
• t = 4: ω1 = 0.90, ω2 = 0.10; background Gaussians: G
• t = 5: ω1 = 0.85, ω2 = 0.15; background Gaussians: G/R
• t = 6: ω1 = 0.80, ω2 = 0.20
• t = 7: ω1 = 0.78, ω2 = 0.17, ω3 = 0.05
• t = 8: ω1 = 0.75, ω2 = 0.14, ω3 = 0.11; background Gaussians: G/R/B
• t = 9: ω1 = 0.74, ω2 = 0.13, ω3 = 0.13
• t = 10, 11, 12: ω1 = 0.79, ω2 = 0.16, ω3 = 0.05
• and so on … up to t = N: ω1 = 0.79, ω2 = 0.16, ω3 = 0.05

Parametric models
▪ Example: blind update of one pixel (Wth = 0.9); Gaussian weights over time (figure sequence; background Gaussians labelled with the figures' colour codes)
• t = 0 (background pixel): one Gaussian centred at Image0(x,y), ω1 = 1; background Gaussians: G
• t = 1, 2: mean and deviation of Gaussian 1 updated, ω1 = 1; background Gaussians: G
• t = 3 (foreground pixel): new Gaussian created; ω1 = 0.95, ω2 = 0.05; background Gaussians: G
• t = 4: ω1 = 0.90, ω2 = 0.10; background Gaussians: G
• t = 5: ω1 = 0.85, ω2 = 0.15
• t = 6: ω1 = 0.80, ω2 = 0.20
• t = 7: ω1 = 0.78, ω2 = 0.17, ω3 = 0.05
• t = 8: ω1 = 0.76, ω2 = 0.15, ω3 = 0.09
• t = 9: ω1 = 0.75, ω2 = 0.13
• t = 10: ω1 = 0.74, ω2 = 0.12
Parametric models
BUT, OF COURSE, BLIND UPDATE HAS OTHER PROBLEMS …
… IT IS ALWAYS NECESSARY TO TAKE ENGINEERING DECISIONS KNOWING HOW THE METHODS WORK

Parametric models
▪Advantages
▪ Flickering supported (leaves, blinking lights, …)
▪ Introducing and removing objects supported
▪ Efficient
▪ Automatic adaptation to image noise (variances)
▪ Adaptation to progressive background changes (RA)
▪Drawbacks
▪ Only Gaussian noise supported
▪ Sudden, non-periodic variations not supported
• with selective update (blind update supports them)
Non-parametric models
• Kernel Density Estimate
▪Background model → FIFO queue with the latest N values of every pixel (selective or blind update):
$$Image_{t-1}(x,y), \dots, Image_{t-N}(x,y) \quad (\text{e.g., } N = 100)$$
[Figure: stored samples I1 … I8 along the gray-level axis]

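▪Set of Gaussians centred at every stored value:
$$\sigma(x,y) = \frac{\operatorname{median}_{i=1 \ldots N-1}\left|Image_{t-i}(x,y) - Image_{t-i-1}(x,y)\right|}{0.68\sqrt{2}}$$
$$\phi\left(x, Image_{t-i}(x,y), \sigma\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - Image_{t-i}(x,y)}{\sigma}\right)^2}$$
[Figure: Gaussian kernels centred at the stored samples along the gray-level axis]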

▪Density of samples estimated as the average of the Gaussians evaluated at the current pixel value:
$$\Pr\left(Image_t(x,y)\right) = \frac{1}{N}\sum_{i=1}^{N}\phi\left(Image_t(x,y), Image_{t-i}(x,y), \sigma\right)$$
▪ High density ⇒ BACKGROUND; low density ⇒ FOREGROUND

▪Advantages
▪ Flickering supported (leaves, blinking lights, …)
▪ Automatic adaptation to image noise and to
sudden and progressive background changes
▪Drawbacks
▪ High memory requirements (N samples per pixel)
▪ High computational cost
• Optimization
A. Elgammal et al., “Efficient Kernel Density Estimation Using the Fast
Gauss Transform with Application to Color Modeling and Tracking”,
IEEE Trans. on PAMI, 25(11):1499-1504, Nov. 2003

Credits
Some slides are edited from original slides and material contributed by:
▪ Miguel Ángel García (miguelangel.garcia@uam.es)
▪ Juan Carlos San Miguel (juancarlos.sanmiguel@uam.es)
Contents
▪ Object detection and classification
Contents
▪ Introduction
▪ Methods based on foreground segmentation
▪ Methods not based on foreground segmentation
▪ Concluding remarks
Introduction
• Object detection and classification

▪aka “object recognition” (pattern recognition)
▪features vs objects
▪ Machine learning vs Human reasoning
▪Required for
▪ Scene understanding
• number of persons, luggage-person distance (abandoned objects monitoring), …
▪ Querying previously-recorded situations
• person with blue luggage (forensic surveillance), person with dog at the park today (life-logging for helping Alzheimer's patients)
Introduction

▪ Detection → extract moving and static objects (candidates)
▪ Classification → assign labels to each object (classes)
[Figure: example detections labeled Group, Person, Vehicle, Object, Train; images from AVSS2007 - ftp://motinas.elec.qmul.ac.uk/pub/iLids/ and PETS2001 - http://ftp.pets.rdg.ac.uk/PETS2001/]

Introduction

▪Scenarios: crowded vs. sparse scenes; single camera vs. multiple cameras
Introduction

▪Candidate selection before classification
▪ Methods based on foreground/background
segregation
• Blob detection, filtering and classification
▪Candidate selection “integrated” in the classification
phase
▪ Methods not based on foreground/background
segregation
• Models are used for scanning the whole frame
(image) obtaining a map of “objectness”
• Object proposal networks (Deep Learning Workshop)
Introduction
• Object (candidates) classification

▪Training phase: training images + training labels → image features → training → learned model
▪Test phase: test image → image features → learned model → prediction (Class A / Not Class A)
Introduction
• Where are we?

▪Incredible progress in the last fifteen years
▪Better features, models, learning methods, datasets, …
(Source: Comparative Performance Analysis of Feature Sets for Abandoned Object Classification, 2008)
▪… and …
▪Deep learning…
Introduction
• Where are we?

… but training is AN issue …
… among others …
… but DL is here to STAY …
▪Deep learning…
Methods based on foreground segmentation: Contents
• Objective and challenges
• Blob extraction
• Blob classification
• Limitations
Objective and challenges
• Foreground segmentation serves as a focus-of-attention method but does not provide
▪ Number of objects in a region of interest
▪ Type of object or objects in the region of interest
(Image source: CDW2012 - http://www.changedetection.net/)
Challenges
▪Imperfect segmentation
▪ Illumination changes and shadows, camouflage, noise, …
▪Arbitrary camera views
▪Projective image distortion
(cameras with large field-of-view)
▪Low-resolution imagery
(objects < 100 pixels in height)
▪Groups of people may “look” like
cars
Canonical approach
• We assume static cameras (to use fg segmentation)

• Four main steps:
▪ Foreground segmentation: background subtraction + post-processing of the input video
▪ Blob extraction (output: blob IDs)
▪ Blob tracking
▪ Blob classification (output: labels)
[Figure: input image → foreground segmentation → extracted blobs → labels (e.g., people)]
Blob Extraction
• BLOB - Binary Large OBject

▪Group of connected pixels in a binary image
▪“Large” indicates that only objects of a certain size
are of interest and that “small” binary objects are
usually noise (best removed before blob extraction)
• Blob extraction: connected component analysis (CCA) with 4-connectivity or 8-connectivity
Blob Extraction
• ‘Sequential Grass-fire’ CCA algorithm

From “Introduction to Image and Video Processing”, Th.B. Moeslund, Springer, 2012
▪ Two images: input (binary, 0 – 255) and output (IDs, 1-…)
1. Initialize ID count to zero
2. Scan the image from top-left to bottom-right
3. When an object pixel (255) is found:
a. Increase ID count
b. Set pixel in the output image to the current ID count
c. Set pixel in the input image to zero
d. For each neighbor equal to 255, repeat b-d on that neighbor
4. Continue scanning (step 3) until the end of the image
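The grass-fire steps can be sketched in plain Python as below. The recursion of step d is replaced by an explicit stack (same result, but no recursion-depth limit for large blobs), and the input is copied so the caller's mask is not burnt. The `connectivity` parameter switches between 4- and 8-connectivity; the names are illustrative, not from a specific library.

```python
def grass_fire(binary, connectivity=8):
    """Sequential grass-fire labelling of a 0/255 image (list of lists).

    Returns (ids, count): ids holds 0 for background and 1..count for blobs.
    """
    h, w = len(binary), len(binary[0])
    image = [row[:] for row in binary]        # working copy whose pixels get "burnt"
    ids = [[0] * w for _ in range(h)]
    if connectivity == 4:
        neighbors = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        neighbors = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    count = 0                                 # step 1: ID count starts at zero
    for y in range(h):                        # step 2: scan top-left to bottom-right
        for x in range(w):
            if image[y][x] != 255:
                continue
            count += 1                        # step 3a: a new blob starts here
            image[y][x] = 0                   # step 3c: burn the seed pixel
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                ids[cy][cx] = count           # step 3b: write the current ID
                for dy, dx in neighbors:      # step 3d: propagate to object neighbors
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and image[ny][nx] == 255:
                        image[ny][nx] = 0     # step 3c: burn before visiting
                        stack.append((ny, nx))
    return ids, count


mask = [
    [255, 255, 0, 0],
    [0, 0, 255, 0],
    [0, 0, 0, 0],
    [255, 0, 0, 255],
]
ids8, n8 = grass_fire(mask, connectivity=8)
ids4, n4 = grass_fire(mask, connectivity=4)
print(n8, n4)  # 3 4
```

The diagonal pixel at row 1, column 2 joins the first blob only under 8-connectivity, which is why the two counts differ.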
Blob Extraction
Note: ID for blob extraction, label for classification
Blob Extraction
[Figure: binary image and result (IDs shown as colors)]
Blob Extraction
• Blob extraction + ‘size filtering’
[Figure panels: current image, background, foreground mask, IDs on binary image, initial bounding boxes, bounding boxes filtered]
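Size filtering can be sketched as a pass over an ID image that collects each blob's pixel area and bounding box and then drops blobs below a minimum area. The threshold `min_area` is a hypothetical parameter; in practice it depends on camera distance and resolution.

```python
def blob_boxes(ids):
    """Pixel area and bounding box [x_min, y_min, x_max, y_max] of every blob ID > 0."""
    stats = {}
    for y, row in enumerate(ids):
        for x, blob_id in enumerate(row):
            if blob_id == 0:                  # background pixel
                continue
            if blob_id not in stats:
                stats[blob_id] = {"area": 0, "box": [x, y, x, y]}
            s = stats[blob_id]
            s["area"] += 1
            box = s["box"]
            box[0], box[1] = min(box[0], x), min(box[1], y)
            box[2], box[3] = max(box[2], x), max(box[3], y)
    return stats


def size_filter(stats, min_area):
    """Keep only blobs whose area reaches min_area (hypothetical threshold)."""
    return {i: s for i, s in stats.items() if s["area"] >= min_area}


ids = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 2, 0],
    [1, 1, 0, 0, 0],
]
kept = size_filter(blob_boxes(ids), min_area=3)
print(kept)  # {1: {'area': 6, 'box': [0, 0, 1, 2]}}
```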

Blob Classification
• Blob classification - features

▪Area
▪Bounding Box
▪Bounding Circle
▪Convex Hull
▪Aspect ratio
▪Center of mass
▪Speed
▪Motion direction
▪Blob location
▪…
From “Introduction to Image and Video Processing”, Th.B. Moeslund, Springer, 2012
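Several of these features can be computed directly from a blob's pixel coordinates. The sketch below assumes aspect ratio is height divided by width (the slides do not fix the convention) and derives speed and motion direction from the centers of mass in two consecutive frames, with `dt` as a hypothetical frame interval.

```python
import math


def blob_features(pixels):
    """Area, bounding box (x, y, width, height), aspect ratio and center of mass."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return {
        "area": len(pixels),
        "bbox": (min(xs), min(ys), width, height),
        "aspect_ratio": height / width,       # assumed convention: height / width
        "center": (sum(xs) / len(xs), sum(ys) / len(ys)),
    }


def motion(center_prev, center_now, dt=1.0):
    """Speed (pixels per frame) and direction (radians) between two centers of mass."""
    dx = center_now[0] - center_prev[0]
    dy = center_now[1] - center_prev[1]
    return math.hypot(dx, dy) / dt, math.atan2(dy, dx)


# A 2-pixel-wide, 4-pixel-tall blob (e.g., a distant pedestrian)
pixels = [(x, y) for x in range(2) for y in range(4)]
f = blob_features(pixels)
speed, direction = motion(f["center"], (3.5, 5.5))
print(f["area"], f["aspect_ratio"], f["center"], speed)  # 8 2.0 (0.5, 1.5) 5.0
```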
Blob Classification
• Blob classification - features

▪…
▪Compactness (percentage of occupancy inside the bounding box):
$$\text{Compactness} = \frac{\text{Area of BLOB}}{\text{width} \cdot \text{height}}$$
▪Circularity:
$$\text{Circularity} = \frac{\text{Perimeter of BLOB}}{2\sqrt{\pi \cdot \text{Area of BLOB}}}$$
▪Direction of motion with respect to major axis
direction (cars tend to move along the major
axis direction)
▪Shape Deformation (people tend to have larger
shape deformations than cars when moving)
▪….
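A small sketch of compactness and circularity, assuming circularity is defined as the blob perimeter divided by the perimeter of a circle with the same area, i.e. perimeter / (2·√(π·area)), so that a perfect disc scores 1. How the perimeter is measured on a pixel grid is left open here; the examples use ideal shapes.

```python
import math


def compactness(area, width, height):
    """Fraction of the bounding box occupied by the blob (1.0 = fills its box)."""
    return area / (width * height)


def circularity(perimeter, area):
    """Perimeter relative to a circle of equal area (1.0 = circle, larger = less round)."""
    return perimeter / (2.0 * math.sqrt(math.pi * area))


r = 10.0
disc_area, disc_perimeter = math.pi * r ** 2, 2.0 * math.pi * r
print(round(compactness(disc_area, 2 * r, 2 * r), 3))   # 0.785 (disc inside its square box)
print(round(circularity(disc_perimeter, disc_area), 3))  # 1.0
print(round(circularity(40.0, 100.0), 3))                # 1.128 (10 x 10 square)
```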
Blob Classification
• Blob classification – features
▪ After feature extraction, a binary image is converted into a number of feature values for each blob
▪ The feature values are grouped in the feature vector
Blob Classification
• Blob classification – modeling

• Class-distinctive features
• Feature space (decision region)
Which blob is a circle?
Which blob is a large circle?
Blob Classification
• Blob classification – modeling

▪Not possible to define the prototype model
beforehand → learning
• Example: Gaussian modeling
Feature vector: $\vec{f} = \{f_1, f_2, \dots, f_n\}$
Feature model (Gaussian): $\vec{f}_m = \{f_{m1}, \dots, f_{mn}\}$, with $f_{mi} = \{\mu_{mi}, \sigma_{mi}\}$
From “Introduction to Image and Video Processing”, Th.B. Moeslund, Springer, 2012
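Learning such a Gaussian feature model amounts to estimating one mean and one standard deviation per feature from training blobs of the class. The sketch assumes features are modeled independently (one 1-D Gaussian each); the training vectors (compactness, circularity) are invented for illustration.

```python
import statistics


def learn_gaussian_model(training_vectors):
    """Per-feature (mu_i, sigma_i) estimated from a list of training feature vectors."""
    columns = zip(*training_vectors)          # one column per feature
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


# Hypothetical training blobs of one class: [compactness, circularity]
training = [[0.80, 1.0], [0.78, 1.1], [0.76, 0.9]]
model = learn_gaussian_model(training)
print([(round(m, 3), round(s, 3)) for m, s in model])  # [(0.78, 0.02), (1.0, 0.1)]
```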
Blob Classification
• Blob classification – classifier

▪Box-classifier
▪ Falls within boundaries of decision region → binary
▪ Very simple → not used (many outliers)
▪Statistical-classifier
▪ Check distance between feature vector and model
▪ Multi-class problem (closed number of classes)
• N classifiers → maximum a posteriori (MAP) criterion
▪ Single-class problem (with unknown classes)
• Distance is thresholded for recognizing the class
▪ Many other classifiers (SVM, NN, …)
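The three classifier types can be contrasted in a short sketch. It assumes independent per-feature Gaussians for the MAP rule and a normalized distance with a hypothetical limit `max_dist` for the single-class case; the class models, priors and the feature vector [compactness, aspect ratio] are invented values.

```python
import math


def box_classify(f, box):
    """Box classifier: True only if every feature lies inside its [low, high] interval."""
    return all(low <= fi <= high for fi, (low, high) in zip(f, box))


def log_likelihood(f, model):
    """log p(f | class) for independent 1-D Gaussians, model = [(mu_i, sigma_i), ...]."""
    return sum(-math.log(math.sqrt(2.0 * math.pi) * s) - (fi - m) ** 2 / (2.0 * s ** 2)
               for fi, (m, s) in zip(f, model))


def map_classify(f, models, priors):
    """Closed set of classes: choose the class maximizing log p(f|c) + log P(c)."""
    return max(models, key=lambda c: log_likelihood(f, models[c]) + math.log(priors[c]))


def single_class_accept(f, model, max_dist=3.0):
    """Open set: accept the class if the sigma-normalized distance is below max_dist."""
    d = math.sqrt(sum(((fi - m) / s) ** 2 for fi, (m, s) in zip(f, model)))
    return d <= max_dist


models = {"person": [(0.6, 0.1), (2.5, 0.5)], "car": [(0.85, 0.05), (0.5, 0.2)]}
priors = {"person": 0.5, "car": 0.5}
blob = [0.62, 2.2]
print(box_classify(blob, [(0.4, 0.8), (1.5, 3.5)]))  # True
print(map_classify(blob, models, priors))            # person
print(single_class_accept(blob, models["car"]))      # False
```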
Blob Classification
• Blob classification – classifier

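▪Popular feature distances
Euclidean distance:
$$ED(\vec{f}, \vec{f}_m) = \sqrt{\sum_{i=1}^{n} (f_i - \mu_{mi})^2}$$
Weighted Euclidean distance:
$$WED(\vec{f}, \vec{f}_m) = \sqrt{\sum_{i=1}^{n} \frac{(f_i - \mu_{mi})^2}{\sigma_{mi}^2}}$$
Mahalanobis distance (S = covariance matrix):
$$M(\vec{f}, \vec{f}_m) = \sqrt{(\vec{f} - \vec{f}_m)^T S^{-1} (\vec{f} - \vec{f}_m)}$$
▪Feature normalization
▪ Required for combination
▪ Same feature scale, in [0,1], e.g.:
$$normArea_{feature} = \min\left(\frac{Area_{BLOB}}{Area_{model}}, \frac{Area_{model}}{Area_{BLOB}}\right)$$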
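The three distances and the area normalization can be sketched as follows. To stay dependency-free, the Mahalanobis version is restricted to two features and inverts the 2x2 covariance matrix by hand (a simplification for illustration). With a diagonal covariance it reduces to the weighted Euclidean distance, which the example checks.

```python
import math


def euclidean(f, mu):
    return math.sqrt(sum((fi - mi) ** 2 for fi, mi in zip(f, mu)))


def weighted_euclidean(f, mu, sigma):
    return math.sqrt(sum((fi - mi) ** 2 / si ** 2 for fi, mi, si in zip(f, mu, sigma)))


def mahalanobis_2d(f, mu, S):
    """Mahalanobis distance for two features; S is a 2x2 covariance matrix."""
    (a, b), (c, d) = S
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [f[0] - mu[0], f[1] - mu[1]]
    q = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


def norm_area(area_blob, area_model):
    """Scale-free area similarity in [0, 1]; 1 means identical areas."""
    return min(area_blob / area_model, area_model / area_blob)


f, mu = [3.0, 4.0], [0.0, 0.0]
print(euclidean(f, mu))                                    # 5.0
print(weighted_euclidean(f, mu, [1.0, 2.0]))               # sqrt(13) = 3.605...
print(mahalanobis_2d(f, mu, [[1.0, 0.0], [0.0, 4.0]]))     # same value: diagonal S
print(norm_area(50, 200), norm_area(200, 50))              # 0.25 0.25
```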
Limitations
• Limitations of blob-based analysis

• Shape and motion descriptors of the segmented object
(foreground blob) alone are not enough to handle multiple
object classes
→ Incorporate appearance features of the blob
• Depends on accurate blob segmentation
• Large training set is required
→ Incorporate (complex) motion features of the
blob
• Depends on accurate blob segmentation
Limitations
• Crowded scenes:
• Many segmentation errors: ghosts, holes, overlap
of blobs, …
• Foreground segmentation is not useful for object detection and classification in crowded scenes
Methods not based on foreground segmentation: Contents
• Introduction
• Viola and Jones
• Histogram of Oriented Gradients
• Part-based methods
• Invariance to scale
• Comparative for pedestrian detection
Introduction
• Many types of object are of interest

▪ Develop generic classifiers and adapt to each
object
• Fortunately, in the case of video-surveillance,
the objects of interest are usually pedestrians
(persons), vehicles and other objects (e.g.,
luggage)
Introduction
• Basic component: a binary classifier
[Figure: a car/non-car classifier answers “Yes, a car.” or “No, not a car.”]
Slide credit: K. Grauman, B. Leibe
Introduction
▪ If the object may be in a cluttered scene, slide a
window (at different scales) across the image looking
for it
[Figure: car/non-car classifier applied at each window position]
Slide credit: K. Grauman, B. Leibe (adapted)
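A single-scale sliding-window loop can be sketched as below. The classifier is a hypothetical stand-in (mean intensity above 128) for a trained car/non-car model; window size and stride are illustrative assumptions. Multiple scales are handled by repeating this loop on resized images.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield the top-left corner (x, y) of every window fully inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y


def detect(image, classifier, win_w, win_h, stride):
    """Apply a binary classifier to each window; return accepted boxes (x, y, w, h)."""
    h, w = len(image), len(image[0])
    hits = []
    for x, y in sliding_windows(w, h, win_w, win_h, stride):
        patch = [row[x:x + win_w] for row in image[y:y + win_h]]
        if classifier(patch):
            hits.append((x, y, win_w, win_h))
    return hits


def bright(patch):
    """Hypothetical stand-in classifier: 'object' if the mean gray level exceeds 128."""
    return sum(map(sum, patch)) / (len(patch) * len(patch[0])) > 128


image = [[0] * 6 for _ in range(6)]
for yy in (2, 3):
    for xx in (2, 3):
        image[yy][xx] = 255                   # a bright 2x2 "object"
print(detect(image, bright, 2, 2, 2))         # [(2, 2, 2, 2)]
```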
Introduction
• Challenges
▪ Low-resolution imagery (tiny objects)
▪ Difficult to handle multiple object poses or
projective image distortion (camera view)
▪ Training is very expensive (collecting samples
and training time)
▪ Intra-class variability
▪ Cars come in a variety of shapes (sedan,
minivan, etc.)
▪ People wear different clothes and take different
poses
Introduction
• Challenges
Viola&Jones
• Viola&Jones detector (VJ)

▪ A seminal approach to real-time object detection
▪ Training is slow, but detection is very fast
▪ Key ideas:
▪ Integral images for fast feature evaluation
▪ Rectangle features (Haar features)
▪ Boosting for feature selection
▪ Attentional cascade for fast non-face (negative)
window rejection
P. Viola and M. Jones, ‘Robust real-time face detection’, International Journal of Computer Vision, 57(2):137-154, 2004
[15500+ cites on Google Scholar] (12/02/2018)
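The integral-image idea can be sketched in a few lines: after one pass over the image, the sum of any rectangle costs four lookups, so a two-rectangle Haar feature costs eight regardless of its size. The feature shown (left half minus right half) is one of several rectangle layouts; the function names are illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y) in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle (edge) feature: left half minus right half; w must be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


ii = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(rect_sum(ii, 0, 0, 3, 3), rect_sum(ii, 1, 1, 2, 2))   # 45 28
edge = integral_image([[9, 9, 1, 1], [9, 9, 1, 1]])
print(haar_two_rect(edge, 0, 0, 4, 2))                       # 32 (bright left, dark right)
```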
Viola&Jones
Static detector (appearance only)
Dynamic detector (appearance + motion)
Robust to shadows and low-resolution imagery
HOG
• Histogram of Oriented Gradients (HOG)

▪ Another seminal approach to real-time object detection
▪ Training is slow, but detection is very fast
▪ Works very well for non-deformable objects with
canonical orientations: faces, cars, pedestrians
▪ Key ideas:
▪ Well-engineered robust features
▪ Simple SVM classification
▪ Fast detection
▪ Multi-scale detection
N. Dalal and B. Triggs, ‘Histograms of oriented gradients for human detection’, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 886-893, 2005
[21600+ cites on Google Scholar] (12/02/2018)
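The core HOG operation, an orientation histogram per cell, is sketched below under simplifying assumptions: centered [-1, 0, 1] gradients on interior pixels only, 9 unsigned orientation bins over 0-180 degrees, and hard voting by gradient magnitude without interpolation. A full detector additionally groups cells into overlapping normalized blocks and feeds the concatenated descriptor to a linear SVM.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]          # centered horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]          # centered vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def l2_normalize(vector, eps=1e-6):
    """L2 normalization, as applied to blocks of cell histograms."""
    norm = math.sqrt(sum(v * v for v in vector) + eps ** 2)
    return [v / norm for v in vector]


# 4x4 cell containing a vertical edge (dark left, bright right)
cell = [[0, 0, 100, 100] for _ in range(4)]
hist = cell_histogram(cell)
print(hist[0], sum(hist[1:]))        # 400.0 0.0 (all votes in the 0-degree bin)
print(round(l2_normalize(hist)[0], 6))  # 1.0
```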
HOG
Part-based methods
• Part-based models
▪ Mixture of fine-detail models (head, limbs, shoulders, legs,…)
▪ Each model has global template + deformation
▪ Fully trained from bounding boxes alone
→ Less used in video-surveillance due to resolution constraints
Invariance to scale
• Detection at different scales: resizing + combination
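One common way to realize "resizing + combination" is sketched here as an assumption, since the slides do not specify the combination step: the image is downscaled by a factor per level until the fixed detection window no longer fits, each level is scanned, boxes are mapped back to original coordinates (multiplying by the level's scale), and overlapping detections from all levels are merged with greedy non-maximum suppression (NMS) on intersection-over-union (IoU). Scores and boxes below are invented.

```python
def pyramid_scales(img_w, img_h, win_w, win_h, factor=1.2):
    """Downscaling factors for which the resized image still contains the window."""
    scales, s = [], 1.0
    while img_w / s >= win_w and img_h / s >= win_h:
        scales.append(s)
        s *= factor
    return scales


def iou(a, b):
    """Intersection over union of two boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (score, box) pairs already mapped back to original coordinates."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, k) < iou_threshold for _, k in kept):
            kept.append((score, box))
    return kept


print(pyramid_scales(128, 64, 64, 32, factor=2.0))   # [1.0, 2.0]
dets = [(0.9, (10, 10, 64, 128)), (0.8, (14, 12, 64, 128)), (0.7, (200, 10, 64, 128))]
print(nms(dets))  # [(0.9, (10, 10, 64, 128)), (0.7, (200, 10, 64, 128))]
```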
Comparative for pedestrian detection
[Figure: pedestrian detection results for HOG, VJ, LSVM (HOG parts) and VJ (faces)]
Concluding remarks
• Methods that rely on moving object segmentation

▪ Strengths
▪ Fast and reliable for static cameras with few objects
▪ Robust in simple cases due to model simplification
▪ Fast prototyping
▪ Weaknesses
▪ Depends on foreground segmentation performance
▪ Sensitive to occlusions
▪ Do not work for moving cameras or crowded
scenes, where background subtraction results are
not meaningful
Concluding remarks
• Methods that do not use foreground segmentation (statistical template approach)
• Strengths
• Applicable to static and moving cameras
• Work very well for non-deformable objects with
canonical orientations: faces, cars, pedestrians
• Useful for crowded scenes and moving
cameras.
• Work better under shadows and lighting
changes.
• Fast detection
Concluding remarks
• Methods that do not use foreground segmentation (statistical template approach)
• Weaknesses
▪ Do not work as well for highly deformable objects or
“stuff” (grass, water, sky, …)
• Semantic segmentation …
▪ Not robust to occlusion (except for part-based models)
▪ Require lots of training data
• Hard to design and tune
• Training is very expensive (collecting samples
and training time)
Credits
Slides edited from original slides and material contributed by
Juan Carlos San Miguel (juancarlos.sanmiguel@uam.es)