
This Lecture

Introduction

Mid-Level Vision
Gestalt Theory & Grouping

Clustering & Segmentation

K-means & EM
Mean Shift
Normalized Cuts

Slide Credits:
A. Efros, S. Palmer, B. Leibe, S. Lazebnik, K. Grauman, S. Seitz, C. Bishop, I. Kokkinos

Mid-level vision
Half-way between the image and the objects

Something more informative than pixels


Superpixels,
Ren & Malik

But not necessarily object-centered: generic

'Blue Segment', V. Kandinsky



Why not go directly from image to objects?

Pattern recognition task

Mid-level representations suffice for recognition

Scalability: objects and their geons

Hypothesis: there is a small number of geometric components ("geons") that constitute the primitive elements of the object recognition system
Analogy: using letters to form words (compare with ideograms)

Scalability: Recognition-by-components

1) We know that this object is nothing we have seen before
2) We can split this object into parts that everybody will agree on
3) We can see how it resembles something familiar: a hot dog cart

Mid-level vision
How can we abstract from the image observations?
Too many pixels, edgels, blobs, junctions
Replace with representative, higher-level structures
Fewer and amenable to subsequent processing
Core problem: Grouping
Region grouping (Segmentation)

The Gestalt School


The whole is greater than the sum of its parts

Properties result from relationships

Illusory/subjective
contours

Occlusion

Familiar configuration

Relationships are recovered using a few generic cues

Similarity

Common Fate


Proximity


Symmetry


Parallelism


Continuity

Gestalt theory and computer vision


Gestalt heritage: mostly conceptual
Turning Gestalt cues into numerical quantities:
  Common fate → motion estimation
  Parallelism → texture analysis
  Symmetry → ridge detection
  Similarity → region-based segmentation
  Closure, continuity → boundary-based segmentation

Main problem: how do we choose when to rely on each of these cues?

Cue combination problem


Different cues lead to different segmentations

Symmetry

Continuity

Color Similarity

Image segmentation is an ill-posed problem


No 'optimal' segmentation exists

'Good' segmentation is highly task-dependent


Depth Ordering

Motion Estimation

This Lecture
Introduction

Mid-Level Vision
Gestalt Theory & Grouping

Clustering & Segmentation

K-means & EM
Mean Shift
Normalized Cuts


Segmentation Problem
Task: Partition image into homogeneous regions

Homogeneity: based on intensity, color, texture, motion, depth, shading, …

Intensity:


Feature Space
At each pixel, form a vector of measurements describing
image properties: image features
Map observations into feature space
Group pixels based on color similarity

[Figure: example pixels with color values such as (R,G,B) = (255,200,250), (245,220,248), (15,189,2), (3,12,2)]

Feature space: color value (3D)
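The mapping from pixels to a 3D color feature space can be sketched in a few lines. A minimal illustration assuming NumPy, with a tiny hand-made image reusing the example RGB values from the figure:

```python
import numpy as np

def color_features(image):
    """Flatten an H x W x 3 image into one 3D color feature vector per pixel."""
    h, w, c = image.shape
    return image.reshape(h * w, c).astype(float)

# Toy 2x2 image: two bright pinkish pixels and two dark/green pixels,
# which separate cleanly in RGB feature space.
image = np.array([[[255, 200, 250], [245, 220, 248]],
                  [[ 15, 189,   2], [  3,  12,   2]]], dtype=np.uint8)
feats = color_features(image)   # shape (4, 3): one row per pixel
```

Grouping then operates on the rows of `feats` by proximity, exactly as the slide describes.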


Feature Space
At each pixel, form a vector of measurements describing
image properties: image features
Perform segmentation by clustering the feature space
Grouping pixels based on texture similarity

[Figure: filter bank of 24 filters; each pixel gets responses F1 … F24]

Feature space: filter bank responses (e.g., 24D)

Grouping: Inventing underlying models

What if these points lie on lines?

[Figure: points grouped into Line 1 and Line 2]

As we don't know the lines

[Plot: weights assigned to line 1 and line 2 over iterations]

Application to segmentation
Invented 'models': image segments
Modeling the features separately within each segment is substantially easier than modeling the whole image.

[Figure: segments modeled separately by color: brown, blue, yellow]

Intensity-based segmentation: toy example

[Figure: input image with black, gray, and white pixel groups; histogram of pixel count over intensity]

1D feature vector: intensity measurement


These intensities define the three groups.
We could label every pixel in the image according to which of these primary intensities it is closest to,
i.e., segment the image based on the intensity feature.


[Figure: input image and its intensity histogram; three modes along the 0–255 intensity axis, e.g., one near 190]

Goal: choose three centers


Label every pixel according to closest center
But how can we find the centers?


Clustering
A chicken-and-egg problem:

Known centers: group points by allocating each point to its closest center.

Known group memberships: get the centers by computing each group's mean.

K-Means algorithm
Input: N feature vectors x_i ∈ R^d
  d: feature vector dimensionality
  N: number of pixels
Output: centers & assignments

K-Means Clustering
Randomly initialize k cluster centers c_1, …, c_k.
Iterate:
1. Given the cluster centers, determine the points in each cluster:
   for each point i, find the closest center c_j and put i into cluster j.
2. Given the points in each cluster, solve for the centers:
   set each center c_j to the mean of the points in cluster j.
3. If the centers c_j have changed, go to 1.

Guaranteed convergence to a local minimum of the sum of squared distances between points and their assigned centers.
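The three steps above can be sketched directly in NumPy. A minimal, illustrative version; for reproducibility the initial centers are passed in explicitly here, whereas the algorithm on the slide initializes them randomly:

```python
import numpy as np

def kmeans(X, k, init, n_iter=100):
    """k-means: assign points to closest centers, recompute means, repeat."""
    centers = init.astype(float).copy()
    for _ in range(n_iter):
        # Step 1: distance of every point to every center; closest center wins
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 2: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        # Step 3: stop once the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy 1D "intensity" data with three well-separated groups
X = np.array([[0.0], [0.2], [5.0], [5.2], [11.0], [11.2]])
centers, labels = kmeans(X, k=3, init=np.array([[1.0], [6.0], [9.0]]))
```

Each iteration can only decrease the sum of squared distances, which is why convergence to a local minimum is guaranteed.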


K-Means Clustering Results


K-means clustering based on intensity or color
Image

Intensity-based clusters

Color-based clusters

Clustering (r,g,b,x,y) values enforces spatial coherence


Limitations of k-means
Euclidean distance-based criterion

No justification for Euclidean metric in arbitrary feature space

Desired

K-means

Remedy: introduce more flexible models for the observations within each group

K-means allows only spherical clusters; consider ellipsoidal clusters instead

d-dimensional Gaussian distribution

Determined by the mean μ and covariance matrix Σ:
  N(x; μ, Σ) = exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)) / ((2π)^{d/2} |Σ|^{1/2})

Maximum likelihood parameter estimates:
  μ = (1/N) Σ_i x_i
  Σ = (1/N) Σ_i (x_i − μ)(x_i − μ)ᵀ

[Figure: density contours, e.g. P(x) = .1, .2, .5]

Mixture of Gaussians

K Gaussian blobs with parameters μ_k, Σ_k, k = 1, …, K

Blob k is selected with probability π_k

The likelihood of observing x is a weighted mixture of Gaussians:
  p(x) = Σ_k π_k N(x; μ_k, Σ_k)

[Figure: 1D mixture of Gaussians]

Parameter estimation: maximize the log-likelihood Σ_i log p(x_i)

Expectation Maximization algorithm

E-step (Bayes' Rule): compute soft assignments (responsibilities)
  r_ik = π_k N(x_i; μ_k, Σ_k) / Σ_j π_j N(x_i; μ_j, Σ_j)

M-step: re-estimate the parameters with points weighted by r_ik
  π_k = (1/N) Σ_i r_ik
  μ_k = Σ_i r_ik x_i / Σ_i r_ik
  Σ_k = Σ_i r_ik (x_i − μ_k)(x_i − μ_k)ᵀ / Σ_i r_ik
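The two steps can be sketched for a 1D mixture of Gaussians. A minimal NumPy version; the data and initial parameters are illustrative, not the lecture's reference code:

```python
import numpy as np

def gaussian(x, mu, var):
    """1D Gaussian density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm(x, mu, var, pi, n_iter=50):
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] via Bayes' rule
        r = pi[None, :] * gaussian(x[:, None], mu[None, :], var[None, :])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

# Two well-separated 1D groups; EM recovers their means and weights
x = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
mu, var, pi = em_gmm(x, mu=np.array([0.5, 4.0]),
                     var=np.array([1.0, 1.0]), pi=np.array([0.5, 0.5]))
```

Each iteration increases the data log-likelihood, but (as the next slide notes) only up to a local maximum.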

K-means vs. EM

k-means:
  Hard assignment: index of the closest center
  Isotropic distance (Euclidean)
  Fast (e.g., with kd-trees)
  More robust to initialization

EM:
  Soft assignment: responsibilities r_ik
  Anisotropic likelihood (covariance-based, 'Mahalanobis')
  Accurate & more flexible
  Prone to local minima

Typical usage: initialize EM with the k-means results

Problems of K-Means/EM
Number of clusters

Initialization/local minima
Mismatch with data distribution

This Lecture
Introduction

Mid-Level Vision
Gestalt Theory & Grouping

Clustering & Segmentation

K-means & EM
Mean Shift
Normalized Cuts


Finding Modes in a Histogram

How many modes are there?

Mode = local maximum of the density of a given distribution


Easy to see, hard to compute

Mean Shift Algorithm

Consider a nonparametric (kernel) density estimate
  f(x) ∝ Σ_i K((x − x_i)/h),  e.g. a Gaussian kernel K with bandwidth h

Update: set each point to the weighted average of its neighborhood

The change is in the direction of maximal increase of the density
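The update can be sketched directly. A minimal NumPy version with a Gaussian kernel; the bandwidth h and the toy data are illustrative:

```python
import numpy as np

def mean_shift(X, h=1.0, n_iter=100, tol=1e-5):
    """Shift a copy of each point to the weighted mean of its neighborhood."""
    Y = X.astype(float).copy()          # points being shifted; X stays fixed
    for _ in range(n_iter):
        # Gaussian weight of every data point, for every shifted point
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-0.5 * d2 / h ** 2)
        Y_new = (w[:, :, None] * X[None, :, :]).sum(axis=1) / w.sum(axis=1)[:, None]
        if np.abs(Y_new - Y).max() < tol:   # converged at the modes
            break
        Y = Y_new
    return Y   # final positions: the mode each point was attracted to

# Two 1D groups; points climb to a mode near 0.2 and one near 10.1
X = np.array([[0.0], [0.2], [0.4], [10.0], [10.2]])
modes = mean_shift(X, h=0.5)
```

Points whose trajectories end at the same mode belong to the same attraction basin, i.e. the same cluster.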


Mean-Shift

[Animation: a window (region of interest) is repeatedly shifted along the mean shift vector toward its center of mass, until it converges at a mode]

Slide by Y. Ukrainitz & B. Sarel

Real Modality Analysis

Tessellate the space with windows

Run the procedure in parallel

Slide by Y. Ukrainitz & B. Sarel


Real Modality Analysis

The blue data points are those that the windows traversed on their way to the mode.
Slide by Y. Ukrainitz & B. Sarel

Mean-Shift Clustering
Cluster: all data points in the attraction basin of a mode
Attraction basin: the region for which all trajectories
lead to the same mode


Mean Shift for image segmentation

Mean-Shift Segmentation Results


More Results


Summary Mean-Shift
Pros

General, application-independent tool


Model-free, does not assume any prior shape (spherical,
elliptical, etc.) on data clusters
Just a single parameter (window size h)
h has a physical meaning (unlike k-means)

Finds variable number of modes


Robust to outliers

Cons

Output depends on window size


Window size (bandwidth) selection is not trivial
Computationally (relatively) expensive
Does not scale well with dimension of feature space

This Lecture
Introduction

Mid-Level Vision
Gestalt Theory & Grouping

Clustering & Segmentation

K-means & EM
Mean Shift
Normalized Cuts


Images as Graphs
[Figure: pixels p and q connected by an edge with affinity weight wpq]

Fully-connected graph

Node (vertex) for every pixel


Link between every pair of pixels, (p,q)
Affinity weight wpq for each link (edge)
wpq measures similarity
Similarity is inversely proportional to difference
(in color and position)


Segmentation by Graph Cuts



Break Graph into Segments

Delete links that cross between segments


Easiest to break links that have low similarity (low weight)
Similar pixels should be in the same segments
Dissimilar pixels should be in different segments

Measuring Affinity
A Gaussian affinity in each cue, w_pq = exp(−‖f_p − f_q‖² / (2σ²)), with the feature f given by:
  Distance: pixel position
  Intensity: pixel intensity
  Color: (some suitable color space distance)
  Texture: (vectors of filter outputs)
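A Gaussian affinity of this form is easy to sketch; a minimal NumPy version, where the feature vectors and σ are illustrative:

```python
import numpy as np

def affinity_matrix(F, sigma=1.0):
    """W[p, q] = exp(-||f_p - f_q||^2 / (2 sigma^2)) for all pairs of rows of F."""
    F = np.asarray(F, float)
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

# e.g., 1D intensity features: two similar pixels and one very different one
F = [[0.0], [0.1], [5.0]]
W = affinity_matrix(F, sigma=1.0)
```

Similar pixels get weights near 1, dissimilar ones near 0, and W is symmetric, matching the similarity notion described above.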


Graph Cut

Set of edges whose removal makes a graph disconnected

Cost of a cut: sum of the weights of the cut edges,
  cut(A, B) = Σ_{p∈A, q∈B} w_pq

A graph cut gives us a segmentation

What is a good graph cut, and how do we find one?
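The cut cost can be computed directly from the affinity matrix; a small sketch on a toy 4-node graph (the weights are illustrative):

```python
import numpy as np

def cut_cost(W, A, B):
    """cut(A, B) = sum over p in A, q in B of W[p, q]."""
    return W[np.ix_(sorted(A), sorted(B))].sum()

# 4-node toy graph: {0,1} strongly linked, {2,3} strongly linked,
# and only two weak 0.1 edges between the two groups
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
c = cut_cost(W, A={0, 1}, B={2, 3})   # cuts only the two 0.1 edges
```

Cutting between the two tightly linked pairs costs 0.2, while any cut through a 0.9 edge would cost far more, which is exactly why low-weight links are the easiest to break.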


Graph Cut

Here, the cut is nicely defined by the block-diagonal structure of the affinity matrix.
How can this be generalized?


Multi-way graph cut

Affinity matrix

Block detection

Minimum Cut
We can do segmentation by finding the minimum cut in a graph (next lecture)
Drawback:
The weight of a cut is proportional to the number of edges in the cut,
so a minimum cut tends to cut off very small, isolated components.

[Figure: cuts isolating single nodes have lesser weight than the ideal cut]


Normalized Cut (NCut)

A minimum cut penalizes large segments
This can be fixed by normalizing for the size of the segments
The normalized cut cost is:
  NCut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)

assoc(A, V) = sum of weights of all edges in V that touch A

Optimization
Original problem: partition the similarity graph
Mathematically equivalent to partitioning an indicator vector y with entries in {1, −1}

Relaxation: allow y to take continuous values

Generalized eigenvector problem: (D − W) y = λ D y

Embedding
Compute the embedding and then cluster in the new space
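The relaxed problem can be sketched with plain NumPy by converting the generalized eigenproblem (D − W) y = λ D y into the standard symmetric problem D^{-1/2}(D − W)D^{-1/2} z = λ z with y = D^{-1/2} z. A toy two-block graph; weights and the median split rule are illustrative:

```python
import numpy as np

def ncut_partition(W):
    """Bipartition from the second-smallest generalized eigenvector."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Normalized Laplacian: equivalent to (D - W) y = lambda D y
    L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    fiedler = D_inv_sqrt @ vecs[:, 1]        # map back: y = D^{-1/2} z
    return fiedler >= np.median(fiedler)     # threshold at the median

# Toy graph: {0,1} and {2,3} tightly linked, weak links across
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
split = ncut_partition(W)   # separates {0,1} from {2,3}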


Interpretation as a Dynamical System

Treat the links as springs and shake the system

Elasticity: a decreasing function of affinity


Vibration modes correspond to segments

NCuts Example

Smallest eigenvectors

NCuts segments

Discretization
Problem: eigenvectors take on continuous values

How to choose the splitting point to binarize the image?

[Figure: image, eigenvector values, and NCut scores for candidate splitting points]

Possible procedures:
a) Pick a constant value (0, or 0.5).
b) Pick the median value as splitting point.
c) Look for the splitting point that has the minimum NCut value:
   1. Choose n possible splitting points.
   2. Compute the NCut value for each.
   3. Pick the minimum.

CVPR 2006: Tolliver & Miller: Spectral Rounding
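Procedure (c) can be sketched directly: scan candidate thresholds on the eigenvector and keep the one with minimum NCut value. A minimal NumPy illustration on a toy graph (weights and eigenvector values are made up for the example):

```python
import numpy as np

def ncut_value(W, mask):
    """NCut(A, B) = cut/assoc(A,V) + cut/assoc(B,V) for a boolean mask."""
    A, B = np.where(mask)[0], np.where(~mask)[0]
    cut = W[np.ix_(A, B)].sum()
    return cut / W[A, :].sum() + cut / W[B, :].sum()

def best_split(W, eigvec, n=20):
    """Try n thresholds between min and max of the eigenvector; keep the best."""
    candidates = np.linspace(eigvec.min(), eigvec.max(), n + 2)[1:-1]
    best = min(candidates, key=lambda t: ncut_value(W, eigvec >= t))
    return eigvec >= best

# Two-block toy graph and an eigenvector that separates the blocks
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
split = best_split(W, np.array([0.5, 0.4, -0.4, -0.5]))
```

Any threshold between the two blocks yields the minimum NCut value of 0.2, so the scan recovers the {0,1} vs. {2,3} split.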

NCuts: Overall Procedure

1. Construct a weighted graph G = (V, E) from an image: connect each pair of pixels and compute the affinity weight of each edge.
2. Form the affinity matrix W and the diagonal degree matrix D.
3. Solve (D − W) y = λ D y for the smallest eigenvectors.
4. Form an approximate solution to the NCut problem:
   Threshold the eigenvectors to get a discrete cut & recursively subdivide if the NCut value is below a pre-specified value.
   Or: cluster the eigenvector values using k-means.
   Or: spectral rounding.


Color Image Segmentation with NCuts


Results with Color & Texture


Normalized Cuts results


Berkeley Segmentation Engine

Summary: Normalized Cuts


Pros:

Generic framework, flexible choice of affinity function


Does not require any model of the data distribution

Cons:

Time and memory complexity can be high

Dense, highly connected graphs → many affinity computations
Solving the eigenvalue problem is costly

Preference for balanced partitions


If a region is uniform, NCuts will find the
modes of vibration of the image dimensions


Lecture Summary
Introduction

Mid-Level Vision
Gestalt Theory & Grouping

Clustering & Segmentation

K-means & EM
Mean Shift
Normalized Cuts
