Computer Vision - RWTH Aachen University


Perceptual and Sensory Augmented Computing
Computer Vision WS 12/13
Computer Vision – Lecture 5
Structure Extraction & Segmentation
13.11.2012
Bastian Leibe
RWTH Aachen
http://www.vision.rwth-aachen.de
leibe@vision.rwth-aachen.de
Course Outline
• Image Processing Basics
• Structure Extraction
• Segmentation
  – Segmentation as Clustering
  – Graph-theoretic Segmentation
• Recognition
  – Global Representations
  – Subspace representations
• Local Features & Matching
• Object Categorization
• 3D Reconstruction
• Motion and Tracking
Recap: Canny Edge Detector
1. Filter image with derivative of Gaussian
2. Find magnitude and orientation of gradient
3. Non-maximum suppression:
   – Thin multi-pixel wide “ridges” down to single pixel width
4. Linking and thresholding (hysteresis):
   – Define two thresholds: low and high
   – Use the high threshold to start edge curves and the low threshold to continue them
• MATLAB:
>> edge(image,'canny');
>> help edge
adapted from D. Lowe, L. Fei-Fei
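As a usage note (a minimal sketch; the threshold values are illustrative, not from the slides), MATLAB's edge also accepts the two hysteresis thresholds explicitly:
>> BW = edge(image,'canny');              % thresholds chosen automatically
>> BW = edge(image,'canny',[0.05 0.15]);  % explicit [low high] hysteresis thresholds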
Recap: Edges vs. Boundaries
Edges are a useful signal to indicate occluding boundaries and shape. Here the raw edge output is not so bad…
…but quite often boundaries of interest are fragmented, and we have extra “clutter” edge points.
Slide credit: Kristen Grauman
Recap: Chamfer Matching
• Chamfer Distance


  – Average distance to nearest feature
  – This can be computed efficiently by correlating the edge template with the distance-transformed image
(Figure: edge image and its distance transform)
[D. Gavrila, DAGM’99]
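A minimal MATLAB sketch of this idea (assuming the Image Processing Toolbox; E and T are hypothetical binary edge images for the scene and the template):
DT = bwdist(E);                    % distance transform: distance to the nearest edge pixel
score = filter2(double(T), DT);    % correlate the edge template with the distance-transformed image
score = score / nnz(T);            % average distance to the nearest feature per placement
[bestScore, idx] = min(score(:));  % lowest average distance = best template placement
[row, col] = ind2sub(size(score), idx);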
Topics of This Lecture
• Fitting as parametric search
  – Line detection
  – Hough transform
  – Extension to circles
  – Generalized Hough transform
• Segmentation as clustering
  – k-Means
  – Feature spaces
• Probabilistic clustering
  – Mixture of Gaussians, EM
• Model-free clustering
  – Mean-Shift clustering
Example: Line Fitting
• Why fit lines?
  – Many objects are characterized by presence of straight lines
• Wait, why aren’t we done just by running edge detection?
Slide credit: Kristen Grauman
Difficulty of Line Fitting
• Extra edge points (clutter), multiple models:
  – Which points go with which line, if any?
• Only some parts of each line detected, and some parts are missing:
  – How to find a line that bridges missing evidence?
• Noise in measured edge points, orientations:
  – How to detect true underlying parameters?
Slide credit: Kristen Grauman
Voting
• It’s not feasible to check all combinations of features by fitting a model to each possible subset.
• Voting is a general technique where we let the features vote for all models that are compatible with them.
  – Cycle through features, cast votes for model parameters.
  – Look for model parameters that receive a lot of votes.
• Noise & clutter features will cast votes too, but typically their votes should be inconsistent with the majority of “good” features.
• It’s OK if some features are not observed, as the model can span multiple fragments.
Slide credit: Kristen Grauman
Fitting Lines
• Given points that belong to a line, what is the line?
• Which points belong to which line(s)?
• How many lines are there?
• The Hough Transform is a voting technique that can be used to answer all of these
• Main idea:
  1. Vote for all possible lines on which each edge point could lie.
  2. Look for lines that get many votes.
Slide credit: Kristen Grauman
Finding Lines in an Image: Hough Space
(Figure: a line y = m0·x + b0 in image space corresponds to the point (m0, b0) in Hough (m, b) parameter space)
• Connection between image (x,y) and Hough (m,b) spaces


  – A line in the image corresponds to a point in Hough space.
  – To go from image space to Hough space: given a set of points (x,y), find all (m,b) such that y = mx + b
Slide credit: Steve Seitz
Finding Lines in an Image: Hough Space
(Figure: a point (x0, y0) in image space corresponds to the line b = –x0·m + y0 in Hough (m, b) parameter space)
• Connection between image (x,y) and Hough (m,b) spaces


  – A line in the image corresponds to a point in Hough space.
  – To go from image space to Hough space: given a set of points (x,y), find all (m,b) such that y = mx + b
  – What does a point (x0, y0) in the image space map to?
    – Answer: the solutions of b = –x0·m + y0. This is a line in Hough space.
Slide credit: Steve Seitz
Finding Lines in an Image: Hough Space
(Figure: the Hough-space lines b = –x0·m + y0 and b = –x1·m + y1 for the points (x0, y0) and (x1, y1) intersect at the parameters of the image line through both points)
• What are the line parameters for the line that contains both (x0, y0) and (x1, y1)?
  – It is the intersection of the lines b = –x0·m + y0 and b = –x1·m + y1
Slide credit: Steve Seitz
Finding Lines in an Image: Hough Space
• How can we use this to find the most likely parameters (m,b) for the most prominent line in the image space?
  – Let each edge point in image space vote for a set of possible parameters in Hough space.
  – Accumulate votes in discrete set of bins; parameters with the most votes indicate line in image space.
Slide credit: Steve Seitz
Polar Representation for Lines
• Issues with usual (m,b) parameter space: can take on infinite values, undefined for vertical lines.
• Polar parameterization:
    x cos θ + y sin θ = d
  – d : perpendicular distance from the line to the origin [0,0]
  – θ : angle the perpendicular makes with the x-axis
• Point in image space → sinusoid segment in Hough space
Slide adapted from Steve Seitz
Hough Transform Algorithm
Using the polar parameterization:
    x cos θ + y sin θ = d
H: accumulator array (votes)

Basic Hough transform algorithm
1. Initialize H[d, θ] = 0.
2. For each edge point (x,y) in the image:
      for θ = 0 to 180      // some quantization
         d = x cos θ + y sin θ
         H[d, θ] += 1
3. Find the value(s) of (d, θ) where H[d, θ] is maximal.
4. The detected line in the image is given by d = x cos θ + y sin θ.
• Time complexity (in terms of number of votes)?
Slide credit: Steve Seitz
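A minimal MATLAB sketch of this accumulator loop (basic MATLAB only; E is a hypothetical binary edge image, and the 1-degree/1-pixel quantization is an arbitrary choice):
[ys, xs] = find(E);                       % edge point coordinates
thetas = 0:179;                           % quantization of theta in degrees
dmax = ceil(hypot(size(E,1), size(E,2)));
H = zeros(2*dmax+1, numel(thetas));       % accumulator over (d, theta)
for i = 1:numel(xs)
    for t = 1:numel(thetas)
        d = round(xs(i)*cosd(thetas(t)) + ys(i)*sind(thetas(t)));
        H(d + dmax + 1, t) = H(d + dmax + 1, t) + 1;   % cast a vote
    end
end
[~, peak] = max(H(:));
[di, ti] = ind2sub(size(H), peak);        % most-voted line: d = di - dmax - 1, theta = thetas(ti)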
Example: HT for Straight Lines
(Figure: edge coordinates in image space and the vote counts in (d, θ) space; bright value = high vote count, black = no votes)
Slide credit: David Lowe
Real-World Examples
Slide credit: Kristen Grauman
Showing longest segments found
Slide credit: Kristen Grauman
Impact of Noise on Hough Transform
(Figure: edge coordinates of a noisy line in image space and the corresponding votes in (d, θ) space)
What difficulty does this present for an implementation?
Slide credit: David Lowe
Impact of Noise on Hough Transform
(Figure: random edge points in image space and the resulting vote space)
Here, everything appears to be “noise”, or random edge
points, but we still see peaks in the vote space.
Slide credit: David Lowe
Extensions
Extension 1: Use the image gradient
1. same
2. for each edge point I[x,y] in the image
      θ = gradient at (x,y)
      d = x cos θ + y sin θ
      H[d, θ] += 1
3. same
4. same
(Reduces degrees of freedom)
Slide credit: Kristen Grauman
Extensions
Extension 1: Use the image gradient
1. same
2. for each edge point I[x,y] in the image
      compute unique (d, θ) based on image gradient at (x,y)
      H[d, θ] += 1
3. same
4. same
(Reduces degrees of freedom)

Extension 2
  – Give more votes for stronger edges (use magnitude of gradient)
Extension 3
  – Change the sampling of (d, θ) to give more/less resolution
Extension 4
  – The same procedure can be used with circles, squares, or any other shape…
Slide credit: Kristen Grauman
Hough Transform for Circles
• Circle: center (a,b) and radius r
    (xi – a)² + (yi – b)² = r²
• For a fixed radius r, unknown gradient direction
(Figure: for a fixed radius r, each image point votes for a circle of candidate centers (a, b) in Hough space)
Slide credit: Kristen Grauman
Hough Transform for Circles
• Circle: center (a,b) and radius r
    (xi – a)² + (yi – b)² = r²
• For a fixed radius r, unknown gradient direction
Intersection: most votes for center occur here.
(Figure: image space and Hough space)
Slide credit: Kristen Grauman
Hough Transform for Circles
• Circle: center (a,b) and radius r
    (xi – a)² + (yi – b)² = r²
• For an unknown radius r, unknown gradient direction
(Figure: image space and the 3D (a, b, r) Hough space)
Slide credit: Kristen Grauman
Hough Transform for Circles
• Circle: center (a,b) and radius r
    (xi – a)² + (yi – b)² = r²
• For an unknown radius r, known gradient direction
(Figure: with known gradient direction θ at edge point x, the votes are restricted to a line in Hough space)
Slide credit: Kristen Grauman
Hough Transform for Circles
For every edge pixel (x,y):
   For each possible radius value r:
      For each possible gradient direction θ:    // or use estimated gradient
         a = x – r cos(θ)
         b = y + r sin(θ)
         H[a,b,r] += 1
      end
   end
Slide credit: Kristen Grauman
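A minimal MATLAB sketch of this accumulator, using an estimated gradient direction to avoid the innermost loop (E, G, and the radius range are hypothetical inputs: a binary edge image, a gradient-direction map in radians, and an assumed radius search range; the sign of the b update follows the slide and may need flipping depending on how G is defined):
[ys, xs] = find(E);
radii = 5:30;                                    % assumed radius range
H = zeros(size(E,1), size(E,2), numel(radii));   % accumulator over (b, a, r)
for i = 1:numel(xs)
    th = G(ys(i), xs(i));                        % estimated gradient direction at this edge pixel
    for k = 1:numel(radii)
        a = round(xs(i) - radii(k)*cos(th));     % a = x - r cos(theta)
        b = round(ys(i) + radii(k)*sin(th));     % b = y + r sin(theta)
        if a >= 1 && a <= size(E,2) && b >= 1 && b <= size(E,1)
            H(b, a, k) = H(b, a, k) + 1;
        end
    end
end
[~, peak] = max(H(:));
[bc, ac, rk] = ind2sub(size(H), peak);           % strongest circle: center (ac, bc), radius radii(rk)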
Example: Detecting Circles with Hough
Crosshair indicates results of Hough transform,
bounding box found via motion differencing.
Slide credit: Kristen Grauman
Example: Detecting Circles with Hough
(Figure panels: Original; Edges; Votes: Penny)
Note: a different Hough transform (with separate accumulators) is used for each circle radius (quarters vs. penny).
Slide credit: Kristen Grauman
Coin finding sample images from: Vivek Kwatra
Example: Detecting Circles with Hough
(Figure panels: Original; Edges; Votes: Quarter; Combined detections)
Note: a different Hough transform (with separate accumulators) is used for each circle radius (quarters vs. penny).
Slide credit: Kristen Grauman
Coin finding sample images from: Vivek Kwatra
Voting: Practical Tips
• Minimize irrelevant tokens first (take edge points with significant gradient magnitude)
• Choose a good grid / discretization
  – Too coarse: large votes obtained when too many different lines correspond to a single bucket
  – Too fine: miss lines because some points that are not exactly collinear cast votes for different buckets
• Vote for neighbors, also (smoothing in accumulator array)
• Utilize direction of edge to reduce free parameters by 1
• To read back which points voted for “winning” peaks, keep tags on the votes.
Slide credit: Kristen Grauman
Hough Transform: Pros and Cons
Pros
• All points are processed independently, so can cope with occlusion
• Some robustness to noise: noise points unlikely to contribute consistently to any single bin
• Can detect multiple instances of a model in a single pass
Cons
• Complexity of search time increases exponentially with the number of model parameters
• Non-target shapes can produce spurious peaks in parameter space
• Quantization: hard to pick a good grid size
Slide credit: Kristen Grauman
Generalized Hough Transform
• What if we want to detect arbitrary shapes defined by boundary points and a reference point?
  – At each boundary point pi, compute the displacement vector: r = a – pi.
  – For a given model shape: store these vectors in a table indexed by gradient orientation θ.
(Figure: model boundary points p1, p2 with gradient orientations θ and their displacement vectors to the reference point a)
[Dana H. Ballard, Generalizing the Hough Transform to Detect Arbitrary Shapes, 1980]
Slide credit: Kristen Grauman
Generalized Hough Transform
To detect the model shape in a new image:
• For each edge point
  – Index into table with its gradient orientation θ
  – Use retrieved r vectors to vote for position of reference point
• Peak in this Hough space is reference point with most supporting edges
Assuming translation is the only transformation here, i.e., orientation and scale are fixed.
Slide credit: Kristen Grauman
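A minimal MATLAB sketch of the two stages under these assumptions (translation only; modelPts, modelTheta, ref, E, and G are hypothetical inputs: model boundary points, their gradient orientations, the reference point, and the test image's edge map and gradient-orientation map):
nbins = 36;                                       % assumed orientation quantization
bin = @(t) mod(round(t/(2*pi)*nbins), nbins) + 1; % map an angle to a table row
Rtable = cell(nbins, 1);
for i = 1:size(modelPts,1)                        % offline: build the R-table
    k = bin(modelTheta(i));
    Rtable{k} = [Rtable{k}; ref - modelPts(i,:)]; % store displacement r = a - p_i
end
H = zeros(size(E));                               % online: vote in the test image
[ys, xs] = find(E);
for i = 1:numel(xs)
    vecs = Rtable{bin(G(ys(i), xs(i)))};          % retrieve r vectors for this orientation
    for j = 1:size(vecs,1)
        a = round([xs(i) ys(i)] + vecs(j,:));     % candidate reference-point location
        if a(1) >= 1 && a(1) <= size(E,2) && a(2) >= 1 && a(2) <= size(E,1)
            H(a(2), a(1)) = H(a(2), a(1)) + 1;
        end
    end
end
[~, peak] = max(H(:));                            % peak = reference point with most supporting edges
[refY, refX] = ind2sub(size(H), peak);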
Example: Generalized Hough Transform
Say we’ve already stored a table of displacement vectors as a function of edge orientation for this model shape.
Slide credit: Svetlana Lazebnik
Model shape
Example: Generalized Hough Transform
Now we want to look at some edge points detected in a new image, and vote on the position of that shape.
Displacement vectors for model points
Slide credit: Svetlana Lazebnik
Example: Generalized Hough Transform
Range of voting locations for test point
Slide credit: Svetlana Lazebnik
Example: Generalized Hough Transform
Slide credit: Svetlana Lazebnik
Votes for points with θ =
Example: Generalized Hough Transform
Displacement vectors for model points
Slide credit: Svetlana Lazebnik
Example: Generalized Hough Transform
Range of voting locations for test point
Slide credit: Svetlana Lazebnik
Example: Generalized Hough Transform
Slide credit: Svetlana Lazebnik
Votes for points with θ =
Application in Recognition
• Instead of indexing displacements by gradient orientation, index by “visual codeword”.
(Figure: training image and a visual codeword with its displacement vectors)
B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
Application in Recognition
• Instead of indexing displacements by gradient orientation, index by “visual codeword”.
(Figure: test image)
• We’ll hear more about this method in lecture 14…
Topics of This Lecture
• Fitting as parametric search
  – Line detection
  – Hough transform
  – Extension to circles
  – Generalized Hough transform
• Segmentation as clustering
  – k-Means
  – Feature spaces
• Probabilistic clustering
  – Mixture of Gaussians, EM
• Model-free clustering
  – Mean-Shift clustering
Image Segmentation
• Goal: identify groups of pixels that go together
Slide credit: Steve Seitz, Kristen Grauman
Image Segmentation: Toy Example
(Figure: input image and its intensity histogram with three groups: black pixels (1), gray pixels (2), white pixels (3))
• These intensities define the three groups.
• We could label every pixel in the image according to which of these primary intensities it is.
  – i.e., segment the image based on the intensity feature.
• What if the image isn’t quite so simple?
Slide credit: Kristen Grauman
(Figure: input images and their intensity histograms (pixel count vs. intensity))
Slide credit: Kristen Grauman
(Figure: input image and its intensity histogram (pixel count vs. intensity))
• Now how to determine the three main intensities that define our groups?
• We need to cluster.
Slide credit: Kristen Grauman
(Figure: intensity axis from 0 to 255 with three cluster centers, labeled 1, 2, 3)
• Goal: choose three “centers” as the representative intensities, and label every pixel according to which of these centers it is nearest to.
• Best cluster centers are those that minimize SSD between all points and their nearest cluster center ci:
Slide credit: Kristen Grauman
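The objective appeared as a formula image on the slide; a standard way to write this sum-of-squared-distances criterion in LaTeX (notation assumed here, with K clusters and points p assigned to their nearest center ci) is:

    \min_{c_1,\dots,c_K} \; \sum_{i=1}^{K} \sum_{p \in \text{cluster } i} \lVert p - c_i \rVert^2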
Clustering
• With this objective, it is a “chicken and egg” problem:
  – If we knew the cluster centers, we could allocate points to groups by assigning each to its closest center.
  – If we knew the group memberships, we could get the centers by computing the mean per group.
Slide credit: Kristen Grauman
K-Means Clustering
• Basic idea: randomly initialize the k cluster centers, and iterate between the two steps we just saw.
  1. Randomly initialize the cluster centers, c1, ..., cK
  2. Given cluster centers, determine points in each cluster
     – For each point p, find the closest ci. Put p into cluster i
  3. Given points in each cluster, solve for ci
     – Set ci to be the mean of points in cluster i
  4. If ci have changed, repeat Step 2
• Properties
  – Will always converge to some solution
  – Can be a “local minimum”
    – Does not always find the global minimum of the objective function
Slide credit: Steve Seitz
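A minimal MATLAB sketch of this loop (X and K are hypothetical inputs: an N-by-D data matrix and the number of clusters; the tolerance and iteration cap are arbitrary choices):
N = size(X, 1);
C = X(randperm(N, K), :);                          % 1. random initial centers
for iter = 1:100
    D2 = zeros(N, K);
    for k = 1:K                                    % squared distance to every center
        D2(:, k) = sum(bsxfun(@minus, X, C(k, :)).^2, 2);
    end
    [~, labels] = min(D2, [], 2);                  % 2. assign each point to its closest center
    Cnew = C;
    for k = 1:K                                    % 3. recompute each center as its cluster mean
        if any(labels == k)
            Cnew(k, :) = mean(X(labels == k, :), 1);
        end
    end
    if max(abs(Cnew(:) - C(:))) < 1e-6, break; end % 4. stop once the centers no longer change
    C = Cnew;
end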
Segmentation as Clustering
K=2
K=3
img_as_col = double(im(:));
cluster_membs = kmeans(img_as_col, K);

labelim = zeros(size(im));
for i = 1:K
    inds = find(cluster_membs == i);
    meanval = mean(img_as_col(inds));
    labelim(inds) = meanval;
end
Slide credit: Kristen Grauman
K-Means Clustering
• Java demo:
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
K-Means++
• Can we prevent arbitrarily bad local minima?
  1. Randomly choose the first center.
  2. Pick each new center with probability proportional to D(p)², the squared distance of p to its nearest center so far (i.e., the contribution of p to the total error).
  3. Repeat until k centers have been chosen.
• Expected error = O(log k) · optimal
Arthur & Vassilvitskii 2007
Slide credit: Steve Seitz
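A minimal MATLAB sketch of this seeding step (X is a hypothetical N-by-D data matrix; the chosen centers C would then be handed to the usual k-means iterations):
N = size(X, 1);
C = X(randi(N), :);                                    % 1. first center chosen uniformly at random
for k = 2:K
    D2 = inf(N, 1);
    for j = 1:size(C, 1)                               % squared distance to the nearest chosen center
        D2 = min(D2, sum(bsxfun(@minus, X, C(j, :)).^2, 2));
    end
    p = cumsum(D2 / sum(D2));                          % 2. sample with probability proportional to D(p)^2
    C = [C; X(find(rand <= p, 1, 'first'), :)];        % 3. repeat until k centers
end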
Feature Space
• Depending on what we choose as the feature space, we can group pixels in different ways.
• Grouping pixels based on intensity similarity
• Feature space: intensity value (1D)
Slide credit: Kristen Grauman
Feature Space
• Depending on what we choose as the feature space, we can group pixels in different ways.
• Grouping pixels based on color similarity
(Figure: example pixels, e.g. R=255, G=200, B=250, plotted as points in R-G-B space)
• Feature space: color value (3D)
Slide credit: Kristen Grauman
Segmentation as Clustering
• Depending on what we choose as the feature space, we can group pixels in different ways.
• Grouping pixels based on texture similarity
(Figure: a filter bank of 24 filters, F1 … F24)
• Feature space: filter bank responses (e.g., 24D)
Slide credit: Kristen Grauman
Smoothing Out Cluster Assignments
• Assigning a cluster label per pixel may yield outliers:
(Figure panels: Original; Labeled by cluster center’s intensity)
• How can we ensure they are spatially smooth?
Slide credit: Kristen Grauman
Segmentation as Clustering
• Depending on what we choose as the feature space, we can group pixels in different ways.
• Grouping pixels based on intensity+position similarity
(Figure: pixels plotted in a 3D feature space with axes X, Y, Intensity)
→ Way to encode both similarity and proximity.
Slide credit: Kristen Grauman
K-Means Clustering Results
• K-means clustering based on intensity or color is essentially vector quantization of the image attributes
  – Clusters don’t have to be spatially coherent
(Figure panels: Image; Intensity-based clusters; Color-based clusters)
Slide credit: Svetlana Lazebnik
Image source: Forsyth & Ponce
K-Means Clustering Results
• K-means clustering based on intensity or color is essentially vector quantization of the image attributes
  – Clusters don’t have to be spatially coherent
• Clustering based on (r,g,b,x,y) values enforces more spatial coherence
Slide credit: Svetlana Lazebnik
Image source: Forsyth & Ponce
Summary K-Means
• Pros
  – Simple, fast to compute
  – Converges to local minimum of within-cluster squared error
• Cons/issues
  – Setting k?
  – Sensitive to initial centers
  – Sensitive to outliers
  – Detects spherical clusters only
  – Assuming means can be computed
Slide credit: Kristen Grauman
Topics of This Lecture
• Fitting as parametric search
  – Line detection
  – Hough transform
  – Extension to circles
  – Generalized Hough transform
• Segmentation as clustering
  – k-Means
  – Feature spaces
• Probabilistic clustering
  – Mixture of Gaussians, EM
• Model-free clustering
  – Mean-Shift clustering
Probabilistic Clustering
• Basic questions
  – What’s the probability that a point x is in cluster m?
  – What’s the shape of each cluster?
• K-means doesn’t answer these questions.
• Basic idea
  – Instead of treating the data as a bunch of points, assume that they are all generated by sampling a continuous function.
  – This function is called a generative model.
  – Defined by a vector of parameters θ
Slide credit: Steve Seitz
Mixture of Gaussians
• One generative model is a mixture of Gaussians (MoG)
  – K Gaussian blobs with means μb, covariance matrices Vb, dimension d
  – Blob b is defined by its mean μb and covariance Vb
  – Blob b is selected with a prior mixture weight
  – The likelihood of observing x is a weighted mixture of Gaussians (see below)
Slide credit: Steve Seitz
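The defining formulas appeared as images on the slide; in standard MoG notation (the mixture-weight symbols αb are assumed here) they read, in LaTeX:

    P(x \mid \mu_b, V_b) = \frac{1}{\sqrt{(2\pi)^d \lvert V_b \rvert}} \exp\!\Big(-\tfrac{1}{2}(x-\mu_b)^{\top} V_b^{-1} (x-\mu_b)\Big)

    P(x \mid \theta) = \sum_{b=1}^{K} \alpha_b \, P(x \mid \mu_b, V_b), \qquad \sum_{b=1}^{K} \alpha_b = 1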
Expectation Maximization (EM)
• Goal
  – Find blob parameters θ that maximize the likelihood function
• Approach:
  1. E-step: given current guess of blobs, compute ownership of each point
  2. M-step: given ownership probabilities, update blobs to maximize likelihood function
  3. Repeat until convergence
Slide credit: Steve Seitz
EM Details
• E-step
  – Compute probability that point x is in blob b, given the current guess of θ
• M-step
  – Compute probability that blob b is selected (N data points)
  – Mean of blob b
  – Covariance of blob b
Slide credit: Steve Seitz
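The update formulas were images on the slide; the standard EM updates for a mixture of Gaussians (notation as above, with ownership weights q_ib assumed) are, in LaTeX:

    \text{E-step:}\quad q_{ib} = \frac{\alpha_b \, P(x_i \mid \mu_b, V_b)}{\sum_{k} \alpha_k \, P(x_i \mid \mu_k, V_k)}

    \text{M-step:}\quad \alpha_b = \frac{1}{N}\sum_{i=1}^{N} q_{ib}, \qquad \mu_b = \frac{\sum_i q_{ib}\, x_i}{\sum_i q_{ib}}, \qquad V_b = \frac{\sum_i q_{ib}\,(x_i-\mu_b)(x_i-\mu_b)^{\top}}{\sum_i q_{ib}}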
Applications of EM
• Turns out this is useful for all sorts of problems
  – Any clustering problem
  – Any model estimation problem
  – Missing data problems
  – Finding outliers
  – Segmentation problems
    – Segmentation based on color
    – Segmentation based on motion
    – Foreground/background separation
  – ...
• EM demo

http://lcn.epfl.ch/tutorial/english/gaussian/html/index.html
Slide credit: Steve Seitz
Segmentation with EM
(Figure panels: Original image; EM segmentation results for k=2, k=3, k=4, k=5)
Image source: Serge Belongie
Summary: Mixtures of Gaussians, EM
• Pros
  – Probabilistic interpretation
  – Soft assignments between data points and clusters
  – Generative model, can predict novel data points
  – Relatively compact storage
• Cons
  – Local minima
    – k-means is NP-hard even with k=2
  – Initialization
    – Often a good idea to start with some k-means iterations.
  – Need to know number of components
    – Solutions: model selection (AIC, BIC), Dirichlet process mixture
  – Need to choose generative model
  – Numerical problems are often a nuisance
Topics of This Lecture
• Fitting as parametric search
  – Line detection
  – Hough transform
  – Extension to circles
  – Generalized Hough transform
• Segmentation as clustering
  – k-Means
  – Feature spaces
• Probabilistic clustering
  – Mixture of Gaussians, EM
• Model-free clustering
  – Mean-Shift clustering
Finding Modes in a Histogram
• How many modes are there?


  – Mode = local maximum of the density of a given distribution
  – Easy to see, hard to compute
Slide credit: Steve Seitz
Mean-Shift Segmentation
• An advanced and versatile technique for clustering-based segmentation
http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html
D. Comaniciu and P. Meer, Mean Shift: A Robust Approach toward Feature Space Analysis,
PAMI 2002.
Slide credit: Svetlana Lazebnik
Mean-Shift Algorithm
• Iterative Mode Search
  1. Initialize random seed, and window W
  2. Calculate center of gravity (the “mean”) of W
  3. Shift the search window to the mean
  4. Repeat Step 2 until convergence
Slide credit: Steve Seitz
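A minimal MATLAB sketch of this mode search with a flat (uniform) window of radius h (X, h, and the seed choice are hypothetical; a kernel-weighted mean would replace the plain mean for other window profiles):
x = X(randi(size(X, 1)), :);                       % 1. initialize at a random seed point
for iter = 1:100
    d2 = sum(bsxfun(@minus, X, x).^2, 2);
    inW = d2 <= h^2;                               % points inside the current window W
    m = mean(X(inW, :), 1);                        % 2. center of gravity (the "mean") of W
    if norm(m - x) < 1e-3                          % 4. stop when the shift becomes negligible
        break;
    end
    x = m;                                         % 3. shift the window to the mean
end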
Mean-Shift
(Figure, shown as an animation over several build slides: the region of interest, its center of mass, and the mean-shift vector as the window moves toward the mode)
Slide by Y. Ukrainitz & B. Sarel
Real Modality Analysis
Tessellate the space with windows and run the procedure in parallel.
Slide by Y. Ukrainitz & B. Sarel
Real Modality Analysis
The blue data points were traversed by the windows towards the mode.
Slide by Y. Ukrainitz & B. Sarel
Mean-Shift Clustering
• Cluster: all data points in the attraction basin of a mode
• Attraction basin: the region for which all trajectories lead to the same mode
Slide by Y. Ukrainitz & B. Sarel
Mean-Shift Clustering/Segmentation
• Find features (color, gradients, texture, etc.)
• Initialize windows at individual pixel locations
• Perform mean shift for each window until convergence
• Merge windows that end up near the same “peak” or mode
Slide credit: Svetlana Lazebnik
Mean-Shift Segmentation Results
http://www.caip.rutgers.edu/~comanici/MSPAMI/msPamiResults.html
Slide credit: Svetlana Lazebnik
More Results
Slide credit: Svetlana Lazebnik
More Results
Slide credit: Svetlana Lazebnik
Problem: Computational Complexity
• Need to shift many windows…
• Many computations will be redundant.
Speedups: Basin of Attraction
1. Assign all points within radius r of end point to the mode.
Speedups
2. Assign all points within radius r/c of the search path to the mode.
Summary Mean-Shift
• Pros
  – General, application-independent tool
  – Model-free, does not assume any prior shape (spherical, elliptical, etc.) on data clusters
  – Just a single parameter (window size h)
    – h has a physical meaning (unlike k-means)
  – Finds variable number of modes
  – Robust to outliers
• Cons
  – Output depends on window size
  – Window size (bandwidth) selection is not trivial
  – Computationally (relatively) expensive (~2s/image)
  – Does not scale well with dimension of feature space
Slide credit: Svetlana Lazebnik
Segmentation: Caveats
• We’ve looked at bottom-up ways to segment an image into regions, yet finding meaningful segments is intertwined with the recognition problem.
• Often want to avoid making hard decisions too soon
• Difficult to evaluate; when is a segmentation successful?
Slide credit: Kristen Grauman
Generic Clustering
• We have focused on ways to group pixels into image segments based on their appearance
  – Find groups; “quantize” feature space
• In general, we can use clustering techniques to find groups of similar “tokens”, provided we know how to compare the tokens.
  – E.g., segment an image into the types of motions present
  – E.g., segment a video into the types of scenes (shots) present
References and Further Reading
• Background information on segmentation by clustering can be found in Chapter 14 of
  – D. Forsyth, J. Ponce, Computer Vision – A Modern Approach. Prentice Hall, 2003
• More on the EM algorithm can be found in Chapter 16.1.2.
• Try the k-means and EM demos at
  – http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
  – http://lcn.epfl.ch/tutorial/english/gaussian/html/index.html