
**A Layered Approach to Stereo Reconstruction**

Simon Baker∗
Department of Computer Science, Columbia University, New York, NY 10027

Richard Szeliski and P. Anandan
Microsoft Research, Microsoft Corporation, Redmond, WA 98052

**Abstract**

We propose a framework for extracting structure from stereo which represents the scene as a collection of approximately planar layers. Each layer consists of an explicit 3D plane equation, a colored image with per-pixel opacity (a sprite), and a per-pixel depth offset relative to the plane. Initial estimates of the layers are recovered using techniques taken from parametric motion estimation. These initial estimates are then refined using a re-synthesis algorithm which takes into account both occlusions and mixed pixels. Reasoning about such effects allows the recovery of depth and color information with high accuracy, even in partially occluded regions. Another important benefit of our framework is that the output consists of a collection of approximately planar regions, a representation which is far more appropriate than a dense depth map for many applications such as rendering and video parsing.

**1 Introduction**

Although extracting scene structure using stereo has long been an active area of research, the recovery of accurate depth information still remains largely unsolved. Most existing algorithms work well when matching feature points or highly textured regions, but perform poorly around occlusion boundaries and in untextured regions.

A common element of many recent attempts to solve these problems is explicit modeling of the 3D volume of the scene [37, 13, 8, 25, 26, 30]. The scene volume is discretized, often in terms of equal increments of disparity rather than into equally sized voxels. The goal is then to find the voxels which lie on the surfaces of the objects in the scene. The major benefits of such approaches include the equal and efficient treatment of a large number of images [8], the possibility of modeling occlusions [13], and the detection of mixed pixels at occlusion boundaries to obtain sub-pixel accuracy [30]. Unfortunately, discretizing space volumetrically introduces a huge number of degrees of freedom and leads to sampling and aliasing artifacts.

Another active area of research is the detection of parametric motions within image sequences [1, 34, 12, 9, 15, 14, 24, 5, 36, 35]. Here, the goal is the decomposition of the images into sub-images, commonly referred to as layers, such that the pixels within each layer move in a manner consistent with a parametric transformation. The motion of each layer is determined by the values of the parameters. An important transformation is the 8-parameter homography (collineation), because it describes the motion of a rigid planar patch as either it or the camera moves [11].

While existing techniques have been successful in detecting multiple independent motions, layer extraction for scene modeling has not been fully developed. One fact which has not been exploited is that, when simultaneously imaged by several cameras, each of the layers implicitly lies on a fixed plane in the 3D world. Another omission is the proper treatment of transparency. With a few exceptions (e.g. [27, 3, 18]), the decomposition of an image into layers that are partially transparent has not been attempted. In contrast, scene modeling using multiple partially transparent layers is common in the graphics community [22, 6].

In this paper, we present a framework for reconstructing a scene as a collection of approximately planar layers. Each of the layers has an explicit 3D plane equation and is recovered as a sprite, i.e. a colored image with per-pixel opacity (transparency) [22, 6, 32, 20]. To model a wider range of scenes, a per-pixel depth offset relative to the plane is also added. Recovery of the layers begins with the iteration of several steps based on techniques developed for parametric motion estimation, image registration, and mosaicing. The resulting layer estimates are then refined using a re-synthesis step which takes into account both occlusions and mixed pixels in a similar manner to [30].

Our layered approach to stereo shares many of the advantages of the aforementioned volumetric techniques. In addition, it offers a number of other advantages:

• The combination of the global model (the plane) with the local correction to it (the per-pixel depth offset) results in very robust performance. In this respect, the framework is similar to the plane + parallax work of [19, 23, 29], the model-based stereo work of [10], and the parametric motion + residual optical flow work of [12].

• The output (a collection of approximately planar regions) is more suitable than a discrete collection of voxels for many applications, including rendering [10] and video parsing [15, 24].

∗ The research described in this paper was conducted while the first author was a summer intern at Microsoft Research.

[Figure 1: diagram omitted — layer sprites L_1 and L_2 on planes n_1^T x = 0 and n_2^T x = 0, with residual depths Z_1 and Z_2, imaged by camera P_k; boolean masks B_k1 and B_k2 and masked images M_k1 and M_k2.]

Figure 1: Suppose K images I_k are captured by K cameras P_k. We assume that the scene can be represented by L sprite images L_l on planes n_l^T x = 0 with depth offsets Z_l. The boolean masks B_kl denote the pixels in image I_k from layer L_l, and the masked images are given by M_kl = B_kl · I_k.

[Figure 2: flowchart omitted — Input: images I_k and cameras P_k → Initialize Layers (Section 2.1) → Estimate Plane Equations n_l (Section 2.2) → Estimate Layer Sprites L_l (Section 2.3) → Estimate Residual Depth Z_l (Section 2.4) → Assign Pixels to Layers B_kl (Section 2.5), iterated with boolean opacities → Refine Layer Sprites L_l (Section 3) with real-valued opacities → Output: n_l, L_l, and Z_l.]

Figure 2: We wish to compute the layer sprites L_l, the layer plane vectors n_l, the residual parallax Z_l, and the boolean masks B_kl. After initializing, we iteratively compute each quantity in turn, fixing the others. Finally, we refine the layer sprite estimates using a re-synthesis algorithm.

**1.1 Basic Concepts and Notation**

We use homogeneous coordinates both for 3D world coordinates x = (x, y, z, 1)^T and for 2D image coordinates u = (u, v, 1)^T. The basic concepts of our framework are illustrated in Figure 1. We assume that the input consists of K images I_1(u_1), I_2(u_2), ..., I_K(u_K) captured by K cameras with known projection matrices P_1, P_2, ..., P_K. (In what follows, we drop the image coordinates u_k unless they are needed to explain a warping operation explicitly.)

We wish to reconstruct the world as a collection of L approximately planar layers. Following [6], we denote a layer sprite with pre-multiplied opacities by:

    L_l(u_l) = (α_l · r_l, α_l · g_l, α_l · b_l, α_l)        (1)

where r_l = r_l(u_l) is the red band, g_l = g_l(u_l) is the green band, b_l = b_l(u_l) is the blue band, and α_l = α_l(u_l) is the opacity. We also associate with each layer a homogeneous vector n_l (which defines the plane equation of the layer via n_l^T x = 0) and a per-pixel residual depth offset Z_l(u_l).

Our goal is to estimate the layer sprites L_l, the plane vectors n_l, and the residual depths Z_l. To do so, we wish to use techniques for parametric motion estimation. Unfortunately, most such techniques assume boolean-valued opacities α_l (i.e., unique layer assignments). We therefore split our framework into two parts. In the first part, described in Section 2, we assume boolean opacities to get a first approximation to the structure of the scene. If the opacities are boolean, each point in each image I_k is the image of a point on only one of the layers L_l. We therefore introduce boolean masks B_kl which denote the pixels in image I_k that are images of points on layer L_l. So, in addition to L_l, n_l, and Z_l, we also need to estimate the masks B_kl. Once we have estimates of the masks, we immediately compute masked input images M_kl = B_kl · I_k (see Figure 1). In the second part of our framework, we use the initial estimates of the layers made by the first part as input to a re-synthesis algorithm which refines the layer sprites L_l, including the opacities α_l. This second step requires a generative or forward model of the image formation process, and is discussed in Section 3.

In Figure 2 we illustrate the processing steps of the framework. Given any three of L_l, n_l, Z_l, and B_kl, there are techniques for estimating the remaining one. The first part of our framework therefore consists of first initializing these four quantities, and then iteratively estimating each one while fixing the other three. After good initial estimates of the layers are obtained, we move on to the second part of the framework, in which we use real-valued opacities and refine the entire layer sprites, including the opacities.
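As a concrete illustration of the sprite representation of Equation (1), pre-multiplied opacities can be built in a few lines of numpy. This is a minimal sketch of our own (the function name `premultiply` and the array layout are assumptions, not part of the paper):

```python
import numpy as np

def premultiply(rgb, alpha):
    """Convert straight-alpha color bands to a pre-multiplied sprite.

    rgb:   (H, W, 3) float array holding the r, g, b bands.
    alpha: (H, W) float array of opacities in [0, 1].
    Returns an (H, W, 4) sprite (a*r, a*g, a*b, a), as in Equation (1).
    """
    a = alpha[..., None]                      # broadcast alpha over channels
    return np.concatenate([a * rgb, a], axis=-1)
```

Storing the bands pre-multiplied is what makes the over operator of Section 3.1 a simple linear expression.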

**2 Initial Computation of the Layers**

**2.1 Initialization of the Layers**

Initialization of the layers is a difficult task which is inevitably somewhat ad hoc. A number of approaches have been proposed in the parametric motion literature:

• Randomly initialize a large number of small layers, which grow and merge until a small number of layers remain which accurately model the scene [34, 24, 5].

• Iteratively apply dominant motion extraction [15, 24], at each step applying the algorithm to the residual regions of the previous step.

• Perform a color segmentation in each image, match the segments, and use them as the initial assignment [2].

• Apply a simple stereo algorithm to get an approximate depth map, and then fit planes to the depth map.

• Get a human to initialize the layers. (In many applications, such as model acquisition [10] and video parsing [24], the goal is a semi-automatic algorithm and limited user input is acceptable.)

In this paper, we assume a human has initialized the layers. As discussed in Section 5, fully automating the framework is left as future work.

**2.2 Estimation of the Plane Equations**

To compute the plane equation vector n_l we need to map the pixels in the masked images M_kl onto the plane defined by n_l^T x = 0. If x is the 3D world coordinate of a point and u_k is the image of x in camera P_k, we have:

    u_k = P_k x        (2)

where equality is in the 2D projective space P^2 [11]. Since P_k is of rank 3, it follows that:

    x = P_k^* u_k + s p_k        (3)

where P_k^* = P_k^T (P_k P_k^T)^{-1} is the pseudo-inverse of the camera matrix P_k, s is an unknown scalar, and p_k is a vector in the null space of P_k (i.e. P_k p_k = 0). If x lies on the plane n_l^T x = 0, we have:

    n_l^T P_k^* u_k + s n_l^T p_k = 0.        (4)

Solving this equation for s, substituting into Equation (3), and rearranging yields:

    x = ((n_l^T p_k) I − p_k n_l^T) P_k^* u_k.        (5)

The importance of Equation (5) is that it allows us to map a pixel coordinate u_k in image M_kl onto the point on the plane n_l^T x = 0 of which it is the image. So, we can now map this point onto its image in another camera P_k′:

    u_k′ = P_k′ ((n_l^T p_k) I − p_k n_l^T) P_k^* u_k ≡ H^l_kk′ u_k        (6)

where H^l_kk′ is a homography (collineation of P^2 [11]). Equation (6) describes the mapping between the two images which would hold if the pixels were all images of points on the plane n_l^T x = 0. Using this relation, we can warp all of the masked images onto the coordinate frame of one distinguished image¹ (w.l.o.g. image M_1l) as follows:

    (H^l_1k ∘ M_kl)(u_1) ≡ M_kl(H^l_1k u_1).        (7)

Here, H^l_1k ∘ M_kl is the masked image M_kl warped into the coordinate frame of M_1l.

The property which we use to compute n_l is that, assuming the pixel assignments to the layers B_kl are correct, the world is piecewise planar, and the surfaces are Lambertian, the warped images H^l_1k ∘ M_kl should agree with each other where they overlap. There are a number of functions which can be used to measure the degree of consistency between the warped images, including least squares [4] and robust measures [9, 24]. In both cases, the goal is the same: find the plane equation vector n_l which maximizes the degree of consistency. Typically, this extremum is found using some form of gradient descent, such as the Gauss-Newton method, and the optimization is performed in a hierarchical (i.e. pyramid-based) fashion to avoid local extrema [4]. To apply this standard approach [31], we simply need to derive the Jacobian of the image warp H^l_1k with respect to the parameters of n_l. This is straightforward from Equation (6) because we know the camera matrices P_k.

**2.3 Estimation of the Layer Sprites**

Before we can compute the layer sprites L_l, we need to choose 2D coordinate systems for the planes. Such coordinate systems can be specified by a collection of arbitrary (rank 3) camera matrices Q_l.² Then, similarly to Equations (5) and (6), we can show that the image coordinates u_k of the pixel in image M_kl which is projected onto the pixel u_l on the plane n_l^T x = 0 are given by:

    u_k = P_k ((n_l^T q_l) I − q_l n_l^T) Q_l^* u_l ≡ H^l_k u_l        (8)

where Q_l^* is the pseudo-inverse of Q_l, and q_l is a vector in the null space of Q_l. The homography H^l_k can be used to warp the image M_kl forward onto the coordinate frame of the plane n_l^T x = 0, the result of which is denoted H^l_k ∘ M_kl. Then, we can estimate the layer sprite (with boolean opacities) by blending the warped images:

    L_l = ⊕_{k=1}^{K} H^l_k ∘ M_kl        (9)

¹ It is possible to add an extra 2D perspective coordinate transformation here. Suppose H is an arbitrary homography. We could warp each masked image onto (H ∘ H^l_1k ∘ M_kl)(u_1) ≡ M_kl(H H^l_1k u_1). The addition of the homography H can be used to remove the dependence on one distinguished image, as advocated by Collins [8].

² A suitable choice for Q_l would be one of the camera matrices P_k, in which case Equation (8) reduces to Equation (6). Another interesting choice is one in which the null space of Q_l is perpendicular to the plane defined by n_l, and the pseudo-inverse maps the coordinate axes onto perpendicular vectors in the plane (i.e. a camera with a frontal image plane). Note that often we do not want a fronto-parallel camera, since it may unnecessarily warp the input images.
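The plane-induced mapping of Equations (5) and (6) is easy to verify numerically. The sketch below is our own illustration, not code from the paper; `plane_homography` is a hypothetical helper that builds the homography H^l_kk′ from two camera matrices and a plane vector, using the pseudo-inverse and null-space constructions defined above:

```python
import numpy as np

def plane_homography(P_src, P_dst, n):
    """Homography mapping pixels of camera P_src to camera P_dst for
    points on the plane n^T x = 0 (Equations (5) and (6)).

    P_src, P_dst: (3, 4) camera matrices; n: (4,) plane vector.
    """
    # Pseudo-inverse P* = P^T (P P^T)^{-1}, valid since P has rank 3.
    P_star = P_src.T @ np.linalg.inv(P_src @ P_src.T)
    # p spans the 1-D null space of P_src (P_k p_k = 0).
    _, _, Vt = np.linalg.svd(P_src)
    p = Vt[-1]
    # (n^T p) I - p n^T, the 4x4 matrix of Equation (5).
    M = (n @ p) * np.eye(4) - np.outer(p, n)
    return P_dst @ M @ P_star
```

For any world point x with n^T x = 0, the returned 3×3 matrix maps the projection of x in the source camera to its projection in the destination camera, up to scale.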

where ⊕ is the blending operator. There are a number of ways in which blending could be performed. One simple method would be to take the mean of the color values. A refinement would be to use a feathering algorithm such as [28], where the average is weighted by the distance of each pixel from the nearest invisible pixel (i.e. α = 0) in M_kl. Alternatively, robust techniques could be used to estimate L_l. The simplest such example is the median operator, but more sophisticated alternatives exist.

An unfortunate effect of the blending in Equation (9) is that averaging tends to increase image blur. Part of the cause is non-planarity in the scene (which is modeled in Section 2.4), but image noise and resampling error also contribute. One simple method of compensating for this effect is to deghost the sprites [28]. Another solution is to use image enhancement techniques such as [16, 21, 7], which can even be used to obtain super-resolution sprites.

**2.4 Estimation of the Residual Depth**

In general, the scene will not be piecewise planar. To model any non-planarity, we allow the point u_l on the plane n_l^T x = 0 to be displaced slightly. We assume it is displaced in the direction of the ray through u_l defined by the camera matrix Q_l. The distance it is displaced is denoted by Z_l(u_l), as measured in the direction normal to the plane. In this case, the homographic warps used in the previous section are not applicable, but using a similar argument it is possible to show (see also [19, 23]) that:

    u_k = H^l_k u_l + w(u_l) Z_l(u_l) t_kl        (10)

where H^l_k = P_k ((n_l^T q_l) I − q_l n_l^T) Q_l^* is the planar homography of Section 2.3, t_kl = P_k q_l is the epipole, and it is assumed that the vector n_l = (n_x, n_y, n_z, n_d)^T has been normalized such that n_x² + n_y² + n_z² = 1. The term w(u_l) is a projective scaling factor which equals the reciprocal of Q_l³ x, where Q_l³ is the third row of Q_l and x is the world coordinate of the point. It is possible to write w(u_l) as a linear function of the image coordinates u_l, but the dependence on Q_l and n_l is quite complicated, so the details are omitted. Equation (10) can be used to map plane coordinates u_l backwards to image coordinates u_k, or to map the image M_kl forwards onto the plane. We denote the result of this warp by (H^l_k, t_kl, Z_l) ∘ M_kl, or more concisely W_kl ∘ M_kl.

Almost any stereo algorithm could be used to compute Z_l(u_l), although it would be preferable to use one favoring small disparities. Doing so essentially solves a simpler (or what Debevec et al. [10] term a model-based) stereo problem. To compute the residual depth map, we initially set Z_l(u_l) to be the value (in a range close to zero) which minimizes the variance of (H^l_k, t_kl, Z_l) ∘ M_kl across k. Afterwards, a simple smoothing algorithm is applied to Z_l(u_l). Once the residual depth offsets have been estimated, the layer sprite images should be re-estimated using:

    L_l = ⊕_{k=1}^{K} (H^l_k, t_kl, Z_l) ∘ M_kl = ⊕_{k=1}^{K} W_kl ∘ M_kl        (11)

rather than Equation (9).

**2.5 Pixel Assignment to the Layers**

The basis for the computation of the pixel assignments is a comparison of the warped images W_kl ∘ M_kl with the layer sprites L_l.³ If the pixel assignment were correct (and neglecting resampling issues), these images should be identical where they overlap. Unfortunately, comparing these images does not yield any information outside the current estimates of the masked regions.

To allow the pixel assignments to grow, we take the old estimates of B_kl and enlarge them by a few pixels to yield new estimates B̃_kl. These new assignments can be computed by iterating simple morphological operations, such as setting B̃_kl = 1 for the neighbors of every pixel for which B_kl = 1. Enlarged masked images are then computed using:

    M̃_kl = B̃_kl · I_k        (12)

and a new estimate of the layer sprite is computed using:

    L̃_l = ⊕_{k=1}^{K} W_kl ∘ M̃_kl.        (13)

(Here, Z_l is enlarged in W_kl so that it declines to zero smoothly outside the old masked region.) One small danger of working with M̃_kl and L̃_l is that occluded pixels may be blended together with unoccluded pixels and result in poor estimates of L̃_l. A partial solution to this problem is to use a robust blending operator such as the median. Another part of the solution is, during the blend, to weight pixels for which B_kl = 1 more than those for which B_kl = 0 (and B̃_kl = 1). The weights should depend on the distance of the pixel from the closest pixel for which B_kl = 1, in a similar manner to the feathering algorithm of [28].

Given L̃_l, our approach to pixel assignment is as follows. We first compute a measure P_kl(u_l) of the likelihood that a pixel in W_kl ∘ M̃_kl(u_l) is the warped image of the pixel u_l in the enlarged sprite L̃_l. There are a number of ways of defining P_kl. Perhaps the simplest is the residual intensity difference [24]:

    P_kl = ‖L̃_l − W_kl ∘ M̃_kl‖.        (14)

Another is the magnitude of the residual normal flow:

    P_kl = ‖L̃_l − W_kl ∘ M̃_kl‖ / ‖∇L̃_l‖.        (15)

³ Alternatively, we could compare the input images I_k with the layer sprite images warped back onto image coordinates, (W_kl)^{-1} ∘ L_l. This means comparing the input image with a twice resampled, blended image. Both blending and resampling tend to increase blur, so, even if the pixel assignment were perfect, these images may well differ substantially.
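One way to realize the robust blending operator discussed above is a per-pixel median taken only over the images in which the pixel is visible. The sketch below is our own illustration under simplifying assumptions (grayscale images in a (K, H, W) stack; the name `median_blend` is hypothetical), not the paper's implementation:

```python
import warnings
import numpy as np

def median_blend(warped, alphas):
    """Robustly blend K warped masked images into one layer sprite.

    warped: (K, H, W) grayscale warped images (e.g. W_kl applied to M_kl).
    alphas: (K, H, W) masks; 0 marks invisible pixels (alpha = 0).
    The per-pixel median over the visible images is less sensitive to
    occluded or misassigned pixels than the mean of Equation (9).
    """
    vals = np.where(alphas > 0, warped, np.nan)   # hide invisible pixels
    with warnings.catch_warnings():
        # Pixels visible in no image produce all-NaN slices and a warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        out = np.nanmedian(vals, axis=0)
    return np.nan_to_num(out)                     # never-visible pixels -> 0
```

A distance-weighted (feathered) mean in the style of [28] could be substituted by replacing the median with a weighted average.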

Locally estimated variants of the residual normal flow have been used by Irani and coworkers [16, 17, 15]. A final possibility would be to compute the optical flow between W_kl ∘ M̃_kl and L̃_l; a decreasing function of the magnitude of the flow could then be used for P_kl.

Next, P_kl is warped back into the coordinate system of the input image I_k to yield:

    P̂_kl = (W_kl)^{-1} ∘ P_kl.        (16)

This warping tends to blur P_kl, but this is acceptable since we will want to smooth the pixel assignment anyway.⁴ The new pixel assignment can then be computed by choosing the best possible layer for each pixel:

    B_kl(u_k) = 1 if P̂_kl(u_k) = min_{l′} P̂_kl′(u_k), and 0 otherwise.        (17)

**3 Layer Refinement by Re-Synthesis**

In this section, we describe how the estimates of the layer sprites can be refined, now assuming that their opacities α_l are real valued. We begin by formulating a generative model of the image formation process. Afterwards, we propose a measure of how well the layers re-synthesize the input images, and show how the re-synthesis error can be minimized to refine the estimates of the layer sprites.

**3.1 The Image Formation Process**

We formulate the generative (forward) model of the image formation process using image compositing operations [6], i.e. by painting the sprites one over another in a back-to-front order. The basic operator used to overlay the sprites is the over operator ⊙:

    F ⊙ B ≡ F + (1 − α_F) B,        (18)

where F and B are the foreground and background sprites, and α_F is the opacity of the foreground [22, 6]. This definition of the over operator assumes pre-multiplied opacities, as in Equation (1). The generative model consists of the following two steps:

1. Using the camera matrices, plane equations, and residual depths, warp each layer backwards onto the coordinate frame of image I_k using the inverse of the operator in Section 2.4. This yields the un-warped sprite:

    U_kl = (W_kl)^{-1} ∘ L_l.        (19)

Note that the opacities should be warped along with the color values [6].

2. Composite the un-warped sprites in back-to-front order (which can be computed from the plane equations):

    S_k = ⊙_{l=1}^{L} U_kl = U_k1 ⊙ · · · ⊙ U_kL        (20)

to obtain the synthesized image S_k. If we have solved the stereo reconstruction problem, and neglecting resampling issues, S_k should match the input I_k.

This last step can be re-written as three simpler steps:

2a. Compute the visibility of each un-warped sprite [30]:

    V_kl = V_k(l−1) (1 − α_k(l−1)) = ∏_{l′=1}^{l−1} (1 − α_kl′)        (21)

where α_kl is the alpha channel of U_kl, and V_k1 = 1.

2b. Compute the masked images, M_kl = V_kl U_kl.

2c. Sum up the masked images, S_k = Σ_{l=1}^{L} M_kl.

In these last three substeps, the visibility map makes the contribution of each sprite pixel to the image S_k explicit.

**3.2 Minimization of Re-Synthesis Error**

As mentioned above, if the layer estimates are accurate, the synthesized image S_k should be very similar to the input image I_k. Therefore, we refine the layer estimates by minimizing the prediction error:

    C = Σ_k Σ_{u_k} ‖S_k(u_k) − I_k(u_k)‖²        (22)

using a gradient descent algorithm. (In order to further constrain the space of possible solutions, we can add smoothness constraints on the colors and opacities [30].) Rather than trying to optimize over all of the parameters (L_l, n_l, and Z_l) simultaneously, we only adjust the sprite colors and opacities in L_l, and then re-run the previous motion estimation steps to adjust n_l and Z_l (see Figure 2 and Section 2).

The derivatives of the cost function C with respect to the colors and opacities in L_l(u_l) can be computed using the chain rule [30]. In more detail, the visibility map V_kl mediates the interaction between the un-warped sprite U_kl and the synthesized image S_k, and is itself a function of the opacities in the un-warped sprites U_kl. For a fixed warping function W_kl, the pixels in U_kl are linear combinations of the pixels in sprite L_l. This dependence can either be exploited directly using the chain rule to propagate gradients, or alternatively the derivatives of C with respect to U_kl can be warped back into the reference frame of L_l [30].

⁴ We may want to smooth P_kl even more, e.g. using an isotropic smoother such as a Gaussian. Other alternatives include (1) performing a color segmentation of each input image and only smoothing within each segment, in a similar manner to [2], and (2) smoothing P_kl less in the direction of the intensity gradient, since strong gradients often coincide with depth discontinuities and hence layer boundaries.
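Steps 2 and 2a–2c above can be sketched directly on pre-multiplied RGBA arrays. This is our own minimal numpy illustration (the names `over` and `composite` are hypothetical); the sprite list is ordered front-most first, so that the first visibility map is 1 as in Equation (21):

```python
import numpy as np

def over(F, B):
    """Over operator with pre-multiplied alpha (Equation (18)):
    F + (1 - alpha_F) * B, where channel 3 holds alpha."""
    return F + (1.0 - F[..., 3:4]) * B

def composite(unwarped):
    """Composite un-warped sprites U_k1 ... U_kL (front-most first,
    each an (H, W, 4) pre-multiplied RGBA array) into the synthesized
    image S_k, returning S_k and the visibility maps V_kl."""
    vis = []
    V = np.ones(unwarped[0].shape[:2])        # V_k1 = 1
    S = np.zeros_like(unwarped[0])
    for U in unwarped:
        vis.append(V.copy())
        S = S + V[..., None] * U              # steps 2b-2c: sum V_kl * U_kl
        V = V * (1.0 - U[..., 3])             # step 2a: attenuate visibility
    return S, vis
```

Accumulating V_kl * U_kl reproduces the chained over operator of Equation (20), while exposing each sprite pixel's contribution through the visibility maps, which is what makes the gradient computation of Section 3.2 tractable.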

[Figure 3: image panels (a)–(h) omitted.]

Figure 3: Results on the flower garden sequence: (a) first and (b) last input images; (c) initial segmentation into six layers; (d) and (e) the six layer sprites; (f) depth map for planar sprites (darker denotes closer); front layer before (g) and after (h) residual depth estimation.

[Figure 4: image panels (a)–(f) omitted.]

Figure 4: Results on the symposium sequence: (a) third of five images; (b) initial segmentation into six layers; (c) recovered depth map (darker denotes closer); (d) and (e) the five layer sprites; (f) residual depth image for fifth layer.

[Figure 5: image panels (a)–(c) omitted.]

Figure 5: 3D views of the reconstructed symposium scene: (a) re-synthesized third image (note the extended field of view); (b) novel view without residual depth; (c) novel view with residual depth (note the "rounding" of the people).

**4 Experiments Section 2.4 and recompute the sprites. Since the corre-
**

To validate our approach, we experimented on two spondence is now much better across images, the resulting

multi-frame data sets. The first of these data sets is a stan- sprites are much less blurry. Figure 3(g) shows the original

dard motion sequence of a scene containing no indepen- sprite obtained for the lower flower bed, while Figure 3(h)

dently moving objects. The second consists of 40 images shows the same sprite after residual depth estimation.

taken simultaneously. The camera geometry is not given Our second set of experiments uses five images of a 40-

for either sequence, so we used point tracking and a stan- image stereo data set taken at a graphics symposium. Fig-

dard structure from motion algorithm to estimate the cam- ure 4(a) shows the middle input image, Figure 4(b) shows

era matrices. Our experiments do not yet include the results the initial pixel assignment to layers, Figure 4(c) shows

of applying the layer refinement step described Section 3. the recovered planar depth map, and Figure 4(f) shows

the residual depth map for one of the layers. Figures 4(d)

To initialize our algorithm, we first decided how many

and (e) show the recovered sprites. Figure 5(a) shows the

layers were required, and then performed a rough assign-

middle image re-synthesized from these sprites. Finally,

ment of pixels to layers by hand. Various automated tech-

Figures 5(b–c) show the same sprite collection seen from

niques for performing this initial labeling are described in

a novel viewpoint (well outside the range of the original

Section 2.1. Next, the automatic hierarchical parametric

views), first with and then without residual depth correc-

motion estimation algorithm described in [31] was used to

tion. The gaps in Figure 5 correspond to parts of the scene

find the 8-parameter homographies between the layers and estimate the layer sprites. (For the experiments presented in this paper, we set Ql = P1, i.e. we reconstructed the sprites in the coordinate system of the first camera.) Using the computed homographies, we found the best plane estimate for each layer using a Euclidean structure from motion algorithm [33].

The results of applying these steps to the MPEG flower garden sequence are shown in Figure 3. Figures 3(a) and (b) show the first and last image in the subsequence we used (the first nine even images). Figure 3(c) shows the initial pixel labeling into seven layers. Figures 3(d) and (e) show the sprite images corresponding to each of the seven layers, re-arranged for more compact display. (These sprites are actually the ones computed after residual depth estimation.) Note that because of the blending that takes place during sprite construction, each sprite is larger than its footprint in any one of the input images. Figure 3(f) shows a depth map computed by painting every pixel with its corresponding grey-coded Z value, where darker denotes closer.

Once we have recovered the initial geometric structure, we recompute the homographies by directly adjusting the plane equations, as described in Section 2.2. We then run the residual depth estimation algorithm described in [...] which were not visible in any of the five input images.

5 Discussion

We have presented a framework for stereo reconstruction which represents the scene as a collection of approximately planar layers. Each layer consists of a plane equation, a layer sprite image, and a residual depth map. The framework exploits the fact that each layer implicitly lies on a fixed plane in the 3D world. Therefore, we only need to recover three plane parameters per layer, independently of the number of images. We also showed how an initial estimate of the scene structure allows us to reason about image formation. We proposed a forward model of image formation, and derived a measure of how well the layers re-synthesize the input images. Optimizing this measure allows the layer sprites to be refined, and their opacities estimated.

Our initial results are very encouraging; however, further work is required to complete an implementation of the entire framework. In particular, we are currently implementing the layer refinement algorithm described in Section 3. Other areas we are exploring include automatic initialization of the layers and more sophisticated pixel assignment strategies.
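Two simple operations underpin the processing described above: mapping pixel coordinates through an 8-parameter homography (a 3x3 matrix defined only up to scale), and grey-coding Z values for display with darker meaning closer. The following is a minimal NumPy sketch of both, for illustration only; the function names and matrix values are hypothetical, not taken from the paper's implementation.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography H.

    H has 8 free parameters, since it is only defined up to scale
    (conventionally H[2, 2] is fixed to 1).
    """
    pts = np.asarray(pts, dtype=float)
    # Lift to homogeneous coordinates: (x, y) -> (x, y, 1).
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ H.T
    # Perspective divide recovers inhomogeneous image coordinates.
    return mapped[:, :2] / mapped[:, 2:3]

def grey_code_depth(z, z_near, z_far):
    """Map Z values to 8-bit grey levels; darker denotes closer."""
    t = (np.asarray(z, dtype=float) - z_near) / (z_far - z_near)
    return np.clip(255 * t, 0, 255).astype(np.uint8)

# Hypothetical homography: identity plus a small shear and translation.
H = np.array([[1.0, 0.1, 5.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
print(apply_homography(H, [[0.0, 0.0], [10.0, 20.0]]))
print(grey_code_depth([1.0, 2.0, 3.0], 1.0, 3.0))
```

Per-layer sprite warping amounts to applying such a homography to every pixel of the sprite; residual depth then perturbs the result with a per-pixel parallax correction.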


Acknowledgements

We would like to thank Michael Cohen, Mei Han, and Jonathan Shade for fruitful discussions, Harry Shum for his implementation of the residual depth estimation algorithm, and the anonymous reviewers for many useful suggestions and comments.

References

[1] G. Adiv. Determining three-dimensional motion and structure from optical flow generated by several moving objects. PAMI, 7(4):384–401, 1985.
[2] S. Ayer, P. Schroeter, and J. Bigün. Segmentation of moving objects by robust parameter estimation over multiple frames. In 3rd ECCV, pages 316–327, 1994.
[3] J.R. Bergen, P.J. Burt, R. Hingorani, and S. Peleg. A three-frame algorithm for estimating two-component image motion. PAMI, 14(9):886–896, 1992.
[4] J.R. Bergen, P. Anandan, K.J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In 2nd ECCV, pages 237–252, 1992.
[5] M.J. Black and A.D. Jepson. Estimating optical flow in segmented images using variable-order parametric models with local deformations. PAMI, 18(10):972–986, 1996.
[6] J.F. Blinn. Jim Blinn's corner: Compositing, part 1: Theory. IEEE Computer Graphics and Applications, 14(5):83–87, September 1994.
[7] M.-C. Chiang and T.E. Boult. Local blur estimation and super-resolution. In CVPR '97, pages 821–826, 1997.
[8] R.T. Collins. A space-sweep approach to true multi-image matching. In CVPR '96, pages 358–363, 1996.
[9] T. Darrell and A.P. Pentland. Cooperative robust estimation using layers of support. PAMI, 17(5):474–487, 1995.
[10] P.E. Debevec, C.J. Taylor, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In SIGGRAPH '96, pages 11–20, 1996.
[11] O.D. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.
[12] S. Hsu, P. Anandan, and S. Peleg. Accurate computation of optical flow by using layered motion representations. In ICPR '94, pages 743–746, 1994.
[13] S.S. Intille and A.F. Bobick. Disparity-space images and large occlusion stereo. In 3rd ECCV, 1994.
[14] M. Irani and P. Anandan. A unified approach to moving object detection in 2D and 3D scenes. In 13th ICPR, pages 712–717, 1996.
[15] M. Irani, P. Anandan, and S. Hsu. Mosaic based representations of video sequences and their applications. In 5th ICCV, pages 605–611, 1995.
[16] M. Irani and S. Peleg. Image sequence enhancement using multiple motions analysis. In CVPR '92, pages 216–221, 1992.
[17] M. Irani, B. Rousso, and S. Peleg. Detecting and tracking multiple moving objects using temporal integration. In 2nd ECCV, pages 282–287, 1992.
[18] S.X. Ju, M.J. Black, and A.D. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and regularization with transparency. In CVPR '96, pages 307–314, 1996.
[19] R. Kumar, P. Anandan, and K. Hanna. Direct recovery of shape from multiple views: A parallax based approach. In 12th ICPR, pages 685–688, 1994.
[20] J. Lengyel and J. Snyder. Rendering with coherent layers. In SIGGRAPH '97, pages 233–242, 1997.
[21] S. Mann and R.W. Picard. Virtual bellows: Constructing high quality stills from video. In 1st ICIP, pages 363–367, 1994.
[22] T. Porter and T. Duff. Compositing digital images. In SIGGRAPH '84, pages 253–259, 1984.
[23] H.S. Sawhney. 3D geometry from planar parallax. In CVPR '94, pages 929–934, 1994.
[24] H.S. Sawhney and S. Ayer. Compact representations of videos through dominant and multiple motion estimation. PAMI, 18(8):814–830, 1996.
[25] D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion. In CVPR '96, pages 343–350, 1996.
[26] S.M. Seitz and C.R. Dyer. Photorealistic scene reconstruction by voxel coloring. In CVPR '97, pages 1067–1073, 1997.
[27] M. Shizawa and K. Mase. A unified computational theory of motion transparency and motion boundaries based on eigenenergy analysis. In CVPR '91, pages 289–295, 1991.
[28] H.-Y. Shum and R. Szeliski. Construction and refinement of panoramic mosaics with global and local alignment. In 6th ICCV, 1998.
[29] R. Szeliski and J. Coughlan. Hierarchical spline-based image registration. In CVPR '94, pages 194–201, 1994.
[30] R. Szeliski and P. Golland. Stereo matching with transparency and matting. In 6th ICCV, 1998.
[31] R. Szeliski and H.-Y. Shum. Creating full view panoramic image mosaics and texture-mapped models. In SIGGRAPH '97, pages 251–258, 1997.
[32] J. Torborg and J.T. Kajiya. Talisman: Commodity realtime 3D graphics for the PC. In SIGGRAPH '96, pages 353–363, 1996.
[33] T. Viéville, C. Zeller, and L. Robert. Using collineations to compute motion and structure in an uncalibrated image sequence. IJCV, 20(3):213–242, 1996.
[34] J.Y.A. Wang and E.H. Adelson. Layered representation for motion analysis. In CVPR '93, pages 361–366, 1993.
[35] Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In CVPR '97, pages 520–526, 1997.
[36] Y. Weiss and E.H. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In CVPR '96, pages 321–326, 1996.
[37] Y. Yang, A. Yuille, and J. Lu. Local, global, and multilevel stereo matching. In CVPR '93, pages 274–279, 1993.

