
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2020.2982158, IEEE Transactions on Visualization and Computer Graphics

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2018

4D Light Field Segmentation from Light Field Super-pixel Hypergraph Representation

Xianqiang Lv, Xue Wang, Qing Wang, Senior Member, IEEE, and Jingyi Yu

Abstract—Efficient and accurate segmentation of full 4D light fields is an important task in computer vision and computer graphics.
The massive volume and the redundancy of light fields make it an open challenge. In this paper, we propose a novel light field
hypergraph (LFHG) representation using the light field super-pixel (LFSP) for interactive light field segmentation. The LFSPs not only
maintain the light field spatio-angular consistency, but also greatly contribute to the hypergraph coarsening. These advantages make
LFSPs useful to improve segmentation performance. Based on the LFHG representation, we present an efficient light field
segmentation algorithm via graph-cut optimization. Experimental results on both synthetic and real scene data demonstrate that our
method outperforms state-of-the-art methods on the light field segmentation task with respect to both accuracy and efficiency.

Index Terms—Hypergraph representation, Light field segmentation, Light field super-pixel, Graph-cut optimization.

1 INTRODUCTION

SEGMENTATION is a long-standing and important problem in computer vision and graphics. One common
approach is scribble-based interactive segmentation, which
is widely applied for image segmentation and advanced
image editing. Specifically, users provide sparse scribbles
on a single image [1] or a group of related images [2],
which define segments by propagating the properties of
selected pixels to other pixels in the image/images. Due
to the image uncertainties occurring from gray levels and
boundary ambiguities in an image, accurate segmentation
in real images is still a fairly difficult problem.

Fig. 1. Our segmentation result of a light field. (a) The raw image of the central view overlayed with our segmentation result. (b,d) Close-ups of several regions of interest (ROIs) in (a). (c,e) For each ROI, from top to bottom: the horizontal EPIs of the original light field and of the segmentation result respectively, and the vertical EPIs of the original light field and of the segmentation result respectively.

Recently, commercially available light field cameras such as the Lytro and Raytrix cameras have emerged, which are capable of capturing a few hundred views in a single shot. Compared with previous equipment, one of the biggest advantages of light field cameras is that they provide passive depth estimation, which benefits many vision algorithms and advanced image editing tasks, such as refocusing [3], depth and scene flow estimation [4], [5], [6], [7], super resolution [8], [9], material recognition [10], [11], [12], etc. By processing the rays recorded in a light field, we can analyze the scene structure and solve problems that are intractable with traditional 2D images, such as segmentation ambiguity [13]. Several approaches have been proposed to segment or edit a light field [14], [15], [16], [17], [18]. However, due to the huge volume of 4D light field data and the redundancy between different views, many segmentation methods tailored for light fields are either time-consuming or not designed for the full 4D data. Moreover, the final segmentation results usually rely on accurate depth estimation.

In this paper, we present a new light field segmentation scheme that exploits high-order relations among the rays passing through different views. Our approach builds upon the hypergraph representation of a light field. We construct a LFHG representation based on LFSPs [13] and propose a coarsening strategy to reduce the complexity of the original hypergraph model. For the simplified LFHG, an energy function fusing position, appearance and disparity cues is established and optimized using a graph-cut algorithm [19]. Figure 1 shows the segmentation result of a light field. Our method obtains accurate segmentations for boundaries and detailed areas in the image. Moreover, due to the consistency-preserving attribute of our method, the EPI patterns of the segmentation results are consistent across different views, and also consistent with the corresponding EPIs of the original light field.

Graph-cut optimization is one of the most popular and efficient image segmentation techniques, which formulates image segmentation in terms of energy minimization [1], [16], [19]. It generally maps an image to a weighted undirected graph and treats pixels as vertices. Different from a traditional graph, which can only describe binary relations between vertices, a hypergraph [20] can represent high-

• X. Lv, X. Wang and Q. Wang (corresponding author) are with the School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China. (e-mail: xianqianglv@mail.nwpu.edu.cn, qwang@nwpu.edu.cn.)
• Jingyi Yu is with Shanghai Tech University, Shanghai 200031, China. (e-mail: jingyi.udel@gmail.com)

Manuscript received April 19, 2018; revised August 26, 2018.

1077-2626 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

order relations. As a natural high-order relation descriptor, the hypergraph is particularly appropriate for representing complex relations within a light field.

Considering the massive input volume of a light field, if the vertices are used to denote pixels, the whole hypergraph would be very complex. A coarsening strategy is commonly adopted to generate a simplified graph by collapsing appropriate vertices. For a full parallax light field, its spatio-angular consistency is a unique property. Since LFSPs are capable of preserving consistency between different views [13], they can greatly scale down the data volume in both the spatial and angular domains. Angularly, a LFSP contains several 2D slices obtained by fixing the angular dimension. Spatially, the 2D slices of a LFSP contain a number of corresponding pixels obtained by fixing the spatial dimension.

The experiments are conducted on both synthetic and real light field data captured by Lytro [21] and Illum [22] cameras. Experimental results show that our approach is competitive with state-of-the-art techniques at a significantly lower computational complexity. The main contributions are summarized as follows.

(1) A LFHG representation is constructed to model the high-order relations among the rays passing through different views. By reformulating the correlation in the 4D ray space, instead of the traditional 2D pixel space, our LFHG representation proves to be efficient for light field segmentation and other low-level vision tasks.

(2) A novel light field segmentation algorithm using the LFHG and LFSPs is proposed. In view of the ray sampling nature of LFSPs, we define the spatial and angular neighbors of a LFSP and propose a LFHG coarsening strategy. The light field segmentation can then be solved via a traditional graph-cut optimization.

2 RELATED WORK

As a fundamental problem in computer vision, image segmentation has been extensively studied in recent decades. Various solutions have been proposed to tackle the problem, such as region-based methods [23], threshold-based methods [24], graph-based methods [25], learning-based methods [26], etc. Among these, graph-cut optimization has been proven effective for the multi-label segmentation problem [19], [25], [27]. These methods usually employ a max-flow/min-cut optimization [28]. Although they perform well under certain conditions, due to the image uncertainties and spatial ambiguities in defocus and boundary areas, establishing a thorough segmentation solution is still an open challenge.

Light fields [29] record both the angular and spatial information of the scene through 4D data. To keep consistency between views, propagation methods have been applied to propagate sparse user edits to the full light field [14], [30], [31], [32]. To enable efficient calculation of these methods, whose complexity is linear in the input data size, different strategies are utilized. Ao et al. [30] exploit the epipolar geometry of light fields and propose a reparameterization method to change the epipolar geometry. Besides, downsampling and upsampling techniques are often used in propagation methods [14], [30], [31]. However, their performance is greatly influenced by the clustering operation in a 7-dimensional space including appearances and 4D light field coordinates, and will degrade for complex scenes.

In order to segment light fields accurately, some representative works have been proposed. Wanner et al. [15] introduce an effective disparity cue for 4D light field segmentation for the first time. They first use a set of input scribbles on the central view to train a random forest classifier based on disparity and appearance cues. Then, a global consistency term is optimized using all views simultaneously. However, the segmentation is not performed on the full 4D light field (only the central view image is segmented) and the implementation is time-consuming. Mihara et al. [16] define two neighboring ray types (spatial and angular) in light fields and design a learning-based likelihood function. They build a graph in 4D space and use a graph-cut algorithm to get an optimized segmentation. Due to the high complexity of this method, only partial view-points (5 × 5) are used in the experiments. In order to improve efficiency, Hog et al. [17] propose a novel graph representation for interactive light field segmentation using a Markov Random Field (MRF). By exploiting the redundancy in the ray space, the ray-based graph structure reduces the graph size and thus decreases the running time of the MRF-based optimization. However, the proposed free rays and ray bundles are greatly influenced by depth estimation. In addition, the requirement of depth maps for all views limits its applications in practical problems.

As an important light field preprocessing method, light field super-pixels/super-rays have been proposed by [13], [18], [33]. Different from [18], which adopts a graph-cut optimization, we try to exploit the high-order relations between light rays and build a hypergraph representation with LFSPs [13], which enables a generalized light field processing pipeline. Also, a different neighboring system is defined. We adopt a distance-based neighborhood definition to take isolated LFSPs into account, which provides more flexibility to adjust the range and weights of neighborhoods. Our algorithm segments the 4D light field as a whole to maintain the consistency across views while significantly improving efficiency.


Fig. 2. The pipeline of the proposed LFSP based light field segmentation. (a) The 3 × 3 representation of an input light field. (b) Data preprocessing and intermediate results, including user scribbles, disparity estimation, LFSP segmentation by [13] and initial segmentation results respectively. (c) The light field hypergraph structure based on LFSPs. (d) Optimized light field segmentation results.
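The stages in the caption above can be strung together in a short data-flow sketch. Every function below is an illustrative stub under assumed names and toy data (not the authors' implementation); only the flow between the components detailed in Sections 3 and 4 is shown.

```python
# Toy end-to-end sketch of the Fig. 2 pipeline. All stages are
# illustrative stubs; the real components (disparity estimation, LFSP
# extraction [13], LFHG coarsening, graph-cut optimization) are
# described in Sections 3 and 4.

def estimate_disparity(lf):
    # stub: one disparity value per central-view pixel
    return {p: 0.0 for p in lf["central_view"]}

def extract_lfsps(lf, disparity):
    # stub: group rays into light field super-pixels (LFSPs) [13]
    return [set(lf["central_view"])]

def propagate_scribbles(lfsps, scribbles):
    # stub: an LFSP inherits the label of any scribble falling inside it
    return {i: label for i, sp in enumerate(lfsps)
            for pixel, label in scribbles.items() if pixel in sp}

def build_coarsened_lfhg(lfsps, seeds):
    # stub: one vertex per LFSP after coarsening (Sec. 3.3)
    return {"vertices": list(range(len(lfsps))), "seeds": seeds}

def graph_cut(graph):
    # stub: keep seed labels (the paper uses alpha-expansion [19])
    return {v: graph["seeds"].get(v) for v in graph["vertices"]}

def segment_light_field(lf, scribbles):
    disparity = estimate_disparity(lf)             # Fig. 2(b): disparity
    lfsps = extract_lfsps(lf, disparity)           # Fig. 2(b): LFSPs
    seeds = propagate_scribbles(lfsps, scribbles)  # Fig. 2(b): initial labels
    graph = build_coarsened_lfhg(lfsps, seeds)     # Fig. 2(c): LFHG
    return graph_cut(graph)                        # Fig. 2(d): optimization

result = segment_light_field({"central_view": [(0, 0), (0, 1)]}, {(0, 0): "fg"})
```

The point of the sketch is the ordering: disparity is needed before LFSP extraction, and the coarsened hypergraph is what the final optimization actually operates on.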

Our goal is to efficiently segment a full light field. The input contains a 4D light field and user scribbles on the central view image (or any arbitrary view image). We first compute the disparity map for the central view. Based on the disparity map, the LFSPs on the full light field volume are estimated using the algorithm proposed by Zhu et al. [13]. Combining the user scribbles, disparity and LFSPs, the initial segmentation on the central view image is obtained. We then formulate light field segmentation as a hypergraph partitioning problem. The LFHG structure is built and further coarsened based on LFSPs. Finally, the optimized light field segmentation on the full 4D light field is achieved using an energy minimization method. The final global optimization step enhances the per-image segmentation results by removing noise that is not supported by multiple views, so the resulting segmentations are globally consistent throughout the light field. For better visualization, we use the original view image overlayed with multi-colored segments, together with EPI images for several ROIs, to illustrate the optimized segmentation results. The pipeline of the proposed algorithm is shown in Figure 2.

In this section, we first introduce the definition of our LFHG representation. Then, a LFHG coarsening method based on LFSPs is presented. Finally, we formulate the light field segmentation as an energy minimization problem.

3.1 Hypergraph Representation of Light Field

In a light field LF, a light ray can be described by the two-parallel-plane (TPP) model [29]. Suppose the uv plane is the view plane and the xy plane is the image plane. A light ray r emitted from a scene point can be uniquely determined by r = LF(u, v, x, y), which corresponds to an indexed pixel of the raw image inside the camera. Another way of representing light fields is the multi-view representation, which divides the 4D light field into different views (u, v) and corresponding images (x, y). For a better understanding, in this paper we will discuss the light field hypergraph representation in a multi-view way.

A hypergraph [20] is a generalization of a graph in which an edge can join any number of vertices. Formally, a hypergraph H = (V, E) is defined as a set of vertices V and a set of non-empty subsets of V called hyperedges E.

Fig. 3. The structure of our LFHG representation. The set of vertices consists of labeled parts, non-labeled parts and labels. All the lines here are actually hyperedges, each of which connects more than two vertices.

Different from graph edges, which are pairs of nodes, hyperedges contain an arbitrary number of nodes. Therefore, as a generalization of the classic graph, the hypergraph is a natural higher-order relation descriptor.

Mathematically, we define light field segmentation as a hypergraph partitioning problem. As shown in Figure 3, given user scribbles, the vertices and hyperedges can be defined as

V = \Bigl( \bigcup_i \bigcup_{u,v} p^{u,v}_{i,N} \Bigr) \cup \Bigl( \bigcup_j \bigcup_{u,v} p^{u,v}_{j,L} \Bigr) \cup \bigcup_k \{ l_k \},

E = \bigcup_{i,k} \Bigl\{ \bigcup_{u,v} p^{u,v}_{i,N},\ l_k \Bigr\} \cup \bigcup_{j,k} \Bigl\{ \bigcup_{u,v} p^{u,v}_{j,L},\ l_k \Bigr\} \cup \bigcup_{m,n:\ \exists (u^*,v^*),\ p^{u^*,v^*}_n \in N(p^{u^*,v^*}_m)} \Bigl\{ \bigcup_{u,v} p^{u,v}_m,\ \bigcup_{u,v} p^{u,v}_n \Bigr\},   (1)

where p^{u,v}_{i,N} is the i-th non-labeled part in view (u, v), and p^{u,v}_{j,L} the j-th labeled part in view (u, v). \bigcup_{u,v} p^{u,v}_{i,N} and \bigcup_{u,v} p^{u,v}_{j,L} denote the sets of corresponding parts across views in a light field. We use the tags L/N to distinguish between labeled


parts and non-labeled parts; l_k is the k-th type of segment label. Note that the variable p here could be a single light ray or a set of light rays emitted from a small continuous 3D surface with similar photometric properties, which corresponds to a single pixel or a super-pixel respectively.

The set V is composed of three types of vertices: labeled parts, non-labeled parts, and segment labels. The set E consists of two types of hyperedges. Similar to the definitions of n-links (neighborhood links) and t-links (terminal links) introduced in traditional graph-based segmentation methods [19], [28], the hyperedges \{\bigcup_{u,v} p^{u,v}_{i,N},\ l_k\} and \{\bigcup_{u,v} p^{u,v}_{j,L},\ l_k\} are both t-links with the same cardinality N_{uv} + 1, while \{\bigcup_{u,v} p^{u,v}_m,\ \bigcup_{u,v} p^{u,v}_n\} is a n-link whose cardinality is 2 × N_{uv}, where N_{uv} is the angular sampling number of the light field. For any possible pair (\bigcup_{u,v} p^{u,v}_m,\ \bigcup_{u,v} p^{u,v}_n), if there exists a view (u^*, v^*) in which p^{u^*,v^*}_n is a neighbor of p^{u^*,v^*}_m, i.e. p^{u^*,v^*}_n \in N(p^{u^*,v^*}_m), there will exist a hyperedge \{\bigcup_{u,v} p^{u,v}_m,\ \bigcup_{u,v} p^{u,v}_n\}. The neighboring system is decided according to the specific definition of the part p. Thus the rank of our hypergraph is 2 × N_{uv}, which is the maximum cardinality of any of the edges in the hypergraph.

The multi-view representation of a LFHG is shown in Figure 4. The red patch p^{u,v}_{j,L} is one labeled part in view (u, v). The blue, yellow and gray patches represent non-labeled parts in different views. Green dotted lines connecting yellow patches and the label l_k make up a hyperedge \{\bigcup_{u,v} p^{u,v}_{i,N},\ l_k\}. Orange dotted lines connecting yellow patches and blue patches make up a hyperedge \{\bigcup_{u,v} p^{u,v}_{i,N},\ \bigcup_{u,v} p^{u,v}_n\}. Neighboring relations between different patches are important for the light field segmentation problem.

Fig. 4. A multi-view representation of our LFHG definition. The vertices V and hyperedges E are shown in a simplified representation for better visualization.

3.2 LFHG based on LFSP

Given the current LFHG structure, when p denotes a single light ray, the complexity of the hypergraph will be extremely high, which will lead to intractability and inefficiency in the final segmentation stage. Besides, accurate dense matching for every single light ray between different views of a light field is challenging. To overcome these issues, we build the LFHG structure based on LFSPs, which essentially eliminates the defocus and occlusion ambiguity for segmentation [13], and we simplify the hypergraph partitioning problem via a coarsening method.

3.2.1 Definition of LFSP

In contrast to a super-pixel, a small patch consisting of pixels with similar appearance, brightness, or texture in a 2D image, which may lose the geometric structure of the scene, a LFSP consists of a set of light rays emitted from a small continuous 3D surface with similar photometric properties and captured by a light field camera. The 3D distribution of light rays is influenced by the scene structure. Thus, conversely, the structure will be implicitly encoded in the LFSP. Mathematically, let LF(u, v, x, y) be a light field and R be a small continuous 3D surface with similar photometric characteristics. A LFSP S_R(u, v, x, y) can be defined as follows [13],

S_R(u, v, x, y) = \bigcup_{i=1}^{|R|} LF(u_{P_i}, v_{P_i}, x_{P_i}, y_{P_i}),   (2)

where LF(u_{P_i}, v_{P_i}, x_{P_i}, y_{P_i}) is a bundle of rays emitted from a 3D point P_i on the surface R, and |·| denotes the number of elements in the set.

Obviously, the 4D LFSP consists of angular dimensions (u, v) and spatial dimensions (x, y). For a specific view (u^*, v^*), the patch S^{u^*,v^*}_R is a 2D slice of S_R. Due to the Lambertian assumption, all the slices from different views should be similar. This leads to a refocus-invariant metric for LFSP segmentation, named the LFSP self similarity [13]. It is remarkable that the consistency in the angular dimension is maintained by LFSPs. By fixing (u, x) to (u^*, x^*), or (v, y) to (v^*, y^*), we can obtain horizontal or vertical EPIs of a light field. A LFSP is a combination of adjacent EPI lines with similar gradients and appearances.

3.2.2 Neighboring Relation of LFSPs

To build the LFHG structure based on LFSPs, the neighborhood system of LFSPs still needs to be discussed. Different from a pixel in a 2D image, which only has spatial neighbors, a pixel p_r corresponding to a light ray r = LF(u, v, x, y) in a light field has two types of neighbors, spatial neighbors and angular neighbors, which can be formulated as

N_{spa}(p_r) = \{ (u, v, x \pm 1, y + 1),\ (u, v, x \pm 1, y - 1),\ (u, v, x \pm 1, y),\ (u, v, x, y \pm 1) \},

N_{ang}(p_r) = \{ (u \pm 1, v + 1, x \pm d(p_r), y + d(p_r)),\ (u \pm 1, v - 1, x \pm d(p_r), y - d(p_r)),\ (u \pm 1, v, x \pm d(p_r), y),\ (u, v \pm 1, x, y \pm d(p_r)) \},   (3)

where N_{spa}(p_r) and N_{ang}(p_r) represent the spatial and angular neighbors of p_r respectively, and d(p_r) represents the disparity of p_r.

Considering that LFSPs are irregular in shape in most cases, the spatial relation of LFSPs needs to be redefined. Here we use specific ranges to define adjacent LFSPs without sharing common frontiers, mainly for two reasons. First,


it is not necessary for neighboring LFSPs to share common frontiers, as shown in Figure 5. The distance-based neighbor system is the key to correctly dealing with isolated LFSPs. Second, our neighbor system enables more flexible operations. For example, we can define a wider range of neighbors, or set different weights for neighboring LFSPs at different distances.

Let S^{u,v}_m and S^{u,v}_n be the 2D slices of LFSPs S_m and S_n in view (u, v) respectively. Their spatial relation can be defined as

N_{spa}(S^{u,v}_m, S^{u,v}_n) = \begin{cases} 1, & \| P_{x,y}(S^{u,v}_m) - P_{x,y}(S^{u,v}_n) \| \le T \\ 0, & \text{otherwise}, \end{cases}   (4)

where P_{x,y}(S^{u,v}_m) and P_{x,y}(S^{u,v}_n) denote the center positions of S^{u,v}_m and S^{u,v}_n respectively, \|\cdot\| is the Euclidean distance, and T is a predefined threshold. In our experiments, we set T = \sqrt{3}M to generate a moderate number of spatial neighbors [13], [18], [34], where M is the super-pixel size. In Section 5.2, we will discuss the influence of M on segmentation accuracy and efficiency.

For the angular relationship between LFSPs, we define two types of angular neighbors: direct angular neighbors and indirect angular neighbors. Suppose S^{u^*,v^*}_m and S^{u^*,v^*}_n are spatial neighbors in view (u^*, v^*) according to Eq. (4). If S^{u,v}_m and S^{u,v}_n are also spatial neighbors in view (u, v), S^{u,v}_m and S^{u^*,v^*}_n are considered direct angular neighbors. Otherwise, they are considered indirect angular neighbors. Similar relations hold for S^{u^*,v^*}_m and S^{u,v}_n. The direct and indirect angular neighbors can be defined as follows: when N_{spa}(S^{u^*,v^*}_m, S^{u^*,v^*}_n) = 1,

N_{ang}(S^{u,v}_m, S^{u^*,v^*}_n) = \begin{cases} \text{direct}, & N_{spa}(S^{u,v}_m, S^{u,v}_n) = 1 \\ \text{indirect}, & N_{spa}(S^{u,v}_m, S^{u,v}_n) = 0. \end{cases}   (5)

Fig. 5. Spatial and angular relations of 2D slices of LFSPs in (u, v) and (u^*, v^*). Red circles represent the range of the spatial neighborhood, whose semi-diameter is T. Blue patches in (u, v) and (u^*, v^*) are spatial and direct angular neighbors of S^{u,v}_m respectively. The green patch in view (u^*, v^*) is an indirect angular neighbor of S^{u,v}_m.

From the example shown in Figure 5, there are three important findings. First, it is not necessary for neighboring LFSPs to share a common frontier. Second, for LFSPs S_m and S_n, if their slices in view (u^*, v^*) are spatial neighbors, their slices in view (u, v) are very likely to be spatial neighbors too. Third, for LFSPs S_m and S_n, even when their slices in view (u^*, v^*) are not spatial neighbors, their corresponding slices in view (u, v) might be spatial neighbors. From experiments we observed that direct angular neighbors and indirect angular neighbors have different attributions. Our analysis of the energy function for optimization will further reveal that we can analyze the types of angular neighbors to distinguish their impacts on segmentation.

Combining the spatial and angular relationships of LFSPs together, we can obtain the neighborhood set Ω for all LFSPs in a light field:

\Omega = \{ (S_m, S_n) \mid \exists (u^*, v^*),\ N_{spa}(S^{u^*,v^*}_m, S^{u^*,v^*}_n) = 1 \}.   (6)

The vertex in Eq. (1) denotes a single light ray. To simplify the LFHG representation, here we adopt a LFSP as a vertex, which is composed of a beam of light rays emitted from the scene. Therefore we can rewrite Eq. (1) as

V = \Bigl( \bigcup_i \bigcup_{u,v} S^{u,v}_{i,N} \Bigr) \cup \Bigl( \bigcup_j \bigcup_{u,v} S^{u,v}_{j,L} \Bigr) \cup \bigcup_k \{ l_k \},

E = \bigcup_{i,k} \Bigl\{ \bigcup_{u,v} S^{u,v}_{i,N},\ l_k \Bigr\} \cup \bigcup_{j,k} \Bigl\{ \bigcup_{u,v} S^{u,v}_{j,L},\ l_k \Bigr\} \cup \bigcup_{m,n:\ (S_m, S_n) \in \Omega} \Bigl\{ \bigcup_{u,v} S^{u,v}_m,\ \bigcup_{u,v} S^{u,v}_n \Bigr\}.   (7)

3.3 The LFHG coarsening strategy

After defining the neighbor system between 2D slices of different LFSPs, we build the LFHG H = (V, E) according to Eq. (7). To further simplify H and reduce its computational complexity, we propose an effective coarsening strategy based on the self similarity property of LFSPs. After coarsening, the final LFHG is homomorphic to a weighted undirected graph and can be segmented using traditional graph-cut algorithms.

Different from the spatial and angular neighbors discussed in Section 3.2, which describe neighboring relations between 2D slices of different LFSPs, we still need to exploit the relations between 2D slices of the same LFSP. All the 2D slices corresponding to the same LFSP in different views are called self-neighbors. To maintain the continuity of the light field segmentation across different views, all the self-neighbors should be given the same segment label. Due to the Lambertian assumption, all the slices in different views have similar shapes, textures, colors, and depths. When fixing an angular dimension and a spatial dimension, the EPI lines corresponding to one LFSP are similar in gradient, appearance and position. This is the manifestation of the 4D LFSPs' self similarity in 2D images. The process of coarsening a hypergraph is to merge similar vertices and thereby reduce the hypergraph complexity. The self similarity of LFSPs can be utilized to coarsen the original hypergraph.

Specifically, all the vertices corresponding to the self-neighbors of one LFSP will be merged into a new vertex corresponding to that LFSP; thus the related hyperedges (including both n-links and t-links) will become traditional edges. Further, user scribbles will remove unnecessary t-links for labeled LFSPs. Let C_k denote the cluster with segment label l_k; when S_{j,L} \notin C_k, the edge \{S_{j,L}, l_k\} can be removed.


Therefore, the 2-uniform hypergraph Ĥ = (V̂, Ê) after coarsening can be written as follows,

\hat{V} = \Big\{ \bigcup_{i} S_{i,N},\ \bigcup_{j} S_{j,L},\ \bigcup_{k} l_k \Big\},

\hat{E} = \Big\{ \bigcup_{i,k} \{S_{i,N}, l_k\},\ \bigcup_{j: S_{j,L} \in C_k} \{S_{j,L}, l_k\},\ \bigcup_{m,n: (S_m, S_n) \in \Omega} \{S_m, S_n\} \Big\}.   (8)

The following table gives the distances/costs of the hyperedges Ê; Edata and Esmooth will be discussed in the next section. Note that for each n-link {S_m, S_n}, we need to figure out the distribution of all the neighboring relations between S_m^{u,v} and S_n^{u,v}, which includes the number of spatial neighbors within the same view, the number of direct angular neighbors, and the number of indirect angular neighbors across different views. Since the binary neighboring relations established on the distance metric are symmetric, when the center view is chosen as the reference view we only consider relations between S_m^{u_0,v_0} and S_n^{u,v}. Note that the total number of the three types of neighbors mentioned above takes an integer value in the range [1, N_{uv}].

TABLE 1
Hyperedge distances/costs of the LFHG after coarsening.

edge              distance (cost)             for
{S_{i,N}, l_k}    Edata(S_{i,N}, l_k)         S_{i,N} ∈ C_k
{S_{j,L}, l_k}    0                           S_{j,L} ∈ C_k
{S_{j,L}, l_k}    removed                     S_{j,L} ∉ C_k
{S_m, S_n}        λ_s · Esmooth(S_m, S_n)     (S_m, S_n) ∈ Ω

Fig. 6. An example of the LFHG structure in a multi-view representation (left) and the LFHG coarsening strategy (right). Instead of original view images, enlarged detail views of several surrounding LFSPs are shown.

A real example of the LFHG structure in a multi-view representation and the LFHG coarsening strategy is shown in Figure 6. Instead of original view images, enlarged detail views of several surrounding LFSPs are shown. The left sub-image is the multi-view representation of LFHGs. In order to facilitate the graph representation, we choose 3 × 3 views instead of the full views and mark only several neighboring LFSPs in each view. A set of red dotted lines forms a t-link, and a set of orange dotted lines forms an n-link. The right sub-image illustrates the LFHG coarsening strategy. The LFSPs S and the labels l_k represent the coarsened vertices V̂, while the n-links and t-links represent the coarsened hyperedges Ê. Each LFSP S is obtained by vertically stacking its image patches from the N_{uv} views.

4 SEGMENTATION USING LFHG

After building the LFHG structure, the 4D light field segmentation is performed as a graph-cut problem. Supposing L is the label vector that assigns label L_{S_i} to LFSP S_i, the energy function is defined as follows,

E(L) = \sum_{i: L_{S_i} = l_k} E_{data}(S_i, l_k) + \lambda_s \sum_{m,n: (S_m, S_n) \in \Omega} E_{smooth}(S_m, S_n).   (9)

The term Edata calculates the individual penalty for assigning l_k to S_i. The term Esmooth measures the penalty for a discontinuity between adjacent LFSPs in both spatial and angular space, i.e., the boundary properties of the segmentation L. The coefficient λ_s controls the weight of the smoothness term to avoid over-smoothing; we set λ_s = 10 in most experiments. The α-expansion algorithm [19] is used to solve Eq. (9). More details of the data term and the smoothness term are discussed in the following subsections.

4.1 Data term

For interactive segmentation, a user needs to draw different scribbles on the objects of interest on the reference view. In this paper, we choose the central view to draw scribbles on for convenience of calculation. Then we propagate the user inputs to LFSPs and obtain labeled LFSPs, which are utilized as seeds to compute their similarities with non-labeled LFSPs by combining multiple cues.

Color, position and disparity cues capture different physical characteristics of an object, so it is important to integrate these cues for light field segmentation. When users are interested in a particular object or want to edit a light field, they will frequently change the input scribbles. In this case, we calculate LFSP similarities in color, disparity and position separately and combine them by adjusting weights to balance the influences of the different cues. Experimental results show that this simple strategy works effectively on both synthetic and real data. Therefore, Edata(S_i, l_k) can be computed as the minimum distance between S_i and the labeled S_{j,L} with the same l_k, i.e. L_{S_{i,N}} = L_{S_{j,L}} = l_k. Concretely,

E_{data}(S_i, l_k) = \min_{j: L_{S_{j,L}} = l_k} \big( E_C(S_i, S_{j,L}) + \lambda_p E_P(S_i, S_{j,L}) + \lambda_d E_D(S_i, S_{j,L}) \big),   (10)

where E_C, E_P and E_D measure the color, position and disparity distances between S_i and S_{j,L} respectively, and λ_p and λ_d are used to balance the influences between the different cues. We set λ_p = 1/(200M²) and λ_d = 500 in the experiments.
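The data term of Eq. (10) can be sketched in a few lines of Python. This is a minimal sketch under assumed data structures (each LFSP is reduced to its mean CIELab color, mean position and mean disparity, per Eq. (12)); the feature layout and the helper name `data_term` are illustrative, not the paper's released code.

```python
import math

# Assumed feature layout: ( [L, a, b] mean color, (x, y) mean position, mean disparity ).
# `seeds_k` holds the feature tuples of the user-labeled LFSPs carrying label l_k.
M = 20                          # LFSP size, as used in the experiments
LAMBDA_P = 1.0 / (200 * M ** 2)  # position weight from Eq. (10)
LAMBDA_D = 500.0                 # disparity weight from Eq. (10)

def data_term(feat, seeds_k):
    """E_data(S_i, l_k): minimum combined distance to any seed labeled l_k."""
    color_i, (x_i, y_i), d_i = feat
    costs = []
    for color_j, (x_j, y_j), d_j in seeds_k:
        e_c = sum(abs(a - b) for a, b in zip(color_i, color_j))  # E_C, summed per channel
        e_p = math.hypot(x_i - x_j, y_i - y_j)                   # E_P, positional distance
        e_d = abs(d_i - d_j)                                     # E_D, disparity distance
        costs.append(e_c + LAMBDA_P * e_p + LAMBDA_D * e_d)
    return min(costs)
```

An unlabeled LFSP that closely matches some seed of label l_k thus receives a low penalty for that label; the graph-cut solver then trades this off against the smoothness term of Eq. (9).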
The energy terms of color, position and disparity cues


are defined as follows,

E_C(S_i, S_{j,L}) = \sum_{cha} \| C_{cha}(S_i) - C_{cha}(S_{j,L}) \|,
E_P(S_i, S_{j,L}) = \| P_{x,y}(S_i) - P_{x,y}(S_{j,L}) \|,   (11)
E_D(S_i, S_{j,L}) = \| D(S_i) - D(S_{j,L}) \|,

where C_{cha}(S_i), P_{x,y}(S_i) and D(S_i) denote the color, position and disparity of LFSP S_i respectively, which are averaged over the entire 2D patches,

C_{cha}(S_i) = \frac{1}{N_{uv} |S_i^{u,v}|} \sum_{u,v} \sum_{p \in S_i^{u,v}} C_{cha}(p),
D(S_i) = \frac{1}{|S_i^{u,v}|} \sum_{p \in S_i^{u,v}} D(p),   (12)
P_{x,y}(S_i) = \frac{1}{N_{uv} |S_i^{u,v}|} \sum_{u,v} \sum_{p \in S_i^{u,v}} P_{x,y}(p),

where p is a pixel belonging to S_i^{u,v}, and |S_i^{u,v}| denotes the total pixel number in slice S_i^{u,v}. Here we use the CIELab color space.

4.2 Smoothness term

Spatial and angular neighbors, which act on the smoothness term, have great influence on the final segmentation results. Specifically, spatial neighbors ensure that the segmentation is regular within the same view of a light field, while angular neighbors ensure segmentation consistency across views.

To set boundary penalties, the smoothness term for an n-link {S_m, S_n} is defined as follows,

E_{smooth}(S_m, S_n) = \delta(L_{S_m}, L_{S_n}) \, A(S_m, S_n) \, B(S_m, S_n),   (13)

where δ(·) generates a penalty factor for two adjacent LFSPs, in both spatial and angular space, with different segment labels,

\delta(L_{S_m}, L_{S_n}) = \begin{cases} 0, & \text{if } L_{S_m} = L_{S_n} \\ 1, & \text{otherwise.} \end{cases}   (14)

When the labels are inconsistent, a penalty is added. B(S_m, S_n) is a similarity distance between S_m and S_n, which is evaluated by

B(S_m, S_n) = \exp\Big( -\frac{\sum_{cha} \| \Delta C_{cha} \|}{\sigma_C^2} - \frac{\| \Delta D \|}{\sigma_D^2} \Big), \quad \Delta C_{cha} = C_{cha}(S_m) - C_{cha}(S_n), \quad \Delta D = D(S_m) - D(S_n),   (15)

where σ_C² and σ_D² are the variances of color and disparity respectively. Since the neighboring relations ensure that the spatial distance between two LFSPs is within a predefined threshold, the position cue is not included here. Further, we adopt A(S_m, S_n) to measure the mutual influence of S_n and S_m in terms of neighboring relations. Concretely,

A(S_m, S_n) = N_{spa}(S_m^{u_0,v_0}, S_n^{u_0,v_0}) + \frac{1}{N_{uv}} \sum_{(u,v) \neq (u_0,v_0)} f(S_m^{u_0,v_0}, S_n^{u,v}),   (16)

where f is the influence factor between slices S_m^{u_0,v_0} and S_n^{u,v}. As mentioned above, for a pair of LFSPs, if their slices in view (u_0, v_0) are spatial neighbors, their slices in other views are very likely to be spatial neighbors too. Considering that the smoothness term is affected by direct angular neighbors repeatedly across views, in order to maintain the balance between direct and indirect angular neighbors we assign a smaller value to the penalty for direct angular neighbors,

f(S_m^{u_0,v_0}, S_n^{u,v}) = \begin{cases} 0.5, & N_{ang}(S_m^{u_0,v_0}, S_n^{u,v}) = \text{direct} \\ 1, & N_{ang}(S_m^{u_0,v_0}, S_n^{u,v}) = \text{indirect.} \end{cases}   (17)

4.3 LFHG segmentation algorithm

After introducing the energy function, the complete process of our algorithm is shown in Algorithm 1. The 4D light field and the user scribbles are the input of our algorithm. First, we calculate the disparity map for the central view image and obtain the 4D LFSPs using the algorithm proposed in [13]. After the LFSPs are initialized, the data term for all LFSPs can be computed in advance. At the same time, user inputs on the center view are propagated to the other views via the LFSPs, and several LFSPs are labeled as seeds. Next, the penalty for assigning a segment label to an LFSP S_i is calculated by Edata, while the influences of neighboring S_i and S_n are calculated by Esmooth. The complete energy function E(L) is obtained according to Eq. (9). Finally, we apply a graph-cut algorithm to minimize E(L) and output the optimized segmentation result seg.

Algorithm 1: Light field segmentation algorithm based on the LFHG representation.
  Input: the 4D light field LF and the input labels Inlabel
  Output: the 4D light field segmentation result seg
  D = DepthEstimation(LF)
  LFSP = GetLFSPs(LF)
  for all LFSPs do
      C_{cha}(S_i) = \frac{1}{N_{uv}|S_i^{u,v}|} \sum_{u,v} \sum_{p \in S_i^{u,v}} C_{cha}(p)
      D(S_i) = \frac{1}{|S_i^{u,v}|} \sum_{p \in S_i^{u,v}} D(p)
      P_{x,y}(S_i) = \frac{1}{N_{uv}|S_i^{u,v}|} \sum_{u,v} \sum_{p \in S_i^{u,v}} P_{x,y}(p)
  end
  for each LFSP S_i in the central view (u_0, v_0) do
      E_{data}(S_i, l_k) = \min_{j: L_{S_{j,L}} = l_k} ( E_C(S_i, S_{j,L}) + \lambda_p E_P(S_i, S_{j,L}) + \lambda_d E_D(S_i, S_{j,L}) )
      for all adjacent LFSPs S_n do
          E_{smooth}(S_i, S_n) = \delta(L_{S_i}, L_{S_n}) A(S_i, S_n) B(S_i, S_n)
      end
  end
  E(L) = \sum E_{data}(S_i, l_k) + \lambda_s \sum E_{smooth}(S_i, S_n)
  seg = \arg\min_L E(L)

5 EXPERIMENTS

In this section, we compare our segmentation algorithm with state-of-the-art light field segmentation methods, including GCMLA (globally consistent multi-label assignment) [15], SACS (spatial and angular consistence segmentation) [16] and RBGSS (ray-based graph structure segmentation) [17]. Synthetic and real light fields are used to validate the performance of our algorithm. For synthetic data, we use the HCI dataset proposed in [35], which contains 4 light fields with known depth, ground-truth labeling and user input scribbles. For real data, we use two popular light field cameras, Lytro and Illum, to capture light fields of real

TABLE 2
Light field segmentation accuracy comparisons using the percentage of correctly segmented pixels.

Algorithm    GCMLA   RBGSS   Ours    Ours w/o smooth   GCMLA   SACS    Ours
Depth        GT      GT      GT      GT                EST     EST     EST
Papillon     99.4    99.5    99.5    99.4              98.9    98.3    99.4
Buddha       99.1    99.1    99.2    96.3              98.8    96.4    99.0
Stilllife    99.2    99.2    99.1    98.2              98.9    97.7    98.9
Horses       99.1    99.1    99.3    99.1              98.3    95.9    98.9
Average      99.2    99.2    99.3    98.3              98.7    97.1    99.0
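The Average row of Table 2 is simply the arithmetic mean of the four per-scene accuracies. A quick sanity check (values transcribed from the table; the tolerance allows for the table's one-decimal rounding) reproduces it:

```python
# Per-scene accuracies (Papillon, Buddha, Stilllife, Horses) and the
# reported Average, transcribed column by column from Table 2.
columns = {
    "GCMLA (GT)":      ([99.4, 99.1, 99.2, 99.1], 99.2),
    "RBGSS (GT)":      ([99.5, 99.1, 99.2, 99.1], 99.2),
    "Ours (GT)":       ([99.5, 99.2, 99.1, 99.3], 99.3),
    "Ours w/o smooth": ([99.4, 96.3, 98.2, 99.1], 98.3),
    "GCMLA (EST)":     ([98.9, 98.8, 98.9, 98.3], 98.7),
    "SACS (EST)":      ([98.3, 96.4, 97.7, 95.9], 97.1),
    "Ours (EST)":      ([99.4, 99.0, 98.9, 98.9], 99.0),
}
for name, (scores, reported) in columns.items():
    mean = sum(scores) / len(scores)
    # Each reported average matches the column mean up to one-decimal rounding.
    assert abs(mean - reported) <= 0.051, name
```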

scenes. Light field decoding is implemented via LFToolbox [36] and Lytro Power Tools respectively. For the comparative experiments on synthetic data, we implement the code of GCMLA in cocolib¹ and compare with the publicly released results of SACS and RBGSS. For the comparative experiments on real data, after sharing the test data with the authors of [16], [17], we obtained the final segmentation results of SACS and the preprocessing results of RBGSS; the final segmentation results of RBGSS are then obtained by applying our light field segmentation algorithm.

1. https://sourceforge.net/p/cocolib/

5.1 Synthetic data

In this subsection, we compare our approach with three state-of-the-art light field segmentation methods on the HCI dataset. Figure 7 shows the segmentation results of the different methods. For better visualization, only the center view images are displayed. The top row shows the input light fields, of which the center view images are superimposed with users' scribbles. The second row displays the segmentation ground truth. Figure 8 shows EPI images of our final segmentation results; the layout of each sub-image follows the same pattern as Figure 1. We can find that the segmentation results in the boundary and detailed areas generated by GCMLA and SACS are often inaccurate. In contrast, our method provides more accurate segmentation in these areas, which is comparable with the results by RBGSS. Moreover, the EPI images show that our algorithm provides a coherent segmentation throughout the full 4D light field. Quantitative comparisons are shown in Table 2. We adopt both ground truth depth and estimated depth for all methods, and additionally evaluate the importance of the smoothness term for our energy function. Given inaccurate depth estimation, our method delivers only slightly decreased performance compared to the other methods. The full model of our method (with the smoothness term) outperforms all the other evaluated techniques in most cases (except for Buddha) and achieves the highest average accuracy.

Another significant advantage of our approach is its computational efficiency. GCMLA only segments the center view image and its running time is quite long. SACS uses 5×5 of the 9×9 views to reduce the data size, but the algorithm complexity is still high; the segmentation time of the above two algorithms is more than one minute. By introducing ray-bundles and free rays, RBGSS reduces the running time to some extent and executes the optimization step in 4∼6s on an Intel Xeon E5640. For the real data experiment, the optimization algorithm takes about 6.5s on our desktop computer. It is remarkable that our method greatly reduces the data size and achieves a great improvement in computational efficiency. Our current implementation is evaluated on a modern desktop computer with a 3.6 GHz i7 CPU. The preprocessing step takes around 70s, in which central view depth estimation and LFSP segmentation take about 5s and 65s respectively. The average time for segmentation is about 1s (0.2s for optimization), which is close to the requirements of real-time processing. The running time for preprocessing and segmentation could be reduced further by GPU acceleration. Supposing there is a light field with 9×9 views and the LFSP size is 20×20 pixels, theoretically our method can simplify the data size by 81×400 times. Specifically, taking Buddha as an example, RBGSS reduces the data size from 4.77×10⁷ pixels to 8.19×10⁵ pixels, while our method reduces it from 4.77×10⁷ to 1.46×10³. Instead of executing any data reduction, GCMLA and SACS directly take the original pixels as input, which results in high complexity.

We also evaluate the influence of LFSP size on segmentation accuracy and algorithm efficiency, as shown in Table 3. With the LFSP size changing from 15 to 25 pixels, the segmentation accuracy does not change significantly, nor does the running time for segmentation. In our experiments, we set the LFSP size to 20 pixels; a moderate LFSP size can not only maintain detailed information, but also simplify the light field representation efficiently.

TABLE 3
Influence of LFSP size on the accuracy and efficiency of light field segmentation.

          Super-pixel size   15      20      25      30
Acc/%     Papillon           99.5    99.5    99.5    99.4
          Buddha             99.3    99.2    99.1    98.9
          Stilllife          99.2    99.1    99.0    98.8
          Horses             98.7    99.3    99.3    97.6
Time/s    Papillon           0.97    0.86    0.81    0.80
          Buddha             1.08    1.03    0.95    0.97
          Stilllife          1.06    0.97    0.95    0.99
          Horses             0.92    0.85    0.83    0.84

As shown in Table 2, the numerical analysis already demonstrates that the smoothness term plays an important role in the final coherent segmentation. Figure 9 displays two examples that verify this conclusion further. The left column shows noisy initial segmentation results without smoothing constraints, while the right column shows optimized results with smoothing constraints. The optimized results are more accurate for both the central view image and the full 4D light fields, proving the validity of the proposed smoothing constraints.
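The quoted data-size figures follow directly from the light-field dimensions. A small sketch reproduces them; the 768×768 spatial resolution of Buddha is an assumption (chosen because it is consistent with the quoted 4.77×10⁷ total), and the paper's figures appear truncated rather than rounded:

```python
views = 9 * 9             # 9x9 angular views
width = height = 768      # assumed spatial resolution of the HCI Buddha scene
lfsp_area = 20 * 20       # LFSP size of 20x20 pixels

total_pixels = views * width * height   # raw 4D light field size
num_lfsps = width * height / lfsp_area  # each LFSP spans all 81 views
reduction = views * lfsp_area           # graph nodes shrink by 81 x 400

print(f"{total_pixels:.2e}")  # 4.78e+07 (quoted as 4.77e7)
print(f"{num_lfsps:.2e}")     # 1.47e+03 (quoted as 1.46e3)
print(reduction)              # 32400
```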


[Figure: columns Buddha, Stilllife, Papillon, Horses; rows Input, GT labels, GCMLA [15], SACS [16], RBGSS [17], Our results]
Fig. 7. Light field segmentation results on HCI datasets.


[Figure: panels Buddha, Stilllife, Papillon, Horses]
Fig. 8. The EPI images of HCI datasets.

[Figure: (a) initial results, (b) optimized results]
Fig. 9. Effectiveness of our smoothness term. (a) Initial segmentation results. (b) Optimized segmentation results.

5.2 Real data

We use multiple sets of real light fields captured by Illum or Lytro to verify the effectiveness of our algorithm. The disparity map for a light field captured by Illum can be obtained directly from the camera; for light fields captured by Lytro, the disparity maps need to be estimated first [6].

Figure 11 shows segmentation results on the real light field dataset captured by Illum. The light fields Cherry, Persimmon, Park and Hide are provided by [37], while People and Road are self-captured scenes. For each scene, from left to right, we show the LFSP segmentation, user input, disparity map, and segmentation results in the central view image and in the EPI image respectively. The EPI results show the accuracy and consistency of the segmentation across all views. Figure 13 shows segmentation results on real data captured by Lytro.

Figure 14 shows comparisons of segmentation results using different methods on several real scenes. Due to the low quality of the light field data, the disparity map is very noisy, especially for the Lytro data. SACS does not deal with the light field segmentation problem from the perspective of rays, so its segmentation results are not accurate enough. In addition, SACS can only produce 5 × 5 segmentation results, so the vertical resolution of its EPI images is smaller. RBGSS is sensitive to depth map quality since its ray-bundles and free rays are obtained using depth information. Our method outperforms RBGSS in easily confused areas and in areas with inaccurate depths. The last row of images shows the EPI images of the three comparison methods in horizontal and vertical directions. The structure of our EPI images is the most similar to that of the real scene, which shows that our method can preserve the angular consistency of the light field.

Furthermore, in order to evaluate our algorithm under more demanding conditions, such as wider baselines and higher spatio-angular resolutions, we test the light fields provided by [38]. The pixel offsets between adjacent view images are generally greater than one pixel, and there are 100 viewpoints horizontally. In addition, the resolution of each view image is at the ten-megapixel level. The results are shown in Figure 12, in which the segmentation and overlaid results prove
1077-2626 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Exeter. Downloaded on May 05,2020 at 04:24:40 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2020.2982158, IEEE
Transactions on Visualization and Computer Graphics
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2018 11

the effectiveness of our algorithm. However, the structures of the EPI images are less consistent compared with the other scenes.

5.3 Limitations

There are some limitations of our algorithm. Current segmentation results are affected by the quality of the LFSP segmentation to some extent. For example, when some tiny objects are similar to the background, or a non-Lambertian object reflects the rays emitted from adjacent objects, the performance of our method is likely to be degraded. Figure 10 shows the above-mentioned situations. In Figure 10(a), some tiny objects (such as the feet and wings of the bee) have similar or the same texture as the background, and it is difficult to distinguish them from the background. In Figure 10(b), a smooth glass column reflects the color of the Buddha, which makes the boundary areas very confusing for segmentation. These problems are still challenging for the segmentation task and not solved well so far.

[Figure: (a) tiny objects, (b) non-Lambertian regions]
Fig. 10. Limitations of our algorithm caused by (a) tiny objects and (b) non-Lambertian regions. Each sub-image consists of a partial enlarged detail (upper left), LFSPs (upper right), ground truth labels (lower left) and segmentation results (lower right).

6 CONCLUSIONS

In this paper, we propose a light field hypergraph representation and a corresponding LFHG coarsening method. Based on the LFHG theory, we utilize LFSPs to interactively segment 4D light fields. We define spatial and angular neighbor relations between different LFSPs. When coarsening the LFHG, the characteristics of LFSPs in both the spatial and angular domains help to reduce the complexity of the LFHG representation. Then an energy function consisting of a data term and a smoothness term is defined according to the LFHG structure. Finally, a modified graph-cut algorithm is adopted to generate optimized segmentation results. Experimental results conducted on both synthetic and real scene data show the effectiveness and computational efficiency of the proposed method. For future work, we intend to extend the proposed light field segmentation technique to the light field video segmentation problem.

ACKNOWLEDGEMENT

The work was supported by NSFC under Grants 61531014 and 61801396. We are grateful to H. Mihara and T. Funatomi for their help on real scene light field segmentation comparisons. We thank M. Rizkallah's team for their help in preprocessing real data using the method proposed in [17].

REFERENCES

[1] Y. Boykov and M.-P. Jolly, "Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images," in Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 1, 2001, pp. 105–112.
[2] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen, "icoseg: Interactive co-segmentation with intelligent scribble guidance," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 3169–3176.
[3] R. Ng, "Fourier slice photography," in ACM Transactions on Graphics (TOG), vol. 24, no. 3. ACM, 2005, pp. 735–744.
[4] S. Wanner and B. Goldluecke, "Globally consistent depth labeling of 4d light fields," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 41–48.
[5] T.-C. Wang, A. A. Efros, and R. Ramamoorthi, "Occlusion-aware depth estimation using light-field cameras," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3487–3495.
[6] H. Zhu, Q. Wang, and J. Yu, "Occlusion-model guided anti-occlusion depth estimation in light field," IEEE Journal of Selected Topics in Signal Processing, 2017.
[7] P. P. Srinivasan, M. W. Tao, R. Ng, and R. Ramamoorthi, "Oriented light-field windows for scene flow," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3496–3504.
[8] T. E. Bishop and P. Favaro, "The light field camera: Extended depth of field, aliasing, and superresolution," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 972–986, 2012.
[9] S. Wanner and B. Goldluecke, "Variational light field analysis for disparity estimation and super-resolution," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 3, pp. 606–619, 2014.
[10] T.-C. Wang, J.-Y. Zhu, E. Hiroaki, M. Chandraker, A. A. Efros, and R. Ramamoorthi, "A 4d light-field dataset and cnn architectures for material recognition," in European Conference on Computer Vision. Springer, 2016, pp. 121–138.
[11] H. Zhang, K. Dana, and K. Nishino, "Friction from reflectance: Deep reflectance codes for predicting physical surface properties from one-shot in-field reflectance," in European Conference on Computer Vision. Springer, 2016, pp. 808–824.
[12] J. Xue, H. Zhang, K. Dana, and K. Nishino, "Differential angular imaging for material recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6940–6949.
[13] H. Zhu, Q. Zhang, and Q. Wang, "4d light field superpixel and segmentation," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017, pp. 6709–6717.
[14] A. Jarabo, B. Masia, and D. Gutierrez, "Efficient propagation of light field edits," Proceedings of the SIACG, 2011.
[15] S. Wanner, C. Straehle, and B. Goldluecke, "Globally consistent multi-label assignment on the ray space of 4d light fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1011–1018.
[16] H. Mihara, T. Funatomi, K. Tanaka, H. Kubo, Y. Mukaigawa, and H. Nagahara, "4d light field segmentation with spatial and angular consistencies," in Computational Photography (ICCP), 2016 IEEE International Conference on. IEEE, 2016, pp. 1–8.
[17] M. Hog, N. Sabater, and C. Guillemot, "Light field segmentation using a ray-based graph structure," in European Conference on Computer Vision. Springer, 2016, pp. 35–50.
[18] M. Hog, N. Sabater, and C. Guillemot, "Super-rays for efficient light field processing," IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
[19] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
[20] G. Karypis and V. Kumar, "Multilevel k-way hypergraph partitioning," in Design Automation Conference, 1999. Proceedings., 1999, pp. 343–348.


[Figure: rows Cherry, Persimmon, Park, Hide, People, Road; columns LFSP, User input, Disparity, Segmentation, EPI]
Fig. 11. Segmentation results on real light field data captured by Illum.

[Figure: rows Couch, Mansion, Statue; columns LFSP, User input, Disparity, Segmentation, EPI]
Fig. 12. Segmentation results on higher spatio-angular resolution light field data [38].


[Figure: rows Totoro, Hydrant, Plant; columns LFSP, User input, Disparity, Segmentation, EPI]
Fig. 13. Segmentation results on real light field data captured by Lytro.

[Figure: columns Hide, Cherry, Park, People, Totoro; rows LF with label, SACS [16], RBGSS [17], Our results, followed by horizontal/vertical EPI strips of the real scene, [16], [17] and ours]
Fig. 14. Comparisons of real data segmentation results. From top to bottom: central view images with user scribbles, segmentation results of SACS, RBGSS and our method respectively, and the EPI images of the three methods.


[21] Lytro, "Lytro redefines photography with light field cameras," http://www.lytro.com, 2011.
[22] Illum, "Lytro illum a new way to capture your stories," http://www.illum.lytro.com, 2014.
[23] P. Salembier and F. Marqués, "Region-based representations of image and video: segmentation tools for multimedia services," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1147–1169, 1999.
[24] W.-N. Lie, "An efficient threshold-evaluation algorithm for image segmentation based on spatial graylevel co-occurrences," Signal Processing, vol. 33, no. 1, pp. 121–126, 1993.
[25] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient graph-based image segmentation," International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, 2004.
[26] G. Dong and M. Xie, "Color clustering and learning for image segmentation based on neural networks," IEEE Transactions on Neural Networks, vol. 16, no. 4, p. 925, 2005.
[27] V. Kolmogorov and R. Zabin, "What energy functions can be minimized via graph cuts?" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.
[28] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.
[29] M. Levoy and P. Hanrahan, "Light field rendering," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1996, pp. 31–42.
[30] H. Ao, Y. Zhang, A. Jarabo, B. Masia, Y. Liu, D. Gutierrez, and Q. Dai, "Light field editing based on reparameterization," in Pacific Rim Conference on Multimedia, 2015.
[31] E. Garces, J. I. Echevarria, Z. Wen, H. Wu, and D. Gutierrez, "Intrinsic light field images," in Computer Graphics Forum, 2017.
[32] A. Jarabo, B. Masia, A. Bousseau, F. Pellacini, and D. Gutierrez, "How do people edit light fields?" ACM Trans. Graph., vol. 33, no. 4, pp. 146–1, 2014.
[33] M. Hog, N. Sabater, and C. Guillemot, "Dynamic super-rays for efficient light field video processing," in BMVC 2018 - 29th British Machine Vision Conference, Sep. 2018, pp. 1–12.
[34] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "Slic superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
[35] S. Wanner, S. Meister, and B. Goldluecke, "Datasets and benchmarks for densely sampled 4d light fields," in VMV, 2013, pp. 225–226.
[36] D. G. Dansereau, O. Pizarro, and S. B. Williams, "Decoding, calibration and rectification for lenselet-based plenoptic cameras," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1027–1034.
[37] A. S. Raj, M. Lowney, and R. Shah, "Light-field database creation and depth estimation," Dept. Computer Science, Stanford University, Technical Report, 2016.
[38] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, "Scene reconstruction from high spatio-angular resolution light fields," in ACM Transactions on Graphics (TOG), vol. 32, no. 4, 2013, p. 1.

Xianqiang Lv received the B.E. degree in the School of Computer Science, Northwestern

Dr. Xue Wang is now a research associate in the School of Computer Science, Northwestern Polytechnical University. She received the B.S. in 2007 and the Ph.D. in 2017 from Northwestern Polytechnical University. From 2012 to 2014, she studied at the University of Pennsylvania as a visiting joint Ph.D. student financed by the China Scholarship Council. Her research interests include computer vision, computational photography and machine learning. She currently focuses on building machines that understand the social signals and events that light field videos portray.

Prof. Qing Wang (M'05, SM'19) is now a Professor in the School of Computer Science, Northwestern Polytechnical University. He graduated from the Department of Mathematics, Peking University, in 1991, and then joined Northwestern Polytechnical University. In 2000 he received the PhD from the Department of Computer Science and Engineering, Northwestern Polytechnical University. In 2006, he was selected for the outstanding talent program of the new century by the Ministry of Education, China. He is now a Senior Member of IEEE and a Member of ACM. He is also a Senior Member of the China Computer Federation (CCF). He worked as a research scientist in the Department of Electronic and Information Engineering, the Hong Kong Polytechnic University, from 1999 to 2002. He also worked as a visiting scholar in the School of Information Technology, The University of Sydney, Australia, in 2003 and 2004. In 2009 and 2012, he visited the Human Computer Interaction Institute, Carnegie Mellon University, for six months and the Department of Computer Science, University of Delaware, for one month. Prof. Wang's research interests include computer vision and computational photography, such as 3D reconstruction, object detection, tracking and recognition, and light field imaging and processing. He has published more than 100 papers in international journals and conferences.

Prof. Jingyi Yu is Director of the Virtual Reality and Visual Computing Center in the School of Information Science and Technology at ShanghaiTech University. He received the B.S. from Caltech in 2000 and the Ph.D. from MIT in 2005. He is also a full professor at the University of Delaware. His research interests span a range of topics in computer vision and computer graphics, especially computational photography and non-conventional optics and camera designs. He has published over 100 papers at highly refereed conferences and journals, including over 50 papers at the premiere conferences and journals CVPR/ICCV/ECCV/TPAMI. He has been granted 10 US patents. His research has been generously supported by the Na-
Polytechnical University, in 2017. He is now a tional Science Foundation (NSF), the National Institute of Health (NIH),
postgraduate at School of Computer Science, the Army Research Office (ARO), and the Air Force Office of Scientific
Northwestern Polytechnical University. His re- Research (AFOSR). He is a recipient of the NSF CAREER Award,
search interests include computational photog- the AFOSR YIP Award, and the Outstanding Junior Faculty Award at
raphy, multiview light field computing and pro- the University of Delaware. He has previously served as general chair,
cessing. program chair, and area chair of many international conferences such
as ICCV, ICCP and NIPS. He is currently an Associate Editor of IEEE
TPAMI, Elsevier CVIU, Springer TVCJ and Springer MVA.
1077-2626 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.