This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2020.2982158, IEEE Transactions on Visualization and Computer Graphics
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2018 1
Abstract—Efficient and accurate segmentation of full 4D light fields is an important task in computer vision and computer graphics.
The massive volume and the redundancy of light fields make it an open challenge. In this paper, we propose a novel light field
hypergraph (LFHG) representation using the light field super-pixel (LFSP) for interactive light field segmentation. The LFSPs not only
maintain the light field spatio-angular consistency, but also greatly contribute to the hypergraph coarsening. These advantages make
LFSPs useful to improve segmentation performance. Based on the LFHG representation, we present an efficient light field
segmentation algorithm via graph-cut optimization. Experimental results on both synthetic and real scene data demonstrate that our
method outperforms state-of-the-art methods on the light field segmentation task with respect to both accuracy and efficiency.
Index Terms—Hypergraph representation, Light field segmentation, Light field super-pixel, Graph-cut optimization.
1 INTRODUCTION
1077-2626 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Exeter. Downloaded on May 05,2020 at 04:24:40 UTC from IEEE Xplore. Restrictions apply.
order relations. As a natural high-order relation descriptor, the hypergraph is particularly appropriate for representing complex relations within a light field.

Considering the massive input volume of a light field, if the vertices are used to denote pixels, the whole hypergraph would be very complex. A coarsening strategy is commonly adopted to generate a simplified graph by collapsing appropriate vertices. For a full parallax light field, its spatio-angular consistency is a unique property. Since LFSPs are capable of preserving consistency between different views [13], they can scale down the data volume greatly in both the spatial and angular domains. Angularly, a LFSP contains several 2D slices by fixing the angular dimension. Spatially, the 2D slices of a LFSP contain a number of corresponding pixels by fixing the spatial dimension.

The experiments are conducted on both synthetic and real light field data captured by Lytro [21] and Illum [22] cameras. Experimental results show that our approach is competitive with state-of-the-art techniques at a significantly lower computational complexity. The main contributions are summarized as follows:

(1) A LFHG representation is constructed to model the high-order relations among the rays passing through different views. By reformulating the correlation in the 4D ray space, instead of the traditional 2D pixel space, our LFHG representation proves to be efficient for light field segmentation and other low-level vision tasks.

(2) A novel light field segmentation algorithm using LFHG and LFSP is proposed. In view of the ray sampling nature of LFSP, we define the spatial and angular neighbors of LFSP and propose a LFHG coarsening strategy. Then the light field segmentation can be solved via a traditional graph-cut optimization.

2 RELATED WORK

As a fundamental problem in computer vision, image segmentation has been extensively studied in recent decades. Various solutions have been proposed to tackle the problem, such as region-based methods [23], threshold-based methods [24], graph-based methods [25], learning-based methods [26], etc. Among these, graph-cut optimization has been proven effective for the multi-label segmentation problem [19], [25], [27]; these methods usually employ a max-flow/min-cut optimization [28]. Although they perform well in certain conditions, due to the image uncertainties and spatial ambiguities in defocus and boundary areas, establishing a thorough segmentation solution is still an open challenge.

Light fields [29] record both the angular and spatial information of a scene through 4D data. To keep consistency between views, propagation methods have been applied to propagate sparse user edits to the full light field [14], [30], [31], [32]. To enable efficient calculation of these methods, whose complexity is linear in the input data size, different strategies are utilized. Ao et al. [30] exploit the epipolar geometry of light fields and propose a reparameterization method to change the epipolar geometry. Besides, downsampling and upsampling techniques are often used in propagation methods [14], [30], [31]. However, their performance is greatly influenced by the clustering operation in a 7-dimensional space including appearances and 4D light field coordinates, and will degrade for complex scenes.

In order to segment light fields accurately, some representative works have been proposed. Wanner et al. [15] introduce an effective disparity cue for 4D light field segmentation for the first time. They first use a set of input scribbles on the central view to train a random forest classifier based on disparity and appearance cues. Then, a global consistency term is optimized using all views simultaneously. However, the segmentation is not performed on the full 4D light field (only the central view image is segmented) and the implementation is time-consuming. Mihara et al. [16] define two neighboring ray types (spatial and angular) in light fields and design a learning-based likelihood function. They build a graph in 4D space and use a graph-cut algorithm to get an optimized segmentation. Due to the high complexity of this method, only partial view-points (5 × 5) are used in the experiments. In order to improve efficiency, Hog et al. [17] propose a novel graph representation for interactive light field segmentation using a Markov Random Field (MRF). By exploiting the redundancy in the ray space, the ray-based graph structure reduces the graph size and thus decreases the running time of the MRF-based optimization. However, the proposed free rays and ray bundles are greatly influenced by depth estimation. In addition, the requirement of depth maps for all views limits its applications in practical problems.

As an important light field preprocessing method, light field super-pixels/super-rays are proposed by [13], [18], [33]. Different from [18], which adopts a graph-cut optimization, we try to exploit the high-order relations between light rays and build a hypergraph representation with LFSPs [13], which enables a generalized light field processing pipeline. Also, a different neighboring system is defined. We adopt a distance-based neighborhood definition to take isolated LFSPs into account, which can provide more flexibility to adjust the range and weights of neighborhoods. Our algorithm segments the 4D light field as a whole to maintain consistency across views and meanwhile improves efficiency significantly.

3 LIGHT FIELD HYPERGRAPH

The theory of graph-cut optimization has been successfully applied to the image segmentation problem. Generally, a weighted graph is constructed by taking each pixel as a node and connecting each pair of pixels by an edge. The weight on an edge connecting two nodes is computed as a measurement of the similarity between these two nodes, usually using the brightness values of the pixels and their spatial locations. As for light field segmentation, since a light field can be formulated into a multi-view representation, to get a consistent segmentation on the full 4D light field, the relations between two scene points from different views are very helpful for finding their correspondence. However, the relations among multi-modal features from different aspects/views are too complex to be described using only binary correlations, such as in a graph. Therefore, it is necessary to find a new representation for high-order relations among multi-modal features.
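The classic weighted-graph construction described above (one node per pixel, edges weighted by brightness similarity) can be sketched in a few lines. This is a generic illustration, not the paper's code; the function name and the Gaussian similarity weight are our own choices.

```python
import math

def build_pixel_graph(image, sigma=10.0):
    """One node per pixel, edges between 4-neighbors,
    weight = exp(-(brightness difference)^2 / sigma^2)."""
    h, w = len(image), len(image[0])
    edges = {}
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):   # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    diff = image[y][x] - image[ny][nx]
                    edges[((y, x), (ny, nx))] = math.exp(-diff * diff / (sigma ** 2))
    return edges

img = [[0, 0, 100],
       [0, 0, 100]]
g = build_pixel_graph(img)
print(g[((0, 0), (0, 1))])            # 1.0: identical pixels, maximal similarity
print(round(g[((0, 1), (0, 2))], 6))  # 0.0: strong boundary, near-zero weight
```

A min-cut on such a graph tends to cut exactly these near-zero edges, which is why graph cuts align segment boundaries with image edges.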
Fig. 2. The pipeline of the proposed LFSP based light field segmentation. (a) The 3 × 3 representation of an input light field. (b) Data preprocessing and intermediate results, including user scribbles, disparity estimation, LFSP segmentation by [13] and initial segmentation results respectively. (c) The light field hypergraph structure based on LFSPs. (d) Optimized light field segmentation results.
…segmentation on the full 4D light field is achieved using an energy minimization method. The final global optimization step enhances per-image segmentation results by removing noise that is not supported by multiple views; thus the resulting segmentations are globally consistent throughout the light field. For better visualization, we use the original view image overlaid with multi-colored segments, together with EPI images for several ROIs, to illustrate the optimized segmentation results. The pipeline of the proposed algorithm is shown in Figure 2.

In this section, we first introduce the definition of our LFHG representation. Then, a LFHG coarsening method based on LFSPs is presented. Finally, we formulate the light field segmentation as an energy minimization problem.

Fig. 3. The structure of our LFHG representation. The set of vertices consists of labeled parts, non-labeled parts and labels. All the lines here are actually hyperedges, of which each one connects more than two vertices.

3.1 Hypergraph Representation of Light Field

In a light field $LF$, a light ray can be described by the two-parallel-plane (TPP) model [29]. Suppose the $uv$ plane is the view plane and the $xy$ plane is the image plane. A light ray $r$ emitted from a scene point can be uniquely determined by $r = LF(u, v, x, y)$, which corresponds to an indexed pixel of the raw image inside the camera. Another way of representing light fields is the multi-view representation, which divides the 4D light field into different views $(u, v)$ and corresponding images $(x, y)$. For a better understanding, in this paper we will discuss the light field hypergraph representation in a multi-view way.

A hypergraph [20] is a generalization of a graph in which an edge can join any number of vertices. Formally, a hypergraph $H = (V, E)$ is defined as a set of vertices $V$ and a set of non-empty subsets of $V$ called hyperedges $E$. Different from graph edges, which are pairs of nodes, hyperedges contain an arbitrary number of nodes. Therefore, as a generalization of the classic graph, the hypergraph is a natural higher-order relation descriptor.

Mathematically, we define light field segmentation as a hypergraph partitioning problem. As shown in Figure 3, given user scribbles, the vertices and hyperedges can be defined as

$$
\begin{aligned}
V &= \Big(\bigcup_i\bigcup_{u,v} p^{u,v}_{i,N},\ \bigcup_j\bigcup_{u,v} p^{u,v}_{j,L},\ \bigcup_k \{l_k\}\Big),\\
E &= \Big\{\bigcup_{i,k}\Big\{\bigcup_{u,v} p^{u,v}_{i,N},\, l_k\Big\},\ \bigcup_{j,k}\Big\{\bigcup_{u,v} p^{u,v}_{j,L},\, l_k\Big\},\\
&\qquad \bigcup_{m,n:\ \exists(u^*,v^*),\ p^{u^*,v^*}_{n}\in N(p^{u^*,v^*}_{m})}\Big\{\bigcup_{u,v} p^{u,v}_{m},\ \bigcup_{u,v} p^{u,v}_{n}\Big\}\Big\},
\end{aligned}\tag{1}
$$

where $p^{u,v}_{i,N}$ is the $i$-th non-labeled part in view $(u, v)$, and $p^{u,v}_{j,L}$ the $j$-th labeled part in view $(u, v)$. $\bigcup_{u,v} p^{u,v}_{i,N}$ and $\bigcup_{u,v} p^{u,v}_{j,L}$ denote the sets of corresponding parts across views in a light field. We use the tags L/N to distinguish between labeled
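The vertex/hyperedge structure of Eq. (1) can be sketched as a tiny data structure. This is a minimal illustration under our own naming (`Hypergraph`, `add_hyperedge`), not the authors' implementation; the toy parts and label are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Hypergraph:
    """Minimal hypergraph: a vertex set plus hyperedges, where a
    hyperedge is any non-empty subset of the vertex set."""
    vertices: set = field(default_factory=set)
    hyperedges: list = field(default_factory=list)

    def add_hyperedge(self, members):
        members = frozenset(members)
        assert members and members <= self.vertices, "must be a non-empty subset of V"
        self.hyperedges.append(members)

    def rank(self):
        # maximum cardinality over all hyperedges (2*Nuv for the LFHG)
        return max(len(e) for e in self.hyperedges)

# Toy LFHG-style construction: two parts observed in Nuv views, one label.
Nuv = 4                                        # angular sampling number (toy value)
part_m = {f"p_m@{v}" for v in range(Nuv)}      # one part, one slice per view
part_n = {f"p_n@{v}" for v in range(Nuv)}
hg = Hypergraph()
hg.vertices = part_m | part_n | {"l_1"}
hg.add_hyperedge(part_m | {"l_1"})             # t-link: cardinality Nuv + 1
hg.add_hyperedge(part_m | part_n)              # n-link: cardinality 2 * Nuv
print(hg.rank())                               # 8, i.e. 2 * Nuv
```

This also makes the cardinalities discussed in the text concrete: the t-link holds the $N_{uv}$ view slices of one part plus a label, and the n-link holds the slices of two parts.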
parts and non-labeled parts; $l_k$ is the $k$-th type of segment label. Note that the variable $p$ here could be a single light ray or a set of light rays emitted from a small continuous 3D surface with similar photometric properties, which corresponds to a single pixel or a super-pixel respectively.

The set $V$ is composed of three types of vertices: labeled parts, non-labeled parts, and segment labels. The set $E$ consists of two types of hyperedges. Similar to the definitions of n-links (neighborhood links) and t-links (terminal links) introduced in traditional graph-based segmentation methods [19], [28], hyperedges $\{\bigcup_{u,v} p^{u,v}_{i,N}, l_k\}$ and $\{\bigcup_{u,v} p^{u,v}_{j,L}, l_k\}$ are both t-links with the same cardinality $N_{uv}+1$, while $\{\bigcup_{u,v} p^{u,v}_{m}, \bigcup_{u,v} p^{u,v}_{n}\}$ is an n-link whose cardinality is $2\times N_{uv}$, where $N_{uv}$ is the angular sampling number of a light field. For any possible pair $(\bigcup_{u,v} p^{u,v}_{m}, \bigcup_{u,v} p^{u,v}_{n})$, if there exists a view $(u^*, v^*)$ in which $p^{u^*,v^*}_{n}$ is a neighbor of $p^{u^*,v^*}_{m}$, i.e. $p^{u^*,v^*}_{n} \in N(p^{u^*,v^*}_{m})$, there will exist a hyperedge $\{\bigcup_{u,v} p^{u,v}_{m}, \bigcup_{u,v} p^{u,v}_{n}\}$. The neighboring system is decided according to the specific definition of the part $p$. Thus the rank of our hypergraph is $2\times N_{uv}$, which is the maximum cardinality of any of the edges in the hypergraph.

The multi-view representation of a LFHG is shown in Figure 4. The red patch $p^{u,v}_{j,L}$ is one labeled part in view $(u, v)$. The blue, yellow and gray patches represent non-labeled parts in different views. Green dotted lines connecting yellow patches and label $l_k$ make up a hyperedge $\{\bigcup_{u,v} p^{u,v}_{i,N}, l_k\}$. Orange dotted lines connecting yellow patches and blue patches make up a hyperedge $\{\bigcup_{u,v} p^{u,v}_{i,N}, \bigcup_{u,v} p^{u,v}_{n}\}$. Neighboring relations between different patches are important for the light field segmentation problem.

…field is challenging. To overcome these difficulties, we build the LFHG structure based on LFSPs, which can essentially eliminate defocus and occlusion ambiguity for segmentation [13], and try to simplify the hypergraph partitioning problem via a coarsening method.

3.2.1 Definition of LFSP

Compared to a super-pixel, a small patch consisting of pixels with similar appearance, brightness, or texture in a 2D image, which may lose the geometric structure of the scene, the LFSP consists of a set of light rays emitted from a small continuous 3D surface with similar photometric properties and captured by a light field camera. The 3D distribution of light rays is influenced by the scene structure. Thus, conversely, the structure will be implicitly encoded in the LFSP. Mathematically, let $LF(u, v, x, y)$ be a light field and $R$ be a small continuous 3D surface with similar photometric characteristics; a LFSP $S_R(u, v, x, y)$ can be defined as follows [13],

$$S_R(u, v, x, y) = \bigcup_{i=1}^{|R|} LF(u_{P_i}, v_{P_i}, x_{P_i}, y_{P_i}),\tag{2}$$

where $LF(u_{P_i}, v_{P_i}, x_{P_i}, y_{P_i})$ is a bundle of rays emitted from a 3D point $P_i$ on the surface $R$, and $|\cdot|$ denotes the number of elements in the set.

Obviously, the 4D LFSP consists of angular dimensions $(u, v)$ and spatial dimensions $(x, y)$. For a specific view $(u^*, v^*)$, the patch $S_R^{u^*,v^*}$ is a 2D slice of $S_R$. Due to the Lambertian assumption, all the slices from different views should be similar. This leads to a refocus-invariant metric for LFSP segmentation, named LFSP self-similarity [13]. It is remarkable that the consistency in the angular dimension is maintained by LFSPs. By fixing $(u, x)$ to $(u^*, x^*)$, or $(v, y)$ to $(v^*, y^*)$, we can obtain horizontal or vertical EPIs of a light field. A LFSP is a combination of adjacent EPI lines with similar gradients and appearances.

3.2.2 Neighboring Relation of LFSPs

To build the LFHG structure based on LFSPs, the neighborhood system of LFSPs still needs to be discussed. Different from the fact that a pixel in a 2D image only has spatial neighbors, for a pixel $p_r$ corresponding to a light ray $r = LF(u, v, x, y)$ in a light field, there are two types of neighbors, spatial neighbors and angular neighbors, which…
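Eq. (2) says a LFSP is just the union of ray bundles from one surface, which naturally groups into per-view 2D slices. A toy sketch under our own naming (`LFSP`, `slice`, `centroid`); the synthetic rays model a fronto-parallel patch whose slices shift with the view, as a real disparity would cause.

```python
from collections import defaultdict

class LFSP:
    """Toy light field super-pixel: the set of rays (u, v, x, y)
    emitted from one scene surface, grouped by view."""
    def __init__(self, rays):
        self.slices = defaultdict(set)            # (u, v) -> {(x, y), ...}
        for (u, v, x, y) in rays:
            self.slices[(u, v)].add((x, y))

    def slice(self, u, v):
        """2D slice S_R^{u,v}: the LFSP restricted to one view."""
        return self.slices[(u, v)]

    def centroid(self, u, v):
        """Per-view center position P_xy, used by the neighboring test."""
        pts = self.slices[(u, v)]
        return (sum(x for x, _ in pts) / len(pts),
                sum(y for _, y in pts) / len(pts))

# A 3x3-pixel surface patch seen by a 2x2 view grid with unit disparity:
rays = [(u, v, x + u, y + v) for u in range(2) for v in range(2)
        for x in range(3) for y in range(3)]
sp = LFSP(rays)
print(len(sp.slice(0, 0)))   # 9 pixels in the (0, 0) slice
print(sp.centroid(1, 1))     # (2.0, 2.0): centroid shifted by the disparity
```

Under the Lambertian assumption the slices differ only by this disparity shift, which is what the LFSP self-similarity metric exploits.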
…it is not necessary for neighboring LFSPs to share common frontiers, as shown in Figure 5. The distance-based neighbor system is the key to correctly dealing with isolated LFSPs. Second, our neighbor system enables more flexible operations. For example, we can define a wider range of neighbors, or set different weights for neighboring LFSPs at different distances.

Let $S^{u,v}_m$ and $S^{u,v}_n$ be the 2D slices of LFSPs $S_m$ and $S_n$ in view $(u, v)$ respectively; their spatial relation can be defined as

$$N_{spa}(S^{u,v}_m, S^{u,v}_n) = \begin{cases}1, & \|P_{x,y}(S^{u,v}_m) - P_{x,y}(S^{u,v}_n)\| \le T\\ 0, & \text{otherwise},\end{cases}\tag{4}$$

where $P_{x,y}(S^{u,v}_m)$ and $P_{x,y}(S^{u,v}_n)$ denote the center positions of $S^{u,v}_m$ and $S^{u,v}_n$ respectively, $\|\cdot\|$ is the Euclidean distance, and $T$ is a predefined threshold. In our experiments, we set $T = \sqrt{3M}$ to generate a moderate number of spatial neighbors [13], [18], [34], where $M$ is the super-pixel size. In Section 5.2, we will discuss the influence of $M$ on segmentation accuracy and efficiency.

For the angular relationship between LFSPs, we define two types of angular neighbors: direct angular neighbors and indirect angular neighbors. Supposing $S^{u^*,v^*}_m$ and $S^{u^*,v^*}_n$ are spatial neighbors in view $(u^*, v^*)$ according to Eq. (4), if $S^{u,v}_m$ and $S^{u,v}_n$ are also spatial neighbors in view $(u, v)$, $S^{u,v}_m$ and $S^{u^*,v^*}_n$ are considered as direct angular…

…slices in view $(u, v)$ might be spatial neighbors. From experiments we observed that direct angular neighbors and indirect angular neighbors have different attributions. Our analysis in the energy function section for optimization will further reveal that we can analyze the types of angular neighbors to distinguish their impacts on segmentation.

Combining the spatial and angular relationships of LFSPs together, we can obtain the neighborhood set $\Omega$ for all LFSPs in a light field:

$$\Omega = \{(S_m, S_n)\ |\ \exists(u^*, v^*),\ N_{spa}(S^{u^*,v^*}_m, S^{u^*,v^*}_n) = 1\}.\tag{6}$$

The vertex in Eq. (1) denotes a single light ray. To simplify the LFHG representation, here we adopt a LFSP as a vertex, which is composed of a beam of light rays emitted from the scene. Therefore we can rewrite Eq. (1) as

$$
\begin{aligned}
V &= \Big(\bigcup_i\bigcup_{u,v} S^{u,v}_{i,N},\ \bigcup_j\bigcup_{u,v} S^{u,v}_{j,L},\ \bigcup_k \{l_k\}\Big),\\
E &= \Big\{\bigcup_{i,k}\Big\{\bigcup_{u,v} S^{u,v}_{i,N},\, l_k\Big\},\ \bigcup_{j,k}\Big\{\bigcup_{u,v} S^{u,v}_{j,L},\, l_k\Big\},\\
&\qquad \bigcup_{m,n:(S_m,S_n)\in\Omega}\Big\{\bigcup_{u,v} S^{u,v}_{m},\ \bigcup_{u,v} S^{u,v}_{n}\Big\}\Big\}.
\end{aligned}\tag{7}
$$
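The spatial-neighbor test of Eq. (4) and the neighborhood set $\Omega$ of Eq. (6) can be sketched directly. The function names, the toy centroids, and the choice $M = 12$ are ours; the threshold follows the paper's $T = \sqrt{3M}$.

```python
import math

def n_spa(center_m, center_n, T):
    """Eq. (4): 1 when the slice centroids are within distance T, else 0."""
    return 1 if math.dist(center_m, center_n) <= T else 0

M = 12                   # super-pixel size (toy value)
T = math.sqrt(3 * M)     # threshold used in the paper's experiments; here T = 6
print(n_spa((10.0, 10.0), (14.0, 12.0), T))  # 1: distance ~4.47 <= 6
print(n_spa((10.0, 10.0), (20.0, 10.0), T))  # 0: distance 10 > 6

def neighborhood(centers_by_view):
    """Eq. (6): (m, n) are in Omega if their slices are spatial
    neighbors in at least one view."""
    omega = set()
    for view, centers in centers_by_view.items():
        ids = sorted(centers)
        for i, m in enumerate(ids):
            for n in ids[i + 1:]:
                if n_spa(centers[m], centers[n], T):
                    omega.add((m, n))
    return omega

centers = {(0, 0): {"S1": (0, 0), "S2": (5, 0), "S3": (30, 0)}}
print(sorted(neighborhood(centers)))  # [('S1', 'S2')]
```

Because the test only compares centroid positions, an isolated LFSP with no shared frontier still acquires neighbors, which is exactly the flexibility the distance-based definition is meant to provide.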
…

$$\hat{E} = \Big\{\bigcup_{i,k}\{S_{i,N},\, l_k\},\ \bigcup_{j:\,S_{j,L}\in C_{l_k}}\{S_{j,L},\, l_k\},\ \bigcup_{m,n:(S_m,S_n)\in\Omega}\{S_m,\, S_n\}\Big\}.\tag{8}$$

Fig. 6. An example of the LFHG structure in a multi-view representation (left) and the LFHG coarsening strategy (right). Instead of original view images, enlarged detail views of several surrounding LFSPs are shown.

The following table gives the distances/costs of the hyperedges $\hat{E}$. $E_{data}$ and $E_{smooth}$ will be discussed in the next section. Note that for each n-link $\{S_m, S_n\}$, we need to figure out the distribution of all the neighboring relations between $S^{u,v}_m$ and $S^{u,v}_n$, which includes the number of spatial neighbors within the same view, the number of direct angular neighbors, and the number of indirect angular neighbors across different views. Since the binary neighboring relation established on the distance metric is symmetric, when the center view is chosen as the reference view, we only consider relations between $S^{u_0,v_0}_m$ and $S^{u,v}_n$. Note that the total number of the above-mentioned three types of neighbors should take an integer value in the range $[1, N_{uv}]$.

TABLE 1. Hyperedge distances/costs of the LFHG after coarsening.

The term $E_{data}$ calculates the individual penalty for assigning $l_k$ to $S_i$. The term $E_{smooth}$ measures the penalty for a discontinuity between adjacent LFSPs in both spatial and angular space, i.e. the boundary properties of segmentation $L$. The coefficient $\lambda_s$ controls the balance of the smoothing term to avoid over-smoothing. We set $\lambda_s = 10$ in most experiments. The $\alpha$-expansion algorithm [19] is used to solve Eq. (9). More details of the data term and smoothness term will be discussed in the following subsections separately.
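Eq. (9) itself appears earlier in the paper (outside this excerpt), but its form can be read off Algorithm 1: a sum of data costs plus $\lambda_s$ times the smoothness costs of label-inconsistent n-links. A toy sketch with invented cost tables, not the authors' code:

```python
def total_energy(labeling, data_cost, smooth_cost, neighbors, lam_s=10.0):
    """E(L) = sum_i E_data(S_i, L_i) + lam_s * sum_(m,n) E_smooth(S_m, S_n).
    data_cost[(i, l)] and smooth_cost[(m, n)] are assumed precomputed;
    the smooth term only fires when labels disagree (delta of Eq. (14))."""
    e_data = sum(data_cost[(i, labeling[i])] for i in labeling)
    e_smooth = sum(smooth_cost[(m, n)]
                   for (m, n) in neighbors if labeling[m] != labeling[n])
    return e_data + lam_s * e_smooth

data_cost = {("S1", 0): 0.25, ("S1", 1): 1.0,
             ("S2", 0): 0.75, ("S2", 1): 0.25}
smooth_cost = {("S1", "S2"): 0.5}
neighbors = [("S1", "S2")]
# A label-consistent assignment versus one that cuts the n-link:
print(total_energy({"S1": 0, "S2": 0}, data_cost, smooth_cost, neighbors))  # 1.0
print(total_energy({"S1": 0, "S2": 1}, data_cost, smooth_cost, neighbors))  # 5.5
```

With $\lambda_s = 10$, cutting the n-link costs far more than the data term saved, so the minimizer keeps neighboring LFSPs label-consistent unless the evidence is strong, which is the over-smoothing balance the coefficient controls.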
…are defined as follows,

$$
\begin{aligned}
E_C(S_i, S_{j,L}) &= \sum_{cha}\|C_{cha}(S_i) - C_{cha}(S_{j,L})\|,\\
E_P(S_i, S_{j,L}) &= \|P_{x,y}(S_i) - P_{x,y}(S_{j,L})\|,\\
E_D(S_i, S_{j,L}) &= \|D(S_i) - D(S_{j,L})\|,
\end{aligned}\tag{11}
$$

where $C_{cha}(S_i)$, $P_{x,y}(S_i)$ and $D(S_i)$ denote the color, position and disparity of LFSP $S_i$ respectively, which are averaged over the entire 2D patches,

$$
\begin{aligned}
C_{cha}(S_i) &= \frac{1}{N_{uv}|S^{u,v}_i|}\sum_{u,v}\sum_{p\in S^{u,v}_i} C_{cha}(p),\\
D(S_i) &= \frac{1}{|S^{u,v}_i|}\sum_{p\in S^{u,v}_i} D(p),\\
P_{x,y}(S_i) &= \frac{1}{N_{uv}|S^{u,v}_i|}\sum_{u,v}\sum_{p\in S^{u,v}_i} P_{x,y}(p),
\end{aligned}\tag{12}
$$

where $p$ is a pixel belonging to $S^{u,v}_i$, and $|S^{u,v}_i|$ denotes the total pixel number in slice $S^{u,v}_i$. Here we use the CIELab color space.

4.2 Smoothness term

Spatial and angular neighbors, which act on the smoothness term, have great influence on the final segmentation results. Specifically, spatial neighbors ensure the segmentation to be regular within the same view of a light field, while angular neighbors ensure segmentation consistency across views.

To set boundary penalties, the smoothness term for an n-link $\{S_m, S_n\}$ is defined as follows,

$$E_{smooth}(S_m, S_n) = \delta(L_{S_m}, L_{S_n})\,A(S_m, S_n)\,B(S_m, S_n),\tag{13}$$

where $\delta(\cdot)$ generates a penalty factor for two adjacent LFSPs in both spatial and angular space with different segment labels,

$$\delta(L_{S_m}, L_{S_n}) = \begin{cases}0, & \text{if } L_{S_m} = L_{S_n}\\ 1, & \text{otherwise}.\end{cases}\tag{14}$$

When the labels are inconsistent, a penalty is added. $B(S_m, S_n)$ is a similarity distance between $S_m$ and $S_n$, which is evaluated by

$$
\begin{aligned}
B(S_m, S_n) &= \exp\Big(-\frac{\sum_{cha}\|\Delta C_{cha}\|}{\sigma_C^2} - \frac{\|\Delta D\|}{\sigma_D^2}\Big),\\
\Delta C_{cha} &= C_{cha}(S_m) - C_{cha}(S_n),\quad \Delta D = D(S_m) - D(S_n),
\end{aligned}\tag{15}
$$

where $\sigma_C^2$ and $\sigma_D^2$ are the variances of color and disparity respectively. Since the neighboring relations ensure the spatial distance between two LFSPs is within a predefined threshold, the position cue is not included here. Further, we adopt $A(S_m, S_n)$ to measure the mutual influence of $S_n$ and $S_m$ in terms of neighboring relations. Concretely,

$$A(S_m, S_n) = N_{spa}(S^{u_0,v_0}_m, S^{u_0,v_0}_n) + \frac{1}{N_{uv}}\sum_{(u,v)\neq(u_0,v_0)} f(S^{u_0,v_0}_m, S^{u,v}_n),\tag{16}$$

where $f$ is the influence factor between slices $S^{u_0,v_0}_m$ and $S^{u,v}_n$. As mentioned above, for a pair of LFSPs, if their slices in view $(u_0, v_0)$ are spatial neighbors, their slices in other views are very likely to be spatial neighbors too. Considering that the smoothness term is affected by direct angular neighbors repeatedly across views, in order to maintain the balance between direct and indirect angular neighbors, we assign a smaller value to the penalty for direct angular neighbors,

$$f(S^{u_0,v_0}_m, S^{u,v}_n) = \begin{cases}0.5, & N_{ang}(S^{u_0,v_0}_m, S^{u,v}_n) = \text{direct}\\ 1, & N_{ang}(S^{u_0,v_0}_m, S^{u,v}_n) = \text{indirect}.\end{cases}\tag{17}$$

4.3 LFHG segmentation algorithm

After introducing the energy function, the complete process of our algorithm is shown in Algorithm 1. The 4D light field and user scribbles are the input of our algorithm. First, we calculate the disparity map for the central view image and obtain 4D LFSPs using the algorithm proposed in [13]. After the LFSPs are initialized, the data term for all LFSPs can be computed in advance. At the same time, user inputs on the center view are propagated to other views via LFSPs and several LFSPs are labeled as seeds. Next, the penalization for assigning a segment label to a LFSP $S_i$ is calculated by $E_{data}$; meanwhile the influences of neighboring $S_i$ and $S_n$ are calculated by $E_{smooth}$. The complete energy function $E(L)$ is obtained according to Eq. (9). Finally we apply a graph-cut algorithm to minimize $E(L)$ and output the optimized segmentation result $seg$.

Algorithm 1: Light field segmentation algorithm based on the LFHG representation.
  Input: the 4D light field $LF$ and input labels $Inlabel$
  Output: the 4D light field segmentation result $seg$
  $D$ = DepthEstimation($LF$)
  $LFSP$ = GetLFSPs($LF$)
  for all LFSPs $S_i$ do
    compute $C_{cha}(S_i)$, $D(S_i)$ and $P_{x,y}(S_i)$ according to Eq. (12)
  end
  for each LFSP $S_i$ in the central view $(u_0, v_0)$ do
    $E_{data}(S_i, l_k) = \min_{j:\,L_{S_{j,L}} = l_k}\big(E_C(S_i, S_{j,L}) + \lambda_p E_P(S_i, S_{j,L}) + \lambda_d E_D(S_i, S_{j,L})\big)$
    for all adjacent LFSPs $S_n$ do
      $E_{smooth}(S_i, S_n) = \delta(L_{S_i}, L_{S_n})\,A(S_i, S_n)\,B(S_i, S_n)$
    end
  end
  $E(L) = \sum E_{data}(S_i, l_k) + \lambda_s \sum E_{smooth}(S_i, S_n)$
  $seg = \arg\min_L E(L)$

5 EXPERIMENTS

In this section, we compare our segmentation algorithm with state-of-the-art light field segmentation methods, including GCMLA (globally consistent multi-label assignment) [15], SACS (spatial and angular consistence segmentation) [16] and RBGSS (ray-based graph structure segmentation) [17]. Synthetic and real light fields are used to validate the performance of our algorithm. For synthetic data, we use the HCI dataset proposed in [35], which contains 4 light fields with known depth, ground-truth labeling and user input scribbles. For real data, we use two popular light field cameras, Lytro and Illum, to capture light fields of real
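The smoothness term defined in Eqs. (13)-(17) above can be sketched end to end. This is an illustrative translation, not the authors' code; the example values (a 4-view toy with two direct and two indirect angular neighbors, identical appearance) are invented.

```python
import math

def delta(l_m, l_n):
    """Eq. (14): penalty factor, 1 only when the two labels disagree."""
    return 0 if l_m == l_n else 1

def B(color_m, color_n, disp_m, disp_n, sigma_c2, sigma_d2):
    """Eq. (15): appearance/disparity similarity between two LFSPs."""
    dc = sum(abs(cm - cn) for cm, cn in zip(color_m, color_n))
    dd = abs(disp_m - disp_n)
    return math.exp(-dc / sigma_c2 - dd / sigma_d2)

def A(n_spa_center, ang_types, Nuv):
    """Eqs. (16)-(17): spatial indicator in the reference view plus the
    averaged angular influence, direct neighbors down-weighted to 0.5."""
    f = {"direct": 0.5, "indirect": 1.0}
    return n_spa_center + sum(f[t] for t in ang_types) / Nuv

def e_smooth(l_m, l_n, **kw):
    """Eq. (13): E_smooth = delta * A * B."""
    return delta(l_m, l_n) \
        * A(kw["n_spa_center"], kw["ang_types"], kw["Nuv"]) \
        * B(kw["color_m"], kw["color_n"], kw["disp_m"], kw["disp_n"],
            kw["sigma_c2"], kw["sigma_d2"])

kw = dict(n_spa_center=1, ang_types=["direct", "direct", "indirect", "indirect"],
          Nuv=4, color_m=(50.0, 10.0, 10.0), color_n=(50.0, 10.0, 10.0),
          disp_m=1.2, disp_n=1.2, sigma_c2=100.0, sigma_d2=1.0)
print(e_smooth("fg", "bg", **kw))  # 1.75: A = 1 + 3/4, B = 1 for identical LFSPs
print(e_smooth("fg", "fg", **kw))  # 0.0 : no penalty when labels agree
```

Note how the two halves divide the work: $B$ lowers the penalty across genuine appearance or depth boundaries, while $A$ raises it for pairs that are neighbors in many views, enforcing cross-view consistency.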
TABLE 2. Light field segmentation accuracy comparisons using the percentage of correctly segmented pixels.
(Figure panels: results on the synthetic scenes Buddha, Stilllife, Papillon and Horses.)
Fig. 11. Segmentation results on real light field data captured by Illum.
Fig. 12. Segmentation results on higher spatio-angular resolution light field data [38].
[Figure panels: the Totoro, Hydrant and Plant scenes; columns: LFSP, user input, disparity, segmentation, EPI.]
Fig. 13. Segmentation results on real light field data captured by Lytro.
[Figure row labels: real, EPI of [16], [17], and ours; horizontal and vertical EPIs shown for each method.]
Fig. 14. Comparisons of real data segmentation results. From top to bottom: central view images with user scribbles; segmentation results of SACS, RBGSS and our method, respectively; and the EPI images of the three methods.
[21] Lytro, “Lytro redefines photography with light field cameras,” http://www.lytro.com, 2011.
[22] Illum, “Lytro Illum: a new way to capture your stories,” http://www.illum.lytro.com, 2014.
[23] P. Salembier and F. Marqués, “Region-based representations of image and video: segmentation tools for multimedia services,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1147–1169, 1999.
[24] W.-N. Lie, “An efficient threshold-evaluation algorithm for image segmentation based on spatial graylevel co-occurrences,” Signal Processing, vol. 33, no. 1, pp. 121–126, 1993.
[25] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, 2004.
[26] G. Dong and M. Xie, “Color clustering and learning for image segmentation based on neural networks,” IEEE Transactions on Neural Networks, vol. 16, no. 4, p. 925, 2005.
[27] V. Kolmogorov and R. Zabih, “What energy functions can be minimized via graph cuts?” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.
[28] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, 2004.
[29] M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1996, pp. 31–42.
[30] H. Ao, Y. Zhang, A. Jarabo, B. Masia, Y. Liu, D. Gutierrez, and Q. Dai, “Light field editing based on reparameterization,” in Pacific Rim Conference on Multimedia, 2015.
[31] E. Garces, J. I. Echevarria, Z. Wen, H. Wu, and D. Gutierrez, “Intrinsic light field images,” in Computer Graphics Forum, 2017.
[32] A. Jarabo, B. Masia, A. Bousseau, F. Pellacini, and D. Gutierrez, “How do people edit light fields?” ACM Transactions on Graphics, vol. 33, no. 4, 2014, Art. no. 146.
[33] M. Hog, N. Sabater, and C. Guillemot, “Dynamic super-rays for efficient light field video processing,” in BMVC 2018 - 29th British Machine Vision Conference, Sep. 2018, pp. 1–12.
[34] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
[35] S. Wanner, S. Meister, and B. Goldluecke, “Datasets and benchmarks for densely sampled 4D light fields,” in VMV, 2013, pp. 225–226.
[36] D. G. Dansereau, O. Pizarro, and S. B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1027–1034.
[37] A. S. Raj, M. Lowney, and R. Shah, “Light-field database creation and depth estimation,” Dept. Computer Science, Stanford University, Technical Report, 2016.
[38] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Transactions on Graphics (TOG), vol. 32, no. 4, 2013.

Xianqiang Lv received the B.E. degree from the School of Computer Science, Northwestern Polytechnical University, in 2017. He is now a postgraduate at the School of Computer Science, Northwestern Polytechnical University. His research interests include computational photography, multiview light field computing and processing.

Dr. Xue Wang is now a research associate in the School of Computer Science, Northwestern Polytechnical University. She received her B.S. in 2007 and Ph.D. in 2017 from Northwestern Polytechnical University. From 2012 to 2014, she studied at the University of Pennsylvania as a visiting joint Ph.D. student financed by the China Scholarship Council. Her research interests include computer vision, computational photography and machine learning. She currently focuses on building machines that understand the social signals and events that light field videos portray.

Prof. Qing Wang (M’05, SM’19) is now a Professor in the School of Computer Science, Northwestern Polytechnical University. He graduated from the Department of Mathematics, Peking University, in 1991, and then joined Northwestern Polytechnical University. In 2000 he received his Ph.D. from the Department of Computer Science and Engineering, Northwestern Polytechnical University. In 2006, he was awarded under the outstanding talent program of the new century by the Ministry of Education, China. He is a Senior Member of IEEE, a Member of ACM, and a Senior Member of the China Computer Federation (CCF). He worked as a research scientist in the Department of Electronic and Information Engineering, the Hong Kong Polytechnic University, from 1999 to 2002, and as a visiting scholar in the School of Information Technology, The University of Sydney, Australia, in 2003 and 2004. In 2009 and 2012, he visited the Human Computer Interaction Institute, Carnegie Mellon University, for six months and the Department of Computer Science, University of Delaware, for one month. Prof. Wang’s research interests include computer vision and computational photography, such as 3D reconstruction, object detection, tracking and recognition, and light field imaging and processing. He has published more than 100 papers in international journals and conferences.

Prof. Jingyi Yu is Director of the Virtual Reality and Visual Computing Center in the School of Information Science and Technology at ShanghaiTech University. He received his B.S. from Caltech in 2000 and his Ph.D. from MIT in 2005. He is also a full professor at the University of Delaware. His research interests span a range of topics in computer vision and computer graphics, especially computational photography and non-conventional optics and camera designs. He has published over 100 papers at highly refereed conferences and journals, including over 50 papers at the premier conferences and journals CVPR/ICCV/ECCV/TPAMI. He has been granted 10 US patents. His research has been generously supported by the National Science Foundation (NSF), the National Institutes of Health (NIH), the Army Research Office (ARO), and the Air Force Office of Scientific Research (AFOSR). He is a recipient of the NSF CAREER Award, the AFOSR YIP Award, and the Outstanding Junior Faculty Award at the University of Delaware. He has previously served as general chair, program chair, and area chair of many international conferences such as ICCV, ICCP and NIPS. He is currently an Associate Editor of IEEE TPAMI, Elsevier CVIU, Springer TVCJ and Springer MVA.