
A Graph Min-Cut Approach to Interactive Image Segmentation

Prabhanjan Ananth, Saikrishna Badrinarayanan, Aashith Kamath
Department of Computer Science,
University of California, Los Angeles, USA
Image segmentation is the problem of partitioning an image into
segments. Segmentation is said to be interactive if it is performed with
feedback from the user. In this report, we study one particular approach,
proposed by Boykov and Jolly [BJ01], to the interactive image segmentation
problem. Boykov and Jolly introduced the technique of graph cuts to
partition an image into two segments, which they call
object and background. In particular, the problem of
interactive segmentation is reduced to the problem of finding graph min-cuts,
which is then solved using the max-flow algorithm of Boykov and Kolmogorov [BK01].
We implement the Boykov-Jolly method and analyze its performance
on various test cases. An optimization that improves
the efficiency of the algorithm is proposed and incorporated
in our implementation.

by Boykov-Jolly. Once the weighted graph is constructed,
the min-cut of the graph is determined. The min-cut partitions the graph into two components, which translates to a
partition of the image. Boykov and Jolly also showed
that the min-cut yields a solution that minimizes the cost function among all the solutions that satisfy the hard constraints.
There are alternative methods to achieve image segmentation, such as snakes [KWT88, Coh91], deformable templates
[YHC92] and so on. According to [BJ01], these techniques
work only for two-dimensional images and when the segmentation
boundary is one-dimensional.

Our Work. In this work, we do the following.

1. We implement the approach of Boykov-Jolly [BJ01]
and analyze its performance on many test cases. An optimization to improve the efficiency of Boykov-Jolly [BJ01] is also proposed and incorporated in the implementation. We
also demonstrate test cases for which inaccuracies are

We give two approaches to extend the method to achieve
image segmentation where the number of partitions can be
more than two. The implementation of one of these approaches,
along with its results, is also provided.



Image segmentation is one of the important problems in computer vision. It has
various applications, for example, recognizing body parts in
medical images and detecting weapons in luggage at airports.
In this report, we study how to achieve image
segmentation when there is interaction from the user. This
is termed interactive image segmentation. In particular,
we study the approach of Boykov-Jolly [BJ01], who propose
a solution to the problem of interactive image segmentation
using concepts from graph min-cuts. They show how
to reduce the problem of interactive image segmentation to
the problem of finding graph min-cuts. We briefly recall
their approach below. More details are provided in Section 2.
The segmentation process begins with the user marking
some pixels as either object pixels or background pixels.
These are called the hard constraints. Then, the image is
modeled as a weighted graph. The weights in the graph are
determined by the initial markings specified by the
user and by a cost function (also referred to as soft constraints)
that is based on some a priori knowledge about
the images. We later describe the specific cost function used

The author list is in alphabetical order.

2. We extend the above approach beyond segmenting
images into two objects and show how to segment multiple objects. We give two approaches to achieve this:
one approach involves modifying the Boykov-Jolly
algorithm (more specifically, the representation of the
weighted graph is changed), and the second one involves
executing the Boykov-Jolly algorithm in a black-box manner. We develop the theory for the first approach, while
an implementation is provided only for the second approach. The results of this method are then described.

Organization. We first recall the Boykov-Jolly [BJ01] algorithm at a high level in Section 2. In the same section, we also describe the max-flow algorithm of Boykov-Kolmogorov [BK01]. In Section 3, we show how to extend
the Boykov-Jolly method to obtain segmentation of images
into multiple objects. Finally, in Section 4, we give details
about the implementation of the Boykov-Jolly algorithm and
test it on different images. An implementation of an alternative multi-object segmentation method is also provided in
the same section.



We describe the Boykov-Jolly method in the following steps.

Recall that in the beginning of the segmentation process, the
user marks some pixels to be either object pixels or background pixels. We denote the set of object pixels marked by
the user to be O and the set of background pixels to be B.

Cost Function (soft constraints). The pixels marked by
the user alone will not be sufficient for any algorithm to
achieve the desired segmentation of the image. Some
a priori knowledge about the image is necessary, and this is
captured in the form of a cost function. Before we describe
the cost function, we first give some notation. We use P to
denote the set of all the pixels and N to denote
the set of all unordered pairs {p, q} such that p and q are
neighbors$^1$ of each other in the image. For each pixel p, we
use $A_p$ to be a binary value indicating whether p is
object or background. That is, we assign $A_p$ to be obj
if p is object and bck otherwise. Two important functions
are defined below. Here, $R(A)$ denotes the region properties
term, while $B(A)$ denotes the boundary properties term.
Intuitively, $R_p(obj)$ (resp., $R_p(bck)$) denotes the penalty of
assigning the pixel p to be obj (resp., bck), and $B_{p,q}$ is interpreted as the penalty for a discontinuity between the neighboring pixels
p and q, that is, for assigning them different labels. We describe later how $R_p$ and
$B_{p,q}$ are determined.
$$R(A) = \sum_{p \in P} R_p(A_p)$$

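Edge      Weight                 For
{p, q}    $B_{p,q}$              $\{p, q\} \in N$
{p, S}    $\lambda R_p(bck)$     $p \in P$, $p \notin O \cup B$
          $K$                    $p \in O$
          $0$                    $p \in B$
{p, T}    $\lambda R_p(obj)$     $p \in P$, $p \notin O \cup B$
          $0$                    $p \in O$
          $K$                    $p \in B$

Table 1: Weights on the edges of the graph are assigned according
to the above table. The value K is set to be
$\sum_{\{p,q\} \in N} B_{p,q} + 1$.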

It was shown in Boykov-Jolly that the min-cut of G
satisfies the following properties.
1. For every non-terminal vertex$^3$ p, exactly one of the
edges connecting p to the terminal vertices is in the min-cut.
2. For every pixel p marked by the user, the edge (p, T)
is in the min-cut if p is marked to be an object pixel, and
(p, S) is in the min-cut if p is marked to be a background pixel.


$$B(A) = \sum_{\{p,q\} \in N} B_{p,q} \cdot \delta(A_p, A_q),$$

where $\delta(A_p, A_q) = 1$ if $A_p \neq A_q$ and 0 otherwise.

We now describe the cost function, denoted by $E(A)$, as a
function of $R(A)$ and $B(A)$:
$$E(A) = \lambda \cdot R(A) + B(A)$$
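As an illustration of how the cost function is evaluated for a given assignment, the following is a minimal Java sketch on a 4-connected grid. The class name, the array layout and the way the penalties are passed in (`regionCost`, `right`, `down`) are assumptions made for this example and are not taken from our implementation; `lambda` stands for the weight of the region term, which is set to 7 in our applications.

```java
/** Sketch: evaluates E(A) = lambda * R(A) + B(A) on a 4-connected pixel grid. */
final class SegmentationEnergy {
    static final int OBJ = 0;
    static final int BCK = 1;

    /**
     * labels[y][x]        : the assignment A_p (OBJ or BCK)
     * regionCost[y][x][l] : stands in for R_p(l) (hypothetical input)
     * right[y][x]         : stands in for B_{p,q} between (x, y) and (x + 1, y)
     * down[y][x]          : stands in for B_{p,q} between (x, y) and (x, y + 1)
     */
    static double energy(int[][] labels, double[][][] regionCost,
                         double[][] right, double[][] down, double lambda) {
        int height = labels.length;
        int width = labels[0].length;
        double region = 0.0;   // R(A)
        double boundary = 0.0; // B(A)
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                region += regionCost[y][x][labels[y][x]];
                // delta(A_p, A_q) = 1 only when the two neighbors get different labels
                if (x + 1 < width && labels[y][x] != labels[y][x + 1]) {
                    boundary += right[y][x];
                }
                if (y + 1 < height && labels[y][x] != labels[y + 1][x]) {
                    boundary += down[y][x];
                }
            }
        }
        return lambda * region + boundary;
    }

    public static void main(String[] args) {
        // A 1x2 image whose two pixels receive different labels.
        int[][] labels = {{OBJ, BCK}};
        double[][][] regionCost = {{{1.0, 5.0}, {4.0, 2.0}}};
        double[][] right = {{3.0, 0.0}};
        double[][] down = {{0.0, 0.0}};
        System.out.println(energy(labels, regionCost, right, down, 7.0));
    }
}
```

Each unordered neighbor pair is visited exactly once (through its left or upper pixel), which matches the sum over the set N.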

Now that we have the above two properties at hand, we can
show that the min-cut gives us an image segmentation: for
every pixel p, if (p, T) is in the cut then p is marked to be an
object pixel, and if (p, S) is in the cut then p is marked to
be a background pixel.
Further, it was shown that the solution yielded by computing
the min-cut is the optimal solution, i.e., the one that minimizes the cost function among all the solutions that satisfy
the hard constraints.

We set the coefficient $\lambda$ of the region term $R(A)$ to 7 for our applications.

Modeling the image as a weighted graph. In the next
step, we model the image as a weighted graph. Given the
image, we associate every pixel in the image with a vertex in
the graph. We draw an edge between two vertices if their
corresponding pixels in the image are neighboring. We further augment the graph by adding two terminal vertices,
denoted by S and T. An edge from S (resp., T) is added
to every pixel vertex in the graph. We now add weights to the
graph. The weights are determined by Table 1. We denote
the resulting graph by G.
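The construction can be sketched in Java as follows. This is only an illustration under stated assumptions: the class, the `Edge` type, the `BoundaryPenalty` interface and the arrays `rObj` and `rBck` (standing in for $R_p(obj)$ and $R_p(bck)$) are hypothetical names, pixels are indexed row by row, and the terminal weights for marked pixels follow the convention of [BJ01], where an object seed gets weight K on its edge to S and 0 on its edge to T.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: builds the weighted graph G (pixels plus terminals S and T) for a width x height image. */
final class SegmentationGraph {
    static final int UNMARKED = 0;
    static final int OBJECT_SEED = 1;
    static final int BACKGROUND_SEED = 2;

    /** Undirected weighted edge {u, v}. */
    static final class Edge {
        final int u;
        final int v;
        final double weight;
        Edge(int u, int v, double weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    /** Stands in for B_{p,q}; how it is obtained is outside this sketch. */
    interface BoundaryPenalty { double between(int p, int q); }

    static List<Edge> build(int width, int height, int[] seed, double[] rObj, double[] rBck,
                            BoundaryPenalty b, double lambda) {
        int n = width * height;
        int s = n;      // terminal S
        int t = n + 1;  // terminal T
        List<Edge> edges = new ArrayList<>();
        double sumB = 0.0;
        // Neighbor edges: right and down neighbors cover every unordered pair {p, q} once.
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int p = y * width + x;
                if (x + 1 < width) {
                    double w = b.between(p, p + 1);
                    edges.add(new Edge(p, p + 1, w));
                    sumB += w;
                }
                if (y + 1 < height) {
                    double w = b.between(p, p + width);
                    edges.add(new Edge(p, p + width, w));
                    sumB += w;
                }
            }
        }
        double k = sumB + 1.0; // the value K from the caption of Table 1
        // Terminal edges: hard constraints for seeds, region penalties otherwise.
        for (int p = 0; p < n; p++) {
            double toS;
            double toT;
            switch (seed[p]) {
                case OBJECT_SEED:     toS = k;   toT = 0.0; break;
                case BACKGROUND_SEED: toS = 0.0; toT = k;   break;
                default:              toS = lambda * rBck[p]; toT = lambda * rObj[p];
            }
            edges.add(new Edge(p, s, toS));
            edges.add(new Edge(p, t, toT));
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Edge> g = build(2, 1, new int[] {OBJECT_SEED, UNMARKED},
                new double[] {0.0, 1.0}, new double[] {0.0, 4.0}, (p, q) -> 2.0, 7.0);
        for (Edge e : g) {
            System.out.println(e.u + " - " + e.v + " : " + e.weight);
        }
    }
}
```

An edge list keeps the sketch short; for large images an adjacency-list representation would be the more natural input to a max-flow routine.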

Graph min-cut, i.e., finding optimal image segmentation. In the final step, we find the min-cut$^2$ of the graph G.

$^1$ Pixel p is neighboring to q if it appears above, left, right,
or below q.
$^2$ A graph cut is defined to be a set of edges that disconnects
the graph into exactly two components. A min-cut is a
graph cut whose total edge weight is minimum among all graph cuts.
Min-cut/Max-flow algorithm

We now focus on showing how to compute the min-cut of the
graph G described above. This is performed by executing
the max-flow algorithm proposed by Boykov and Kolmogorov
[BK01] on G, where S denotes the source and T denotes the
sink$^4$. The algorithm as described in [BK01] has three main
stages, namely, the growth stage, the augmentation stage and the
adoption stage. In addition, there is an initialization stage
that initializes the data structures. The max-flow algorithm
involves repeating these three stages in order until the maximum
flow is computed. We only give an overview of the
stages. A more detailed description can be found in [BK01].
Initialization stage: At every stage in the process, a set of
active vertices is maintained. This active set is initialized
to {S, T}. Further, we maintain two search trees throughout
the process, which we call the S-search tree and the T-search
tree. A vertex either belongs to the S-search tree or the T-search
tree, or it does not belong to any tree.

$^3$ Here, we interchangeably use the same notation for the
pixel and its corresponding vertex.
$^4$ Recall that S and T are the terminal vertices in the graph.
Growth stage: In this stage, the two search trees are expanded until they meet at a common edge. That is, the
neighbors of an active vertex, say u, are explored until we hit
an active vertex belonging to the other search tree. At this
point, we have found a path from S to T and so, we move
to the augmentation stage. If no such neighbor exists, then
u is set to be inactive and all its neighbors are set to be
active vertices that are then going to be used for further exploration.
Augmentation stage: Once we get a path from S to T
in the growth stage, denoted by P, we perform the augmentation stage. In this stage, we choose the minimum-weight
edge on the path P. Denote this weight by w. We then
reduce the weight of all the edges on this path by w. This
leaves at least one of the edges, say e = (a, b), on this path
saturated. We then mark b to be an orphan vertex, if b is
farther away from S than a on P. Further, we mark b to be
inactive if it was marked active in the growth stage. We
then move to the adoption stage.
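The core of the augmentation stage can be sketched in Java as below. The dense `residual` matrix and the representation of the path as an array of vertex indices are assumptions made for brevity. The sketch also credits the reverse direction of every edge, which is the usual residual-graph bookkeeping and is not spelled out in the description above; orphan handling is left out.

```java
/** Sketch: pushes the bottleneck weight w along a path from S to T. */
final class Augmentation {
    /**
     * residual[a][b] is the remaining weight of edge (a, b).
     * path lists the vertices of the path in order, starting at S and ending at T.
     * Returns the bottleneck w; at least one edge of the path ends up saturated.
     */
    static double augment(double[][] residual, int[] path) {
        double w = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < path.length; i++) {
            w = Math.min(w, residual[path[i]][path[i + 1]]);
        }
        for (int i = 0; i + 1 < path.length; i++) {
            int a = path[i];
            int b = path[i + 1];
            residual[a][b] -= w; // reduce the weight on the path
            residual[b][a] += w; // standard residual update (assumption, see lead-in)
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] residual = new double[3][3];
        residual[0][1] = 5.0;
        residual[1][2] = 3.0;
        System.out.println(augment(residual, new int[] {0, 1, 2}));
    }
}
```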
Adoption stage: As the name suggests, in this stage
the orphan vertex created in the augmentation stage is
adopted by a different vertex. More precisely, for the orphan
vertex b created earlier, we try to find a new parent in either of the
search trees. If a parent can be found, then we attach b to the
corresponding search tree and continue the execution
by going back to the growth stage. If no such parent can be
found, then b becomes a free vertex, in which case all of its
children are also marked as free vertices.
The algorithm terminates when the search trees of S and T are
separated only by saturated edges.
We also implement the classical Ford-Fulkerson algorithm
[FF62] and compare its performance with that of the Boykov-Kolmogorov algorithm. We observe that for most images,
the Ford-Fulkerson algorithm takes significantly more time
than the Boykov-Kolmogorov algorithm. For example, for
an image of a lion (Figure 6) of dimension 180x119, the
Boykov-Kolmogorov algorithm takes 19 seconds to execute
while the Ford-Fulkerson algorithm takes about 6 minutes. For an
image of dimension 183x46, the running time of Boykov-Kolmogorov was about 6 seconds while
Ford-Fulkerson took about 2 minutes. Based on our experiments,
we believe that the number of pixels (object or
background) the user marks has an impact on the running
time of the Boykov-Kolmogorov max-flow algorithm: the
more pixels are marked, the faster the algorithm runs.
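To make the link between max-flow and the resulting segmentation concrete, here is a small Java sketch. It is not the Boykov-Kolmogorov algorithm and not our implementation: as a simpler stand-in it uses the breadth-first variant of Ford-Fulkerson (Edmonds-Karp) on a dense capacity matrix, which is only practical for tiny graphs. After the flow is maximal, the vertices still reachable from S in the residual graph form the S side of a min-cut; these are the pixels whose edge to T is cut, i.e., the object pixels.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Sketch: min-cut via BFS augmenting paths (Edmonds-Karp), used here in place of [BK01]. */
final class MinCutSketch {
    /**
     * cap[u][v] is the weight of edge (u, v); for an undirected edge set both directions.
     * The matrix is overwritten with residual weights.
     * Returns side[v] == true iff v ends up on the S side of the min-cut.
     */
    static boolean[] sourceSide(double[][] cap, int s, int t) {
        int[] parent = new int[cap.length];
        while (bfs(cap, s, parent)[t]) {
            // Bottleneck of the path found by the search.
            double w = Double.POSITIVE_INFINITY;
            for (int v = t; v != s; v = parent[v]) {
                w = Math.min(w, cap[parent[v]][v]);
            }
            // Augment along the path.
            for (int v = t; v != s; v = parent[v]) {
                cap[parent[v]][v] -= w;
                cap[v][parent[v]] += w;
            }
        }
        return bfs(cap, s, parent);
    }

    /** Breadth-first search over unsaturated edges; fills parent[] and returns the visited set. */
    static boolean[] bfs(double[][] residual, int s, int[] parent) {
        int n = residual.length;
        boolean[] seen = new boolean[n];
        Arrays.fill(parent, -1);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(s);
        seen[s] = true;
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v = 0; v < n; v++) {
                if (!seen[v] && residual[u][v] > 1e-12) {
                    seen[v] = true;
                    parent[v] = u;
                    queue.add(v);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Two pixels (0 and 1), S = 2, T = 3; the weights are made up for illustration.
        double[][] cap = new double[4][4];
        cap[0][2] = cap[2][0] = 10.0;
        cap[0][3] = cap[3][0] = 1.0;
        cap[1][2] = cap[2][1] = 1.0;
        cap[1][3] = cap[3][1] = 10.0;
        cap[0][1] = cap[1][0] = 2.0;
        System.out.println(Arrays.toString(sourceSide(cap, 2, 3)));
    }
}
```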




We show how to extend the method presented in the previous section to multi-object image segmentation. That is, we
are interested in segmenting the image into, say, k segments
(call them $obj_1, \ldots, obj_k$) as against 2. To solve this problem
we essentially follow the same steps as before. The main
change comes in the step where we reduce our problem to the min-cut problem. In this case, we construct a
weighted graph with k terminals from the given image. We
then reduce our problem to a multiway-cut problem, which
can be seen as a generalization of the s-t min-cut problem.
The multiway-cut problem is NP-hard in general, but we
can consider approximation algorithms for it.
There are approximation algorithms with approximation factors less than 2 (for example, [DJP+94]). In this
work, we do not implement this approach and leave it
as an interesting future direction. Instead, we implement
a rather ad hoc method that internally uses the object-background image segmentation method of Boykov-Jolly in
a black-box way to achieve the result. We talk about this
later in Section 4.1. The details of the approach are given below.
At the beginning of the image segmentation process, the user
chooses some pixels to indicate which pixels correspond to
which segment. We use $O_i$ to denote the set of pixels the
user marks as belonging to $obj_i$.
Step 1: Determining the cost function: We use the
exact same cost function as described in the previous section.
Our cost function is
$$E(A) = \lambda \cdot R(A) + B(A),$$
where $\lambda$, $R(A)$ and $B(A)$ are as in Section 2. However,
unlike the previous case, here each $A_p$ takes one of k values, $obj_1, \ldots, obj_k$.
Step 2: Modeling the image as a weighted graph:
As before, we associate a vertex in the graph with every pixel
in the image. We draw an edge between two vertices if the
corresponding pixels are neighbors in the image. We then
add k special terminals $S_1, \ldots, S_k$ and draw edges
from each $S_i$ to all the pixel vertices in the graph. In the next stage,
we assign weights to the edges in the graph. This is done
according to Table 2.
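[Table 2: edge weights for the k-terminal graph. Recoverable entries: the edges are $\{p, q\}$ for $\{p, q\} \in N$ and $\{p, S_1\}, \ldots, \{p, S_k\}$; the terminal-edge weights are expressed through the region penalties $R_p(obj_i)$ for unmarked pixels ($p \in P$, $p \notin \bigcup_{i=1}^{k} O_i$), with separate cases for marked pixels $p \in O_1, \ldots, p \in O_k$.]

Table 2: Weights on the edges of the graph are assigned according
to the above table. The value K is set to be
$\sum_{\{p,q\} \in N} B_{p,q} + 1$.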

Step 3: Finding image segmentation via multiway
cut: In the last step, we show that a multiway cut of the
graph yields an image segmentation. A k-multiway cut of
a graph with k terminal nodes consists of a set of edges of
minimum weight that (pairwise) separates all the terminal
nodes. However, the multiway-cut problem is NP-hard
and hence we resort to an approximation algorithm.
More precisely, we can show that the multiway cut of the
graph derived from the image has the following properties.
These properties are the same as stated in Section 2.
1. For every non-terminal vertex p, exactly one of the
edges connecting p to the terminal vertices is in the
2. For every pixel p marked by the user, the edges $(p, S_j)$
for all $j \neq i$ are in the multiway cut if p is marked to be $obj_i$,
for every $i \in \{1, \ldots, k\}$.
Once we have both properties, we can show that a
multiway cut of the graph yields an image segmentation.
For every pixel p, let i be such that the edge $(p, S_i)$ is not
in the multiway cut. In this case, we assign p to be $obj_i$.
The first property ensures that there is a unique such i for
every pixel. The second property ensures that all the pixels
are assigned to some segment.
We can show, along the lines of Boykov-Jolly, that the multiway
cut even yields an optimal image segmentation. However, as
mentioned before, we cannot efficiently compute an exact multiway cut
since the problem is NP-hard. Instead, as a heuristic,
we can consider an approximation algorithm for multiway
cut; for example, we can consider the $2(1 - \frac{1}{k})$-approximation
algorithm given by [DJP+94].
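To get a feel for this guarantee, the factor $2(1 - \frac{1}{k})$ can be evaluated for small k. The values below are plain arithmetic on that expression. For k = 2 the factor is 1, which agrees with the fact that the two-terminal case is an ordinary min-cut and can be solved exactly; as k grows, the factor approaches 2.

```latex
\[
2\left(1 - \tfrac{1}{k}\right) =
\begin{cases}
1 & k = 2,\\[2pt]
\tfrac{4}{3} & k = 3,\\[2pt]
\tfrac{3}{2} & k = 4,
\end{cases}
\qquad
\lim_{k \to \infty} 2\left(1 - \tfrac{1}{k}\right) = 2 .
\]
```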
As remarked earlier, we implement an ad hoc method of
multi-object segmentation (see Section 4.1) that bypasses
the above approach but we believe that the above approach
will be more efficient.



Our implementation is performed on an Intel i7 2.2GHz
system with 8GB RAM. We use the programming language
Java and the Swing toolkit for the graphical user interface (GUI).
Further, we use the OpenCV package for image pre- and post-processing; more specifically, this involves converting the
image into a binary matrix and back (during the segmentation phase). The regional properties terms as well as the
boundary properties terms (defined in Section 2) are learnt
(using a negative log-likelihood function as given in [BJ01])
from the PASCAL dataset [url]. For our GUI, we provide a
window with buttons that let the user choose the object
pixels and the background pixels. We give a snapshot of the
window in Figure 1.
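For the regional term, [BJ01] uses negative log-likelihoods of the form $R_p(obj) = -\ln \Pr(I_p \mid O)$, where $I_p$ is the intensity of pixel p. A minimal Java sketch of this computation from an intensity histogram is given below. The class name, the binning and the add-one smoothing (used so that empty bins do not produce an infinite penalty) are assumptions made for this example; the sketch does not model how the histogram itself is obtained.

```java
/** Sketch: region penalty as a negative log-likelihood read off an intensity histogram. */
final class RegionPenalty {
    /**
     * histogram[b] counts how often intensity bin b was observed for one class
     * (object or background). Returns -ln Pr(bin | class) with add-one smoothing.
     */
    static double negLogLikelihood(int[] histogram, int intensityBin) {
        long total = 0;
        for (int count : histogram) {
            total += count;
        }
        double probability = (histogram[intensityBin] + 1.0) / (total + histogram.length);
        return -Math.log(probability);
    }

    public static void main(String[] args) {
        int[] objectHistogram = {3, 1}; // two bins, made-up counts
        System.out.println(negLogLikelihood(objectHistogram, 0));
        System.out.println(negLogLikelihood(objectHistogram, 1));
    }
}
```

An intensity that is frequent for a class receives a small penalty for that class, and a rare intensity a large one.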

Figure 1: This is a snapshot of the window of our interactive

image segmentation tool. We provide a button Select Image for selecting an image. After selecting the image, the
user can choose the object pixels and the background pixels. To facilitate this, we further provide the user with two
buttons, namely, Select Object and Select Background.
Finally, to segment the image, we have provided the Segment button.

obtained for each of these blocks to obtain the global solution

for the entire image.
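The division step of the divide-and-conquer optimization can be sketched in Java as follows. The class and method names are hypothetical, and the sketch only computes the block rectangles; segmenting each block and merging the per-block results are not shown. Since neighbor edges that cross block borders are not seen by any single block, the combined result may differ from the min-cut of the whole image.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: splits an image into blocks of at most blockSize x blockSize pixels. */
final class BlockSplitter {
    /** Returns one {x, y, width, height} rectangle per block, row by row. */
    static List<int[]> split(int imageWidth, int imageHeight, int blockSize) {
        List<int[]> blocks = new ArrayList<>();
        for (int y = 0; y < imageHeight; y += blockSize) {
            for (int x = 0; x < imageWidth; x += blockSize) {
                // Blocks at the right and bottom borders may be smaller.
                int w = Math.min(blockSize, imageWidth - x);
                int h = Math.min(blockSize, imageHeight - y);
                blocks.add(new int[] {x, y, w, h});
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The 400x400 example with 100x100 blocks.
        System.out.println(split(400, 400, 100).size());
    }
}
```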


Multi-object segmentation: Ad hoc method

We implement an extension of the above approach, where
we segment the images into two or more segments (instead
of just two). This is done in an iterative fashion: in
the ith iteration, the image segmentation segments the ith
object while treating everything else (even other objects)
as background. In the end, the results obtained in all the
iterations are combined together to obtain the multi-object segmentation.
For example, in Figure 2, we are interested in segmenting
the image into 3 segments, namely, banana, orange and the
background. The user first marks the object pixels for
the banana and the orange separately (we provide different buttons for different objects). Once this is done,
we process the image as if only the banana is the object,
with everything else (including the orange) as background,
and then mark the appropriate pixels in yellow. Following
this, we reprocess the image as if only the orange is
the object and mark the appropriate pixels in orange. The
result of this process is an image which identifies each object
in the image uniquely, as can be seen in Figure 3.
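The iterative scheme can be sketched in Java as below. The object-background segmentation is treated as a black box that, for iteration i, returns a binary mask over the pixels; the functional interface used for it is an assumption of this example. The sketch also assumes that, when two iterations claim the same pixel, the later one wins, which mirrors painting the colors one after another; the text above does not prescribe how such conflicts are resolved.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

/** Sketch: combines k object-versus-rest segmentations into one label map. */
final class OneVsRest {
    static final int BACKGROUND = 0;

    /**
     * binarySegmenter.apply(i) returns the mask of iteration i
     * (true = pixel assigned to object i, false = treated as background).
     * The result holds BACKGROUND or i + 1 for pixels assigned to object i.
     */
    static int[] combine(int pixelCount, int objectCount, IntFunction<boolean[]> binarySegmenter) {
        int[] labels = new int[pixelCount]; // initialized to BACKGROUND
        for (int i = 0; i < objectCount; i++) {
            boolean[] mask = binarySegmenter.apply(i);
            for (int p = 0; p < pixelCount; p++) {
                if (mask[p]) {
                    labels[p] = i + 1; // later iterations overwrite earlier ones (assumption)
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up masks for two objects over four pixels.
        boolean[][] masks = {
            {true, false, false, true},
            {false, true, false, true}
        };
        System.out.println(Arrays.toString(combine(4, 2, i -> masks[i])));
    }
}
```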

Optimizations. One of our optimizations is that we take
a divide-and-conquer approach on top of the Boykov-Jolly
method to speed up the computation. That is, we divide
the image graph into blocks of suitable size. For
example, if the image is of size 400x400, we divide it into 16
blocks of size 100x100. After performing this division, we
then run the image segmentation algorithm on each
of these blocks separately and then combine the solutions

Figure 2


Figure 3

Test Cases

We consider images of two categories: one with a plain
background and the other with a noisy background. A noisy
background is one which also has other objects in the background. This is done to see if the choice of the background
affects the accuracy of our implementation. Further, in
each of these categories we consider different kinds of images.
We obtain nearly accurate segmented images for images of
both these categories. We now briefly describe the images
we consider and the results we obtain.
1. We consider an image of a fish (Figure 4), with water in the
background. We get an accurate segmentation of the fish,
as reported in Figure 5. We then consider the image,
in Figure 6, of a lion with sky in the background. We
get nearly accurate results here with some noise near
the boundary region of the lion. Some background pixels which were supposed to be marked black are not.
We report the final image in Figure 7. Another image,
of a plane with blue sky in the background, is considered. The original image and the segmented image are
presented in Figure 8 and Figure 9 respectively.

Figure 4

Figure 6

Figure 8

Figure 10

Figure 11

Figure 12

Figure 13

Figure 14

Figure 15

Figure 16

Figure 17

Figure 5

Figure 7

Figure 9

2. We consider the image, in Figure 10, of a butterfly
with flowers, along with a light background of trees and
grass. Here, we can almost accurately segment the butterfly, with the boundary of the butterfly being noisy.
The final image is given in Figure 11. There were some
inaccuracies when an image, in Figure 12, of an ice
structure with ice and sunlight in the background was
considered. Here, our implementation could not correctly identify the boundary between the ice structure
and the powdered ice. We report the final image in Figure 13. We consider another image, in Figure 14, of
a tiger running, with the shadow of the tiger and the ice
in the background. The segmented image is reported
in Figure 15. Another challenging image was that of a
flower with leaves in the background, as shown in Figure 16. We get an almost accurate result in Figure 17.

We also obtain similar results for our multi-object segmentation
implementation, which was described in Section 4.1. This is
not surprising since the multi-object segmentation approach
internally runs our object-background image segmentation method.
We consider an image of a bird holding a fish while sitting
on a stick, as in Figure 18. Our objective is to identify the
bird, the fish and the stick separately. We report the result in
Figure 19.

Figure 18

Figure 19

data structures in our implementation, for example, using
an adjacency list instead of an adjacency matrix to represent our
graph, we believe that the running time can be reduced.
One other place where we can improve is in smoothing out
the boundary region of the images. Currently, the boundary
portion is noisy, with a few pixels in the background wrongly
identified as object pixels.
Figure 20

Figure 21

Measuring accuracy. We define a measure to determine
the accuracy of our segmented images. We calculate the accuracy by
first defining either the object or the background with the
help of a color intensity range in advance. This measurement
is not completely accurate since it can sort some pixels
within the object incorrectly; however, the type of images
selected makes this partitioning possible with high accuracy.
We first mark the image's list of pixels as background or
object based on the color shade range, and this is done individually for every image. For example, in Figure 6, all
shades of blue are marked as background in the lion image
and the rest of the colors are marked as object. This allows
us to roughly estimate the accuracy with which each pixel
is sorted into object and background by the graph min-cut
approach.
We report the accuracy of our results in Figure 22.
Fruits (Figure 2)
Fish (Figure 4)
Lion (Figure 6)
Plane (Figure 8)
Butterfly (Figure 10)
Tiger (Figure 14)
Bird with fish (Figure 18)
Parrots (Figure 20)



Figure 22: Accuracy results
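The accuracy measure can be sketched in Java as follows. The reference labels come from a per-image color range, as described above; the concrete rule used here (a pixel is background when each of its RGB channels lies inside a given interval) and the definition of accuracy as the fraction of pixels on which the segmentation and the reference agree are assumptions made for this example. Pixels are assumed to be packed as 0xRRGGBB integers, the layout returned by `BufferedImage.getRGB` once the alpha byte is masked off.

```java
/** Sketch: pixel-wise agreement between a segmentation and color-range reference labels. */
final class AccuracyMeasure {
    /** True if every RGB channel of the packed pixel lies in [low[c], high[c]]. */
    static boolean inColorRange(int rgb, int[] low, int[] high) {
        int[] channel = {(rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF};
        for (int c = 0; c < 3; c++) {
            if (channel[c] < low[c] || channel[c] > high[c]) {
                return false;
            }
        }
        return true;
    }

    /** Reference mask: pixels inside the background color range are background, the rest object. */
    static boolean[] referenceObjectMask(int[] rgbPixels, int[] bgLow, int[] bgHigh) {
        boolean[] isObject = new boolean[rgbPixels.length];
        for (int p = 0; p < rgbPixels.length; p++) {
            isObject[p] = !inColorRange(rgbPixels[p], bgLow, bgHigh);
        }
        return isObject;
    }

    /** Fraction of pixels whose segmented label equals the reference label. */
    static double accuracy(boolean[] segmentedObject, boolean[] referenceObject) {
        int agree = 0;
        for (int p = 0; p < segmentedObject.length; p++) {
            if (segmentedObject[p] == referenceObject[p]) {
                agree++;
            }
        }
        return (double) agree / segmentedObject.length;
    }

    public static void main(String[] args) {
        int[] pixels = {0x0000FF, 0xFF0000}; // one blue and one red pixel
        boolean[] reference = referenceObjectMask(pixels, new int[] {0, 0, 128}, new int[] {100, 100, 255});
        System.out.println(accuracy(new boolean[] {false, true}, reference));
    }
}
```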



In this work, we deal with the problem of interactive image
segmentation. In particular, we consider the graph min-cut
based approach proposed by Boykov-Jolly [BJ01]. We implement this method and further test the implementation on
different images and get nearly accurate results. We also describe an optimization to reduce the running time of our implementation. A comparison of two different min-cut algorithms (Boykov-Kolmogorov [BK01] and Ford-Fulkerson [FF62])
and their effect on the efficiency of our interactive image
segmentation tool is also performed.
The approach of Boykov-Jolly [BJ01] only deals with segmenting images into object and background. We give an
extension of this approach where we show how to segment
images into multiple objects as against just two. We also
demonstrate the corresponding results.
There are interesting future directions to this work. The
current implementation takes a few hours to segment images
of large dimension (in terms of pixels). By a careful choice of




[BJ01] Yuri Y. Boykov and M.-P. Jolly. Interactive graph
cuts for optimal boundary & region
segmentation of objects in N-D images. In
Computer Vision, 2001. ICCV 2001.
Proceedings. Eighth IEEE International
Conference on, volume 1, pages 105-112. IEEE, 2001.
[BK01] Yuri Boykov and Vladimir Kolmogorov. An
experimental comparison of min-cut/max-flow
algorithms for energy minimization in vision. In
Energy Minimization Methods in Computer
Vision and Pattern Recognition, pages 359-374.
Springer, 2001.
[Coh91] Laurent D. Cohen. On active contour models and
balloons. CVGIP: Image Understanding,
53(2):211-218, 1991.
[DJP+94] Elias Dahlhaus, David S. Johnson, Christos H.
Papadimitriou, Paul D. Seymour, and Mihalis
Yannakakis. The complexity of multiterminal
cuts. SIAM Journal on Computing,
23(4):864-894, 1994.
[FF62] L. R. Ford and Delbert Ray Fulkerson. Flows in
Networks. Princeton University Press, 1962.
[KWT88] Michael Kass, Andrew Witkin, and Demetri
Terzopoulos. Snakes: Active contour models.
International Journal of Computer Vision,
1(4):321-331, 1988.
[YHC92] Alan L. Yuille, Peter W. Hallinan, and David S.
Cohen. Feature extraction from faces using
deformable templates. International Journal of
Computer Vision, 8(2):99-111, 1992.