BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted by
K. SATYA (18JN5A0430)
T. SUSHMI (17JN1A0452)
M. REVATHI (17JN1A0460)
A. AKHILA (17JN1A0457)
A. GEETHA SUREKHA (17JN1A0467)
Under the esteemed guidance of
Smt. S. ANITHA, M.Tech
Assistant professor, Dept. of ECE
CERTIFICATE
This is to certify that the thesis entitled “IMAGE SEGMENTATION USING
REGION-BASED OBJECT DETECTOR”, submitted by K. SATYA,
T. SUSHMI, M. REVATHI, A. AKHILA and A. GEETHA SUREKHA in
partial fulfilment of the requirements for the award of BACHELOR OF
TECHNOLOGY in ELECTRONICS AND COMMUNICATION
ENGINEERING from KAKINADA INSTITUTE OF ENGINEERING AND
TECHNOLOGY for WOMEN, affiliated to JNTU-KAKINADA, is a record of
bonafide work carried out by them under guidance and supervision. The results
embodied in this thesis have not been submitted to any other university or institute for
the award of a degree.
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
We are thankful to both the Teaching and Non-Teaching staff members of the ECE department
for their kind cooperation and all sorts of help in bringing out this project work successfully.
K. SATYA (18JN5A0430)
T. SUSHMI (17JN1A0452)
M. REVATHI (17JN1A0460)
A. AKHILA (17JN1A0457)
A. GEETHA SUREKHA (17JN1A0467)
DECLARATION
This work has not been previously submitted to any other institution or University
for the award of any other degree or diploma.
K. SATYA (18JN5A0430)
T. SUSHMI (17JN1A0452)
M. REVATHI (17JN1A0460)
A. AKHILA (17JN1A0457)
A. GEETHA SUREKHA (17JN1A0467)
ABSTRACT
Semantic image segmentation, which has become one of the key applications in the image
processing and computer vision domain, has been used in fields such as medicine
and intelligent transportation. However, current state-of-the-art models use a separate
representation for each task, making joint inference clumsy and leaving the classification of
many parts of the scene ambiguous. In this paper, we explore a simple semantic segmentation
approach using a region-based object detector, which only needs bounding box annotations. The
main idea is to use an object detector to classify region proposals and then apply saliency
TABLE OF CONTENTS
CONTENTS PAGE NO
CHAPTER 1: INTRODUCTION
1.1 Introduction 1
1.2 Convolution Networks 2
CHAPTER 2: LITERATURE SURVEY 4
CHAPTER 3: IMPLEMENTATION METHODS
3.1 Introduction 9
3.2 Segmentation approaches 10
3.2.1 Region based Semantic Segmentation 10
3.2.2 R-CNN (Regions with CNN feature) 11
3.2.3 Fully Convolutional Neural Network based
Semantic Segmentation 12
3.2.4 Weakly Supervised Semantic Segmentation 13
CHAPTER 4: INTRODUCTION TO IMAGE PROCESSING
& MATLAB
4.1 Introduction to Image processing 15
4.1.1 Region Proposal generation 16
4.1.1 (a) Contour Map Generation 17
4.1.1 (b) Convolutional Encoder – Decoder Network 20
4.1.2 Object detection 22
4.1.3 Object Segmentation 27
4.2 MATLAB 37
4.2.1 Introduction 37
4.2.2 MATLAB’s power of Computational Mathematics 37
4.2.3 Features of MATLAB 38
4.2.4 Uses of MATLAB 39
4.2.5 Environment Set Up 39
4.2.6 Understanding The MATLAB Environment 41
LIST OF FIGURES
(a) Absolute Direction
(b) Relative Direction
Decoder Network 21
Object
13 2.10 ROI Pool 27
Kind Of Lock
27 4.1 Mask-1 46
28 4.2 Mask-2 46
29 4.3 Mask-3 47
30 4.4 Mask-4 47
31 4.5 Contour Map 48
Image Segmentation using Region-based Object Detector
CHAPTER – 1
INTRODUCTION
1.1 INTRODUCTION
clustering parts of an image together that belong to the same object class (Thoma, 2016). Two other
main image tasks are image-level classification and detection. Classification assigns each
image a single category. Detection refers to object localization and recognition. Image
segmentation can be treated as pixel-level prediction because it classifies each pixel into its
category.
Moreover, there is a task named instance segmentation which joins detection and
segmentation together. Object detection is one of the great challenges of computer vision, having
received continuous attention since the birth of the field. The most common modern approaches
scan the image for candidate objects and score each one. This is typified by the sliding-window
object detection approach, but is also true of most other detection schemes (such as centroid-
The most successful approaches combine cues from inside the object boundary (local
features) with cues from outside the object (contextual cues). Recent works are adopting a more
holistic approach by combining the output of multiple vision tasks and are reminiscent of some
of the earliest work in computer vision. However, these recent works use a different
mappings.
KIETW-ECE Page 1
Another difficulty with these approaches is that the subtask representations can be
inconsistent. For example, a bounding-box based object detector includes many pixels within
each candidate detection window that are not part of the object itself. Furthermore, multiple
overlapping candidate detections contain many pixels in common. How these pixels should be
treated is ambiguous in such approaches. A model that uniquely identifies each pixel is not only
more elegant, but is also more likely to produce reliable results since it encodes a bias of the true
world (i.e., a visible pixel belongs to only one object).
Semantic segmentation is a very important
topic in computer vision due to its crucial contribution to image understanding. The task is to
assign every single pixel a specific category label, such as person or car, which can be
considered a dense pixel classification problem. It predicts the label, location and shape of
each object, and is therefore also called object parsing in some references. It has a broad range of
potential applications, such as automatic driving and robot sensing. Recently, great
progress has been made in the area of semantic image segmentation due to the rise of deep
learning. Specifically, it mainly uses deep convolutional neural networks (CNNs) to extract rich
CNNs are very effective for image classification problems; encouraged by this, scholars
started to apply CNNs to dense prediction problems. In 2015, Long et al. first proposed an end-to-end
fully convolutional network (FCN) for semantic segmentation. However, the obtained label
map is very coarse, as can be seen in the figure, because multiple stages of convolution and
pooling strides reduce the final
prediction typically by a factor of 32 in each dimension; such a low-resolution result loses much of
sampling operation to increase the resolution of prediction maps. Chen et al. proposed DeepLab,
which employs atrous (or dilated) convolutions to account for larger receptive fields without
Recently, the authors made further improvements and proposed DeepLab v3, which achieves
state-of-the-art performance and is thus widely applied. Zheng et al. proposed a new type of CNN by
combining the strengths of CNNs and Conditional Random Fields (CRFs) to improve accuracy.
Because the fully-connected CRF is time consuming, Chen et al. replaced it with bilateral filtering
Recently, more powerful approaches have been proposed. With the development of the Internet of
Things, more and more image data are collected by various image sensors or video sensors.
Before using image data for more complex computer vision tasks, we need to know what objects
are in the image and where they are located. Therefore, object detection has always been a hot
research direction in the field of computer vision, and its purpose is to locate and classify objects
in images or videos. Object detection has been widely used in many fields, including intelligent
traffic and human pose estimation. Traditional algorithms solve the detection problem for
images by finding the foreground and background in the picture and then manually extracting
foreground features for classification. The algorithms for extracting the foreground can be divided
into static and dynamic according to the state of the object. The static object detection algorithm
for images usually uses the background subtraction algorithm. The foreground is the part where
CHAPTER 2
LITERATURE SURVEY
Lin et al. present a novel multi-path refinement network called RefineNet that uses all
Bertasius et al. introduced a simple, yet efficient Convolutional Random Walk
Network to address the issue of poor boundary localization. Although many effective methods
have been explored, it is still very challenging to obtain high-resolution segmentation results
Dai et al. as an extra supervision for training convolutional networks to segment semantic
regions. As we know, bounding box annotations can be obtained more easily than masks;
although they are less precise, their quantity may help improve segmentation performance.
Similarly, Khoreva et al. proposed to recursively train a convnet such that outputs are
improved after each iteration by using bounding box annotations only. Another interesting work
is the scribble-supervised segmentation presented by Lin et al. Scribbles are very widely used in
interactive image segmentation and are more user-friendly than bounding boxes. Bearman et al.
took a step towards stronger supervision for semantic segmentation by pointing. Other forms of
weak supervision have been explored as well, such as eye tracks and
noisy web tags. All these approaches require much less annotation effort during training, but
Our method inherits features from the sliding-window object detector works, such as
Torralba et al. and Dalal and Triggs, and the multi-class image segmentation work of Shotton et
al. We further incorporate into our model many novel ideas for improving object detection via
scene context. The innovative works that inspire ours include predicting camera viewpoint for
estimating the real world size of object candidates, relating “things” (objects) to nearby “stuff”
(regions), co-occurrence of object classes, and general scene “gist”. Recent works go beyond
simple appearance-based context and show that holistic scene understanding (both geometric
and more general) can significantly improve performance by combining related tasks. These
works use the output of one task (e.g., object detection) to provide features for other related tasks
(e.g., depth perception). While they are appealing in their simplicity, current models are not
tightly coupled and may result in incoherent outputs (e.g., the pixels in a bounding box identified
as “car” by the object detector may be labeled as “sky” by an image segmentation task). In our
method, all tasks use the same region-based representation, which forces consistency between
variables. Intuitively, this leads to more robust predictions. The decomposition of a scene into
regions to provide the basis for vision tasks exists in some scene parsing works.
Notably, Tu et al. describe an approach for identifying regions in the scene. Their
approach has only been shown to be effective on text and faces, leaving much of the image
unexplained. Sudderth et al. relate scenes, objects and parts in a single hierarchical framework,
but do not provide an exact segmentation of the image. Gould et al. provide a complete
description of the scene using dynamically evolving decompositions that explain every pixel
(both semantically and geometrically). However, the method cannot distinguish
between foreground objects and often leaves them segmented into multiple dissimilar pieces.
Our work builds on this approach with the aim of classifying objects.
Other works attempt to integrate tasks such as object detection and multi-class image
segmentation into a single CRF model. However, these models either use a different
representation for object and non-object regions or rely on a pixel-level representation. The
former does not enforce label consistency between object bounding boxes and the underlying
pixels while the latter does not distinguish between adjacent objects of the same class. Recent
work by Gu et al. also uses regions for object detection instead of the traditional sliding-window
approach. However, unlike our method, they use a single over-segmentation of the image and
make the strong assumption that each segment represents a (probabilistically) recognizable
object part. Our method, on the other hand, assembles objects (and background regions) using
portions of the image thereby reducing the number of component regions that need to be
considered for each object. Liu et al. use a non-parametric approach to image labeling by
warping a given image onto a large set of labeled images and then combining the results. This is
a very effective approach since it scales easily to a large number of classes. However, the
method does not attempt to understand the scene semantics. In particular, their method is unable
to break the scene into separate objects (e.g., a row of cars will be parsed as a single region) and
cannot capture combinations of classes not present in the training set. As a result, the approach
In recent years, many algorithms have been proposed to address the problem of object
detection. The object detection algorithms based on deep learning can be divided into two-stage
detection algorithms and one-stage detection algorithms. A two-stage algorithm first
generates region proposals, and then predicts the bounding box and category of each
region proposal. Girshick et al. proposed the classic regions with convolutional neural network
(CNN) features (R-CNN) method, which achieves excellent object detection accuracy by using a deep ConvNet
to classify object proposals, but it is very time-consuming. To solve this problem, Ren et al.
proposed an upgraded version of R-CNN, Faster R-CNN, which innovatively used the region
proposal network (RPN) to generate region proposals directly within the convolutional neural
network, and achieved the end-to-end goal of the whole detection framework. He et al. proposed
Mask R-CNN on the basis of Faster R-CNN, which added a branch for the segmentation
task, and used the detection and segmentation tasks jointly to extract image features and improve the
accuracy of detection. He et al. proposed spatial pyramid pooling networks (SPPNet) to generate
fixed-length representations. Kong et al. proposed HyperNet, which combines the generation of
candidate regions with the detection task to produce fewer candidate regions while ensuring a
higher recall rate. Cai and Vasconcelos proposed Cascade R-CNN to address the problems of
overfitting and quality mismatch. The one-stage detection algorithms do not need to select region
proposals, but use regression to directly calculate the bounding box and object category,
which further reduces the running time. Redmon et al. proposed the you only look once (YOLO)
algorithm to meet the requirements of real-time detection, but the detection accuracy of small
Liu et al. proposed the single shot multibox detector (SSD) algorithm to predict the
object from multiple feature maps, which improved the detection of small objects.
Lin et al. proposed RetinaNet mainly to solve the extreme imbalance in one-stage
algorithms between positive and negative samples and between difficult and easy samples. Zhang et al. proposed
the RefineDet method, which absorbed the advantages of the two-stage algorithm, so that the
one-stage detection algorithm can also have the accuracy of the two-stage algorithm. Liu et al.
proposed RFBNet, which uses dilated (atrous) convolution to enlarge the receptive field. Shen et al. proposed the
deeply supervised object detector (DSOD) to train neural networks for detection tasks from scratch,
and also introduced the idea of DenseNet, which greatly reduced the number of parameters. Law
and Deng proposed CornerNet to detect an object bounding box as a pair of keypoints using a
CenterNet to detect each object as a triplet of keypoints. Tian et al. proposed the fully convolutional
one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion.
CHAPTER 3
IMPLEMENTATION METHODS
3.1 INTRODUCTION
Image segmentation is useful in many applications. It can identify the regions of interest
in a scene or annotate the data. We categorize the existing segmentation algorithms into region-
includes the seeded and unseeded region growing algorithms, the JSEG, and the fast scanning
algorithm. All of them expand each region pixel by pixel based on pixel value or quantized
value so that each cluster has high positional relation. Data clustering, by contrast, is
based on the whole image and considers the distance between data points. A characteristic of
data clustering is that the pixels of a cluster are not necessarily connected.
The basic methods of data clustering can be divided into hierarchical and partitional clustering.
Furthermore, we show an extension of data clustering called the mean shift algorithm, although this
algorithm belongs more to density estimation. The last class of segmentation is edge-based
segmentation. This type of segmentation generally applies edge detection or the
concept of edges. The typical one is the watershed algorithm, but it always has the over-segmentation
problem, so the use of markers was proposed to improve the watershed
algorithm by smoothing and selecting markers. Finally, we show some applications applying
decoder network.
The task of the decoder is to semantically project the discriminative features (lower resolution)
learnt by the encoder onto the pixel space (higher resolution) to get a dense classification.
Unlike classification, where the end result of the very deep network is the only important
thing, semantic segmentation requires not only discrimination at pixel level but also a
mechanism to project the discriminative features learnt at different stages of the encoder onto the
pixel space. Different approaches employ different mechanisms as part of the decoding
stage.
The region-based methods generally follow the "segmentation using recognition" pipeline, which
first extracts free-form regions from an image and describes them, followed by region-based
classification. At test time, the region-based predictions are transformed to pixel predictions,
usually by labeling a pixel according to the highest scoring region that contains it.
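The test-time rule above (each pixel takes the label of the highest scoring region that contains it) can be sketched as follows. This is a minimal illustration, not the implementation used in this project; the function name, the array layout and the toy regions are assumptions made for the example.

```python
import numpy as np

def regions_to_pixel_labels(masks, scores, labels, background=0):
    """Label each pixel with the class of the highest-scoring region covering it.

    masks:  (R, H, W) boolean array, one free-form mask per region
    scores: (R,) classification score of each region
    labels: (R,) integer class label of each region
    Pixels covered by no region receive `background`.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Score of every region at every pixel; -inf where the region is absent.
    per_pixel = np.where(masks, scores[:, None, None], -np.inf)
    best_region = per_pixel.argmax(axis=0)
    covered = masks.any(axis=0)
    return np.where(covered, labels[best_region], background)

# Toy example (hypothetical data): two overlapping regions on a 3x4 image.
masks = np.zeros((2, 3, 4), dtype=bool)
masks[0, :, :3] = True    # region 0 covers columns 0-2
masks[1, 1:, 2:] = True   # region 1 covers rows 1-2, columns 2-3
pixel_labels = regions_to_pixel_labels(masks, scores=[0.6, 0.9], labels=[1, 2])
print(pixel_labels)
```

In the overlap (rows 1-2, column 2) the region with score 0.9 wins, so those pixels receive label 2.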
R-CNN is one representative work for the region-based methods. It performs semantic
segmentation based on the object detection results. To be specific, R-CNN first utilizes selective
search to extract a large quantity of object proposals and then computes CNN features for each
of them.
Finally, it classifies each region using class-specific linear SVMs. Compared with
traditional CNN structures, which are mainly intended for image classification, R-CNN can
address more complicated tasks, such as object detection and image segmentation, and it has even
become an important basis for both fields. Moreover, R-CNN can be built on top of any CNN
For the image segmentation task, R-CNN extracted two types of features for each region,
the full region feature and the foreground feature, and found that concatenating them
as the region feature led to better performance. R-CNN achieved significant
performance improvements due to using the highly discriminative CNN features. However, it
The feature does not contain enough spatial information for precise boundary generation.
Generating segment-based proposals takes time and would greatly affect the final performance.
Due to these bottlenecks, recent research has been proposed to address the problems, including
The original Fully Convolutional Network (FCN) learns a mapping from pixels to pixels,
without extracting the region proposals. The FCN network pipeline is an extension of the
classical CNN. The main idea is to make the classical CNN take arbitrary-sized images as input.
The restriction of CNNs to accept and produce labels only for specific sized inputs comes from
the fully-connected layers, which are fixed.
Contrary to them, FCNs only have convolutional and pooling layers, which give them the
ability to make predictions on arbitrary-sized inputs. One issue in this specific FCN is that by
propagating through several alternated convolutional and pooling layers, the resolution of the
output feature maps is downsampled.
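The loss of resolution can be made concrete with a little arithmetic. The sketch below assumes five stride-2 stages (which gives the overall factor of 32 mentioned in Chapter 1) and assumes that odd sizes are rounded up at each stage; both choices are illustrative assumptions, since actual networks differ in their padding rules.

```python
def fcn_output_size(height, width, num_stride2_stages=5):
    """Spatial size of the coarsest feature map after repeated stride-2 downsampling.

    Assumption: every stage halves the size and rounds odd sizes up.
    """
    for _ in range(num_stride2_stages):
        height = (height + 1) // 2
        width = (width + 1) // 2
    return height, width

# Any input size is accepted, but the prediction map is far coarser than the image.
for size in [(224, 224), (500, 375)]:
    print(size, "->", fcn_output_size(*size))
# (224, 224) -> (7, 7)
# (500, 375) -> (16, 12)
```

A 7 × 7 prediction for a 224 × 224 image means that each predicted label covers a 32 × 32 block of pixels, which is why the direct output has to be upsampled or refined.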
Therefore, the direct predictions of FCN are typically in low resolution, resulting in
A variety of more advanced FCN-based approaches have been proposed to address this
Most of the relevant methods in semantic segmentation rely on a large number of images
with pixel-wise segmentation masks. However, manually annotating these masks is quite
time-consuming and commercially expensive.
Therefore, some weakly supervised methods have recently been proposed.
For example, BoxSup employed the bounding box annotations as supervision to train
the network and iteratively improve the estimated masks for semantic segmentation.
Simple Does It treated the weak supervision limitation as an issue of input label noise and
CHAPTER 4
3. Object segmentation
This project proposes an approach that uses an object detector for semantic segmentation.
In detail, it includes region proposal generation, object detection, and object segmentation. We
first use a proposal generator to get some object proposals and their corresponding masks. Then,
we use a region-based object detector to classify them and obtain their category labels. Finally, we
apply a saliency detection method to each object box to get the segmented results, using the
proposal masks as object seeds. The detailed process pipeline is shown in the above figure.
subsequent applications with a couple of image regions in which objects might occur. Current
top-performing object detectors, such as Faster R-CNN and R-FCN, all use region proposals. Almost all
object proposal generation methods can be split into two kinds: grouping based and sliding
window based. The first kind can generate relatively accurate object bounding
boxes and masks at the same time. Thus, this paper focuses on this type. Experiments show that
MCG (multiscale combinatorial grouping) gets the best performance among all low-level
feature-based proposal generators. Segments in MCG are merged based on contour strength. In
order to boost the performance of object proposals, we use a powerful contour detection method
contour detector in MCG. In MCG, ultrametric contour maps are computed from multiscale and
Consider a segmentation of the image into regions that partition its domain $S = \{S_i\}_i$. A
segmentation hierarchy is a family of partitions $\{S^*, S^1, \ldots, S^L\}$ such that: (1) $S^*$ is the finest set of
superpixels, (2) $S^L$ is the complete domain, and (3) regions from coarse levels are unions of
regions from fine levels. A hierarchy where each level $S^i$ is assigned a real-valued index $\lambda_i$ can be
represented by a dendrogram, a region tree where the height of each node is its index.
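The three conditions of a segmentation hierarchy can be checked mechanically when every level is stored as an integer label map. The following is a small sketch under that assumption; the function names and the toy label maps are hypothetical and are not part of MCG.

```python
import numpy as np

def is_refinement(fine, coarse):
    """True if every region of `coarse` is a union of regions of `fine`,
    i.e. each fine region lies entirely inside a single coarse region."""
    fine = np.asarray(fine).ravel()
    coarse = np.asarray(coarse).ravel()
    for region in np.unique(fine):
        if np.unique(coarse[fine == region]).size != 1:
            return False
    return True

def is_hierarchy(levels):
    """Check a list of label maps ordered from the finest level to the coarsest."""
    nested = all(is_refinement(a, b) for a, b in zip(levels, levels[1:]))
    top_is_whole_domain = np.unique(levels[-1]).size == 1
    return nested and top_is_whole_domain

finest = np.array([[0, 0, 1], [2, 2, 1], [2, 2, 3]])  # finest level: four superpixels
middle = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 1]])  # merges {0, 2} and {1, 3}
top = np.zeros((3, 3), dtype=int)                     # coarsest level: the whole image
broken = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 1]])  # splits superpixel 0

print(is_hierarchy([finest, middle, top]))   # True
print(is_hierarchy([finest, broken, top]))   # False
```

The second call fails because the `broken` level cuts through a superpixel of the finest level, which violates condition (3).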
obtained by weighting the boundary of each pair of adjacent regions in the hierarchy by the
This representation unifies the problems of contour detection and hierarchical image
an image removes details and smooths away boundaries, the resulting UCMs are misaligned, as
Hierarchy Alignment
N scales by subsampling/supersampling the original image and applying our single-scale
segmenter. In order to preserve thin structures
and details, we declare as the set of possible boundary locations the finest superpixels at the
highest resolution.
Multiscale Hierarchy
After alignment, we have a fixed set of boundary locations, and N strengths for each of
them, coming from the different scales. We formulate this problem as binary boundary
classification and train a classifier that combines these N features into a single probability of
boundary estimation.
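As an illustration of combining the N per-scale strengths into one probability, the sketch below uses a logistic model. The choice of a logistic classifier, the weights and the bias are assumptions made for the example; the text only states that a classifier is trained, and in practice the parameters would be fitted on labelled boundaries.

```python
import numpy as np

def boundary_probability(strengths, weights, bias):
    """Logistic combination of per-scale boundary strengths.

    strengths: (..., N) array, one strength per scale for each boundary location
    weights:   (N,) array, bias: scalar (hypothetical, normally learned)
    """
    z = np.asarray(strengths, dtype=float) @ np.asarray(weights, dtype=float) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for N = 3 scales.
weights = np.array([2.0, 3.0, 1.5])
bias = -3.0
strengths = np.array([
    [0.9, 0.8, 0.7],   # strong response at every scale
    [0.1, 0.0, 0.2],   # weak response at every scale
])
probs = boundary_probability(strengths, weights, bias)
print(probs)
```

The first location receives a probability above 0.5 and the second a probability below 0.5, so a single threshold on the combined output separates them.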
The implemented heuristic uses the fact that branches are often only a few pixels in
length and occur towards the middle of contours to make the assumption that the set of two
possible endpoints that are the farthest apart from one another (in terms of contour pixels) are
Experimentation with this heuristic showed that it produced correct results in nearly
Running the segmented image through the thinning, Moore contour tracing, and end
point finding algorithms yields each contour in a vectorized form which can then be processed
further. As can be seen in Figure, which shows the results of these steps on the previously shown
segmented image, the results from this step are quite good.
Fig: 2.4 Types of contour pixels. (a) Absolute direction; (b) relative direction; (c) types of contour
pixels: inner corner pixel (I), outer corner pixel (O) and inner-outer corner pixel (IO)
Contour Tracing Algorithms
Let $I$ be a binary digital image with $M \times N$ pixels, where the
coordinate of the top-leftmost pixel is (0, 0) and that of the bottom-rightmost pixel is (M−1,
N−1). In $I$, a pixel can be represented as $P = (x, y)$, $x = 0, 1, 2, \ldots, M-1$, $y = 0, 1, 2, \ldots, N-1$. Most
contour-tracing algorithms follow this basic sequence:
1. The tracer starts contour tracing at the contour of an object after it saves the starting point along with its initial direction.
2. The tracer determines the next contour point using its specific rule of following paths according to the adjacent pixels and then moves to the contour point and changes its absolute direction.
3. If the tracer reaches the start point, then the trace procedure is terminated.
To determine the next contour point, which may be a contour pixel or pixel corner, the tracer detects
the intensity of its adjacent pixel $P_r$ and the new absolute direction $d_r$ for $P_r$ by using relative
For example, if the absolute direction of the current tracer $T(P, d)$ is N, the left direction of the
tracer, $d_{Left}$, is W. Similarly, the left pixel of the tracer, $P_{Left}$, is $(x-1, y)$. Fig. 2.4 (a) and (b) show the
directional information of the tracer, and Fig. 2.4 (c) shows the different types of contour pixels.
The contour pixels can be classified into four types, namely straight line, inner corner pixel, outer
corner pixel and inner-outer corner pixel. In Fig. 2.4 (c), “O” represents the outer corner, “I”
represents the inner corner and “IO” represents the inner-outer corner according to the local
pattern of the contour. In this study, we focus on a contour-tracing algorithm that is suitable for
cases involving a relatively small number of objects and that require real-time tracing, such as
augmented reality (AR), mixed reality (MR) and recognition of image-based codes in small-scale
images, e.g., in a mobile computing environment. Hence, we first introduce and briefly describe the
conventional contour-tracing algorithms that are used in this environment and analyse their
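The relation between the tracer's absolute direction and its relative directions can be written down compactly. The sketch below is restricted to the four axis-aligned directions for brevity, and the names DIRECTIONS, OFFSETS, TURNS and the two helper functions are assumptions made for this illustration. It uses the image convention stated above, with (0, 0) at the top-left, so north corresponds to decreasing y.

```python
# Absolute directions in clockwise order, with their (dx, dy) pixel offsets.
# x grows to the right and y grows downward, so north is (0, -1).
DIRECTIONS = ["N", "E", "S", "W"]
OFFSETS = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}
# Relative directions expressed as a number of clockwise quarter-turns.
TURNS = {"front": 0, "right": 1, "rear": 2, "left": 3}

def relative_direction(absolute, relative):
    """Absolute direction the tracer faces after turning `relative` from `absolute`."""
    index = (DIRECTIONS.index(absolute) + TURNS[relative]) % 4
    return DIRECTIONS[index]

def relative_pixel(position, absolute, relative):
    """Coordinates of the pixel adjacent to `position` in the given relative direction."""
    x, y = position
    dx, dy = OFFSETS[relative_direction(absolute, relative)]
    return (x + dx, y + dy)

# Example from the text: a tracer at (x, y) facing N has left direction W
# and left pixel (x - 1, y).
print(relative_direction("N", "left"))      # W
print(relative_pixel((5, 5), "N", "left"))  # (4, 5)
```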
dense pixel-wise predictions like semantic segmentation, computing optical flow and disparity
maps, and contour detection. The encoder in the network computes progressively higher-level
abstract features as the receptive fields in the encoder increase with the depth of the encoder. The
spatial resolution of the feature maps is reduced progressively via a down-sampling operation,
whereas the decoder computes feature maps of progressively increasing resolution via un-
pooling or up-sampling. The network has the ability not only to model features like shape or
Different variations of the encoder–decoder network have been explored in the literature
for improved performance. Skip connections (Ronneberger et al., 2015) have been used to
recover the fine spatial details during reconstruction which get lost due to successive down-
sampling operations involved in the encoder. Addition of larger context information using
image-level features (Liu et al., 2015), recurrent connections (Pinheiro and Collobert, 2014;
Zheng et al., 2015), and larger convolutional kernels (Peng et al., 2017) has also significantly
Other methods studied for improving semantic segmentation accuracy include hierarchical
supervision (Chen et al., 2016) and iterative concatenation of feature maps (Jégou et al., 2017).
Our proposed upsampling idea was inspired by work intended for unsupervised
feature learning. The fundamental aspect of the proposed encoder-decoder network is the
decoding process, which has numerous practical advantages regarding enhancing boundary
delineation and minimizing the total network size for enabling end-to-end training. The key
benefit of such a design is an easy-to-modify encoder-decoder architecture that can be adapted
with very little modification. The encoder produces low-resolution feature maps
for pixel-wise classification. The feature maps produced through the convolution layer are
sparse; they are later convolved with the decoder filters to generate detailed feature maps.
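One common way in which an encoder-decoder produces sparse maps that are later densified is index-based unpooling: the encoder remembers where each maximum came from, and the decoder puts the value back at that position. The text does not state that this exact mechanism is used here, so the sketch below is an assumption for illustration only; the function names and the toy input are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the argmax position (0-3) per window."""
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    return windows.max(axis=2), windows.argmax(axis=2)

def max_unpool_2x2(pooled, indices):
    """Put each pooled value back at its recorded position; all other entries stay zero."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    windows[rows, cols, indices] = pooled
    return windows.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 0]], dtype=float)
pooled, indices = max_pool_2x2(x)          # encoder: 4x4 -> 2x2
sparse = max_unpool_2x2(pooled, indices)   # decoder: 2x2 -> sparse 4x4
print(pooled)
print(sparse)
```

Only four of the sixteen entries of the unpooled map are non-zero. A subsequent convolution with decoder filters would spread these values over their neighbourhoods to form a dense map.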
Recognizing objects and localizing them is the key to our approach. Recent progress
shows that region-based object detectors achieve state-of-the-art performance. These methods
usually include the following steps: take an image as input, extract some region proposals,
compute semantic features for each proposal using CNNs, and classify each proposal to obtain
its semantic label. With these labels, we only need to segment each object to get the final semantic
segmentation result. Furthermore, we can also get instance segmentation results, which is a more
challenging task than semantic segmentation and is beyond the scope of this work. R-FCN (Region-
based Fully Convolutional Networks) is a recent baseline in object detection. It is
efficient because it uses an FCN, and powerful because it uses Residual Networks (ResNets) for feature
extraction.
R-CNN based detectors, like Fast R-CNN or Faster R-CNN, perform object detection in 2
stages:
1. Generate region proposals (ROIs).
2. Classify each proposal and localize it (bounding boxes).
Fast R-CNN computes the feature maps from the whole image once. It then derives the region
proposals (ROIs) from the feature maps directly. For every ROI, no more feature extraction is
needed. That cuts down the process significantly as there are about 2000 ROIs. Following the
same logic, R-FCN improves speed by reducing the amount of work needed for each ROI. The
region-based feature maps are independent of ROIs and can be computed outside each ROI. The
remaining work, which we will discuss later, is much simpler and therefore R-FCN is faster than
Fast R-CNN or Faster R-CNN. Here is the pseudo code for R-FCN for comparison.
R-FCN
Let’s get into the details and consider a 5 × 5 feature map M with a square object inside.
We divide the square object equally into 3 × 3 regions. Now, we create a new feature map from
M to detect the top left (TL) corner of the square only. The new feature map looks like the one
on the right below. Only the yellow grid cell [2, 2] is activated.
Fig. 2.7 New feature map from the left to detect the top left corner of an object
Since we divide the square into 9 parts (top-left TL, top-middle TM, top-right TR, center-left
CL, …, bottom-right BR), we create 9 feature maps, each detecting the corresponding region of
the object.
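As a rough illustration of the nine maps described above, the following Python sketch builds them by hand for the 5 × 5 example. It is a toy under stated assumptions: real position-sensitive score maps are produced by learned convolution filters, not constructed like this, and the names PARTS and make_score_maps are hypothetical.

```python
# Toy illustration only: real score maps come from learned convolution filters.
# Part names in row-major order (top-left ... bottom-right).
PARTS = ["TL", "TM", "TR", "CL", "CM", "CR", "BL", "BM", "BR"]


def make_score_maps(size, top, left):
    """Return {part: size x size grid} for a 3 x 3 object whose top-left
    cell sits at the 1-based grid position [top, left]."""
    maps = {}
    for index, name in enumerate(PARTS):
        d_row, d_col = divmod(index, 3)      # offset of this part inside the object
        grid = [[0] * size for _ in range(size)]
        grid[top - 1 + d_row][left - 1 + d_col] = 1   # only the matching cell fires
        maps[name] = grid
    return maps


# The 5 x 5 feature map from the text, with the object starting at cell [2, 2].
score_maps = make_score_maps(size=5, top=2, left=2)
for row in score_maps["TL"]:
    print(row)
```

In the top-left map only cell [2, 2] is set, which matches the description above; the other eight maps each mark a different cell of the same object.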
These feature maps are called position-sensitive score maps because each map detects
(scores) one region of the object.
Let’s say the dotted red rectangle below is the proposed ROI. We divide it into 3 × 3 regions
and ask how likely it is that each region contains the corresponding part of the object. For example, how
likely is it that the top-left ROI region contains the left eye? We store the results in a 3 × 3 vote array.
This process of mapping score maps and ROIs to the vote array is called position-sensitive ROI pooling.
We compute the average score of the top-left score map inside the top-left ROI region (blue
rectangle). About 40% of the area inside the blue rectangle has 0 activation and 60% has 100%
activation, i.e. 0.6 on average. So the likelihood that we have detected the top-left part of the object is 0.6.
We repeat this for the top-middle ROI region, this time with the top-middle score map.
The result is computed as 0.55 and stored in array [0][1]. This value indicates the likelihood that
we have detected the top-middle part of the object.
Overlay a portion of the ROI onto the corresponding score map to calculate V[i][j]
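The averaging step can be sketched in a few lines of Python. This is a minimal sketch, not the R-FCN implementation: it assumes score maps stored as plain nested lists, an ROI given in whole grid cells whose height and width are divisible by 3, and the hypothetical function name ps_roi_pool. The toy data is chosen so that 6 of the 10 cells in the top-left bin are active, reproducing the 0.6 from the text.

```python
def ps_roi_pool(score_maps, roi, k=3):
    """Position-sensitive ROI pooling on plain nested lists.

    score_maps: list of k*k 2-D grids, one per object part, in row-major order.
    roi: (top, left, height, width) in grid cells; height and width are
         assumed to be divisible by k.
    Returns the k x k vote array V.
    """
    top, left, height, width = roi
    bin_h, bin_w = height // k, width // k
    votes = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            part_map = score_maps[i * k + j]          # map for part (i, j) only
            r0, c0 = top + i * bin_h, left + j * bin_w
            cells = [part_map[r][c]
                     for r in range(r0, r0 + bin_h)
                     for c in range(c0, c0 + bin_w)]
            votes[i][j] = sum(cells) / len(cells)     # average activation in the bin
    return votes


# Toy data: nine 6 x 15 score maps, all zero except the top-left map.
ROWS, COLS = 6, 15
score_maps = [[[0] * COLS for _ in range(ROWS)] for _ in range(9)]
for r in (0, 1):
    for c in (2, 3, 4):
        score_maps[0][r][c] = 1    # 6 of the 10 cells in the top-left bin are active

votes = ps_roi_pool(score_maps, roi=(0, 0, ROWS, COLS))
print(votes[0][0])
```

Each entry V[i][j] looks only at its own score map and its own bin of the ROI, which is what makes the pooling position-sensitive.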
After calculating all the values for the position-sensitive ROI pool, the class score is the
average of all its elements. Let’s say we have C classes to detect. We expand it to C + 1 classes
to include a class for the background. Each class will have its own 3 × 3 score maps and therefore
a total of (C+1) × 3 × 3 score maps. Using its own set of score maps, we predict a class score for
each class. Then we apply a softmax on those scores to compute the probability for each class.
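The scoring step described above can be sketched as follows. The vote arrays and class names are made-up illustration values (only 0.6 and 0.55 come from the text), and class_score and softmax are hypothetical helper names; this is a sketch of the arithmetic, not of the trained network.

```python
import math


def class_score(votes):
    """Class score = average of all elements of the k x k vote array."""
    flat = [v for row in votes for v in row]
    return sum(flat) / len(flat)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


K = 3
# Hypothetical vote arrays for C = 2 object classes plus the background class.
votes_by_class = {
    "background": [[0.10, 0.05, 0.10], [0.05, 0.10, 0.05], [0.10, 0.05, 0.10]],
    "person":     [[0.60, 0.55, 0.50], [0.70, 0.80, 0.65], [0.45, 0.60, 0.55]],
    "car":        [[0.20, 0.10, 0.15], [0.10, 0.20, 0.10], [0.15, 0.10, 0.20]],
}

num_object_classes = len(votes_by_class) - 1             # C
num_score_maps = (num_object_classes + 1) * K * K        # (C + 1) x 3 x 3

names = list(votes_by_class)
scores = [class_score(votes_by_class[name]) for name in names]
probs = softmax(scores)
best = names[probs.index(max(probs))]
print(num_score_maps, best)
```

With C = 2 the network needs 27 score maps, and the class with the highest average vote receives the highest probability after the softmax.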
The main problem is how to classify the overlapping parts among several objects with the same
semantic label. Segmenting each detected object simply means outputting its corresponding mask as the
segmentation result. However, these masks are not accurate enough; they usually miss some
parts of the object. To address this, we introduce a saliency detection method to refine these masks.
The saliency detection approach detects all the salient objects in the form of a saliency map.
In computer vision, a saliency map is an image that shows each pixel's unique quality.
The goal of a saliency map is to simplify and/or change the representation of an image into
something that is more meaningful and easier to analyze. For example, if a pixel has a high grey
level or other unique color quality in a color image, that pixel's quality will show in the saliency
map and in an obvious way. Saliency is a kind of image segmentation.
First, we calculate the distance of each pixel to the rest of the pixels in the same frame,
where I is the value of a pixel, in the range [0, 255], and N is the total number of pixels in
the current frame. Then we can further restructure the formula by grouping together the pixels
that have the same value I. Here Fn is the frequency of the value In, where n belongs to
[0, 255]. The frequencies are expressed in the form of a histogram, and computing the
histogram takes O(N) time.
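The formulas themselves did not survive in the text above, so the following Python sketch rests on an assumption: the distance between two pixels is taken to be the absolute difference of their grey levels, so that the saliency of a pixel with value I_k is the sum over i = 1..N of |I_k − I_i|, which regroups into the sum over n = 0..255 of F_n · |I_k − I_n|. The function names are hypothetical.

```python
def brute_force_saliency(pixels):
    """Sum of absolute grey-level differences to every pixel: O(N^2)."""
    return [sum(abs(p - q) for q in pixels) for p in pixels]


def histogram_saliency(pixels):
    """Same result using the histogram F_n of grey levels in [0, 255]."""
    freq = [0] * 256
    for p in pixels:                 # one pass over the N pixels builds the histogram
        freq[p] += 1
    # Saliency of each grey level is computed once: sum_n F_n * |level - n|.
    level_saliency = [
        sum(count * abs(level - n) for n, count in enumerate(freq) if count)
        for level in range(256)
    ]
    return [level_saliency[p] for p in pixels]


# Tiny frame flattened to a list: three dark pixels and one bright pixel.
frame = [10, 10, 10, 200]
print(histogram_saliency(frame))
```

The bright pixel gets the largest value because it differs from every other pixel, and the histogram version avoids comparing every pixel with every other pixel.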
Saliency is what stands out to you and how you are able to quickly focus on the most
relevant parts of what you see. In neuroscience, saliency is described as an attention mechanism.
Image segmentation is the process of partitioning a digital image into multiple segments
(sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more meaningful and easier to
analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves,
etc.) in images. More precisely, image segmentation is the process of assigning a label to
every pixel in an image such that pixels with the same label share certain characteristics.
In UX design, saliency is a feedback loop for understanding which parts of a design are
useful and which are not. Designers use the information they gather from usability and
eye-tracking studies to design better interfaces. Advertisers are well aware that many people
don’t have long attention spans, hence they try to catch the eye of a user with a single
glance. Saliency detection methods are used to better design ads and posters.
Saliency detection, essentially, can be used in any area in which you’re trying to
automate the process of understanding what stands out in an image.
We use saliency detection to make our algorithms smarter. One example of this would be
This microservice uses the Saliency Detector algorithm to get information about the
important parts of an image. Using Saliency Detection will make your app/service smarter by
detecting the relevant (salient) parts in your images automatically. You can use this information
Digital image processing (DIP) deals with the manipulation of digital images through a digital
computer. It is a subfield of signals and systems but focuses particularly on images. DIP focuses on
developing a computer system that is able to perform processing on an image. The input of that
system is a digital image; the system processes that image using efficient algorithms and gives
an image as an output. The most common example is Adobe Photoshop. It is one of the widely
used applications for processing digital images.
In the above figure, an image has been captured by a camera and has been sent to a
digital system to remove all the other details, and just focus on the water drop by zooming it in
such a way that the quality of the image remains the same. The digital image processing deals with
What is an Image
An image is defined by a mathematical function f(x, y), where x and y are the horizontal and
vertical co-ordinates. The value of f(x, y) at any point gives the pixel value at that point of
the image. The above figure is an example of a digital image that you are now viewing on your
computer screen. But actually, this image is nothing but a two-dimensional array of numbers:
128 230 123
123 77 89
80 255 255
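To make the idea of an image as a function concrete, the short Python sketch below stores the three rows shown above as a nested list and reads pixel values through a function f(x, y). It assumes the first row is 128, 230, 123 (the values quoted in the text) and that x counts columns and y counts rows from zero; both conventions are assumptions made for this sketch.

```python
# The rows of pixel values from the text, stored as a nested list.
image = [
    [128, 230, 123],
    [123, 77, 89],
    [80, 255, 255],
]


def f(x, y):
    """Pixel value at horizontal position x and vertical position y (0-based)."""
    return image[y][x]


height = len(image)        # number of rows
width = len(image[0])      # number of columns
print(width, height, f(1, 0))
```

The dimensions of the picture are simply the dimensions of the array, and every call to f returns one stored number.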
Each number represents the value of the function f(x, y) at a point. In this case the
values 128, 230 and 123 each represent an individual pixel value. The dimensions of the picture
are the dimensions of this two-dimensional array.
If the image is a two-dimensional array, then what does it have to do with a signal?
Signal:
In the physical world, any quantity measurable through time, over space or over any higher
dimension can be taken as a signal. A signal is a mathematical function, and it conveys some
information.
A signal can be one-dimensional, two-dimensional or higher-dimensional. A one-dimensional
signal is a signal that is measured over time. The common example is a voice signal.
Two-dimensional signals are those that are measured over some other physical
quantities. An example of a two-dimensional signal is a digital image.
Relationship
Anything that conveys information between two observers is a signal. That includes speech
(the human voice) or an image. When we speak, our voice is converted to a sound wave and
transmitted over time to the person we are speaking to. The same holds for the way a digital
camera works: acquiring an image with a digital camera involves the transfer of a signal from
one part of the system to another.
Capturing an image with a camera is a physical process. Sunlight is used as the
source of energy. A sensor array is used for the acquisition of the image. When sunlight
falls upon the object, the amount of light reflected by that object is sensed by the sensors,
and a continuous voltage signal is generated according to the amount of sensed light. In order to
create a digital image, we need to convert this data into digital form. This involves sampling and
quantization. The result of sampling and quantization is a
two-dimensional array or matrix of numbers, which is nothing but a digital image.
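The two conversion steps can be sketched in Python for a single row of sensor readings. This is a minimal sketch under assumptions: the sensor voltage is modelled as a made-up sine curve between 0 and 1 volt, the output uses 256 grey levels, and the names sample, quantize and voltage are hypothetical.

```python
import math


def sample(signal, duration, n_samples):
    """Sampling: read the continuous signal at evenly spaced instants."""
    step = duration / n_samples
    return [signal(i * step) for i in range(n_samples)]


def quantize(values, v_min, v_max, levels=256):
    """Quantization: map each value in [v_min, v_max] to an integer level."""
    quantized = []
    for v in values:
        v = min(max(v, v_min), v_max)                  # clip to the sensor range
        level = int((v - v_min) / (v_max - v_min) * levels)
        quantized.append(min(level, levels - 1))       # v == v_max goes to the top level
    return quantized


def voltage(t):
    """Made-up continuous sensor voltage between 0 and 1 volt."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * t)


samples = sample(voltage, duration=1.0, n_samples=8)
pixels = quantize(samples, v_min=0.0, v_max=1.0)
print(pixels)
```

Sampling fixes where the signal is read and quantization fixes which integer each reading becomes; repeating this for every row of the sensor array gives the two-dimensional matrix described above.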
Overlapping fields
Machine/Computer vision
Machine vision or computer vision deals with developing a system in which the input is
For example: Developing a system that scans human face and opens any kind of lock.
Fig. 2.15 Example of Developing a system that scans human face and opens any kind of lock
Computer graphics
Computer graphics deals with the formation of images from object models, rather than
images captured by some device. For example: object rendering, i.e. generating an image from
an object model.
Artificial intelligence
Artificial intelligence is more or less the study of putting human intelligence into
machines. Artificial intelligence has many applications in image processing. For example:
developing computer-aided diagnosis systems that help doctors interpret X-ray and MRI
images by highlighting conspicuous sections to be examined by the doctor.
Signal processing
Signal processing is an umbrella and image processing lies under it. The amount of light
reflected by an object in the physical world (3D world) passes through the lens of the camera
and becomes a 2D signal, which results in image formation. This image is then digitized using
methods of signal processing.
4.2 MATLAB
4.2.1: INTRODUCTION
MATLAB started out as a matrix programming language where linear algebra programming was
simple. It can be run both under interactive sessions and as a batch job. This section gives a
gentle introduction to MATLAB.
MATLAB supports the implementation of algorithms, the creation of user interfaces, and
interfacing with programs written in other languages, including C, C++, Java, and FORTRAN.
It can be used to analyze data, develop algorithms, and create models and applications.
It has numerous built-in commands and math functions that help you in mathematical
calculations, including:
Linear Algebra
Algebraic Equations
Non-linear Functions
Statistics
Data Analysis
Numerical Calculations
Integration
Transforms
Curve Fitting
It also provides an interactive environment for iterative exploration, design and problem solving.
It provides a vast library of mathematical functions for linear algebra, statistics and Fourier analysis.
It provides built-in graphics for visualizing data and tools for creating custom plots.
MATLAB's programming interface gives development tools for improving code quality.
It provides functions for integrating MATLAB-based algorithms with external applications.
MATLAB is used across the fields of physics, chemistry, math and all engineering streams, in
applications including:
Control Systems
Computational Finance
Computational Biology
MathWorks provides the licensed product, a trial version and a student version as well.
You need to log into the site and wait a little for their approval.
After downloading the installer, the software can be installed in a few clicks.
The MATLAB development IDE can be launched from the icon created on the desktop. The
main working window in MATLAB is called the desktop. When MATLAB is started, the
desktop shows the following panels:
Current Folder − This panel allows you to access the project folders and files.
Command Window − This is the main area where commands can be entered at the command
line.
Workspace − The workspace shows all the variables created and/or imported from files.
Command History − This panel shows or reruns commands that are entered at the command
line.
CHAPTER 5
[Figure: block diagram of the proposed method, with blocks labelled Input Image, CEDN, Contour,
R-FCN, SCG, Masks, Contour Saliency and Segmentation result]
5.2 Algorithm:
First, we feed an image into the segmentation process through the Contour
Encoder-Decoder Network (CEDN). CEDN is a standard network used for tasks
requiring dense pixel-wise predictions, such as semantic segmentation, computing optical flow and
disparity maps, and contour detection. Different variations of the encoder–decoder network have
been used for separating an image by boundary or by region. Simultaneously, SCG is the technique
that converts objects to masks. After that, the masks are combined with the contours as part of
image detection, splitting the image as required, mostly by region. During mask generation,
the image is kept according to a condition on whether the pixel value is clear up to the mark.
R-FCN applies bounding boxes to produce accurate image contours. Finally, the
segmentation results are obtained after completing the Contour Saliency step, where the image is
separated along the collinear lines. All of this is done using code written in
MATLAB.
5.3 OUTPUT:
Fig.4.2 Mask-2
Fig.4.4 Mask-4
5.4 ADVANTAGES
5. It uses bounding boxes which will make the segmentation process simple and accurate.
5.5 APPLICATIONS
Semantic image segmentation, which has become one of the key applications in the image
processing and computer vision domain, has been used in multiple domains such as the
medical area and intelligent transportation.
CONCLUSION
Compared with traditional image semantic segmentation methods, the method based
on the convolutional neural network is simple and its segmentation quality is better. Model
fusion helps to achieve high accuracy.
Image semantic segmentation is a key technology in the field of image processing and
computer vision. It is an important part of how a computer understands image content. The quality of
semantic segmentation plays a crucial role in subsequent tasks such as image understanding.
With the continuous development of deep learning, the high accuracy
brought by neural networks has been widely studied and applied in many scenes. Compared with the
method based on region feature extraction, the image features acquired by the deep
convolutional neural network method have stronger representation ability, so the algorithm
performs better. The basic idea of semantic segmentation based on a deep convolutional neural
network is to extract the semantic features of each pixel in the image by using the neural network,
then classify and identify the pixels according to these features, so as to obtain a segmentation
image containing semantic information. Therefore, the core of this method is how to improve the
FUTURE SCOPE
"Ground truth" refers to information collected on location. Ground truth allows image data to
be related to real features on the ground.
Ground truth also helps with atmospheric correction. Since images from satellites have
to pass through the atmosphere, they can get distorted because of absorption in the atmosphere.
"Ground truth" means a set of measurements that is known to be much more accurate than
measurements from the system being tested.
For example, suppose you are testing a stereo vision system to see how well it can estimate 3D
positions. The "ground truth" might be the positions given by a laser rangefinder, which is known
to be much more accurate than the camera system.
2. It can be extended with end-to-end encryption to secure the image processing results and
improve the performance of object detection. We may expect these improvements to our
project in the future.
APPENDIX
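clc
close all
clear all
tic
% pn (folder) and fn (file name) are expected to be set before this point,
% for example by a file-selection dialog; that step is not shown in this listing.
hi=imread([pn,fn]);
% Ask for a third output so that co is the column count of the RGB image,
% not columns*channels (which is what [ro co]=size(hi) returns for a 3-D array).
[ro, co, ~]=size(hi);
% im, im2 and im3 are the inputs to the mask generator scg; how they are
% prepared from hi is not shown in this listing.
[masked_image] = scg(im,1);
[masked_image2] = scg(im2,1);
[masked_image3] = scg(im3,1);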
%%
%
figure,imshow(uint8(masked_image.*im));
title('Aligned Hierarchi1');
figure,imshow(uint8(masked_image2.*im2));
title('Aligned Hierarchi2');
figure,imshow(uint8(masked_image3.*im3));
title('Aligned Hierarchi3');
mh1=uint8(imresize(uint8(masked_image.*im),[ro co]));
mh2=uint8(imresize(uint8(masked_image2.*im2),[ro co]));
mh3=uint8(imresize(uint8(masked_image3.*im3),[ro co]));
mh=mh1+mh2+mh3;
figure,imshow(uint8(mh));
title('Region proposal generation opj')
%mcg ends
%CEDN starts
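% Convert to greyscale, suppress noise with a 3x3 median filter and
% detect edges with the Sobel operator.
im1=rgb2gray(hi);
im1=medfilt2(im1,[3 3]);
BW = edge(im1,'sobel');
[imx,imy]=size(BW);
% 5x5 kernel with a 3x3 block of ones: convolving thickens the edge lines.
msk=[0 0 0 0 0;
0 1 1 1 0;
0 1 1 1 0;
0 1 1 1 0;
0 0 0 0 0;];
% Label the 8-connected components of the thickened edge map.
B=conv2(double(BW),double(msk));
L = bwlabel(B,8);
mx=max(max(L));
op=hi;
% B2/L2 and B3/L3 repeat the same computation as B/L in this listing.
B2=conv2(double(BW),double(msk));
L2 = bwlabel(B2,8);
mx2=max(max(L2));
B3=conv2(double(BW),double(msk));
L3 = bwlabel(B3,8);
mx3=max(max(L3));
% Copy the pixels of component number 17 into a blank image n1.
[r,c] = find(L==17);
rc = [r c];
[sx sy]=size(rc);
n1=zeros(imx,imy);
for i=1:sx
x1=rc(i,1);
y1=rc(i,2);
n1(x1,y1)=255;
end
figure,imshow(B);
title('contour map');
%CEDN ends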
imagen=op;
[L Ne]=bwlabel(imagen);
propied=regionprops(L,'BoundingBox');
hold on
%% Plot Bounding Box
for n=1:size(propied,1)
rectangle('Position',propied(n).BoundingBox,'EdgeColor','g','LineWidth',2)
end
hold off
pause (1)
if isdir('networks')==0
mkdir('networks');
end
trainFcn = 'trainlm';
r15(i)=regression(targets(tr.testInd), outputs);
r2016(i)=regression(targets16, outputs2016);
save(['networks\net' num2str(i)],'net');
end
img=mh;
dim = size(img);
width = dim(2);height = dim(1);
md = min(width, height);%minimum dimension
cform = makecform('srgb2lab');
lab = applycform(img,cform);
l = double(lab(:,:,1));
a = double(lab(:,:,2));
b = double(lab(:,:,3));
sm = zeros(height, width);
off1 = int32(md/2); off2 = int32(md/4); off3 = int32(md/8);
I=imgl;
I = imresize(I,[256,256]);
I = imadjust(I,stretchlim(I));
I_Otsu = im2bw(I,graythresh(I));
I_HIS = rgb2hsi(I);
cform = makecform('srgb2lab');
lab_he = applycform(I,cform);
ab = double(lab_he(:,:,2:3));
nrows = size(ab,1);
ncols = size(ab,2);
ab = reshape(ab,nrows*ncols,2);
nColors = 3;
pixel_labels = reshape(cid,nrows,ncols);
segmented_images = cell(1,3);
rgb_label = repmat(pixel_labels,[1,1,3]);
for k = 1:nColors
colors = I;
colors(rgb_label ~= k) = 0;
segmented_images{k} = colors;
end
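% Multi-scale contrast: compare each pixel with the mean of three
% surrounding windows of half-width off1, off2 and off3.
for j = 1:height
y11 = max(1,j-off1); y12 = min(j+off1,height);
y21 = max(1,j-off2); y22 = min(j+off2,height);
y31 = max(1,j-off3); y32 = min(j+off3,height);
for k = 1:width
x11 = max(1,k-off1); x12 = min(k+off1,width);
x21 = max(1,k-off2); x22 = min(k+off2,width);
x31 = max(1,k-off3); x32 = min(k+off3,width);
lm1 = mean2(l(y11:y12,x11:x12)); am1 = mean2(a(y11:y12,x11:x12)); bm1 = mean2(b(y11:y12,x11:x12));
lm2 = mean2(l(y21:y22,x21:x22)); am2 = mean2(a(y21:y22,x21:x22)); bm2 = mean2(b(y21:y22,x21:x22));
lm3 = mean2(l(y31:y32,x31:x32)); am3 = mean2(a(y31:y32,x31:x32)); bm3 = mean2(b(y31:y32,x31:x32));
% The accumulation step and the loop ends are missing from this listing.
% The lines below are an assumed reconstruction: squared Lab distance
% between the pixel and each window mean, summed over the three scales.
cv1 = (l(j,k)-lm1).^2 + (a(j,k)-am1).^2 + (b(j,k)-bm1).^2;
cv2 = (l(j,k)-lm2).^2 + (a(j,k)-am2).^2 + (b(j,k)-bm2).^2;
cv3 = (l(j,k)-lm3).^2 + (a(j,k)-am3).^2 + (b(j,k)-bm3).^2;
sm(j,k) = cv1 + cv2 + cv3;
end
end
figure,imshow(img);
figure,imshow(sm,[]);
title('saliency map');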
for k = 1:nColors
colors = I;
colors(rgb_label ~= k) = 0;
segmented_images{k} = colors;
end
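The saliency loop in the listing above computes window means at three scales but does not show how they are combined. The Python sketch below illustrates one plausible reading on a single grey channel. It assumes the saliency of a pixel is the sum over the three scales of the squared difference between the pixel and its window mean, and it uses integer division for the window half-widths; the function names are hypothetical, and the listing itself works on the three Lab channels.

```python
def local_mean(img, j, k, off):
    """Mean of the window of half-width off around (j, k), clipped to the image."""
    height, width = len(img), len(img[0])
    rows = range(max(0, j - off), min(j + off, height - 1) + 1)
    cols = range(max(0, k - off), min(k + off, width - 1) + 1)
    values = [img[r][c] for r in rows for c in cols]
    return sum(values) / len(values)


def multiscale_saliency(img):
    """Sum over three scales of (pixel - window mean)^2, one channel only."""
    height, width = len(img), len(img[0])
    md = min(height, width)                       # minimum dimension, as in the listing
    offsets = [md // 2, md // 4, md // 8]         # three window half-widths
    sal = [[0.0] * width for _ in range(height)]
    for j in range(height):
        for k in range(width):
            sal[j][k] = sum((img[j][k] - local_mean(img, j, k, off)) ** 2
                            for off in offsets)
    return sal


# Toy image: a dark 8 x 8 frame with one bright pixel.
image = [[0] * 8 for _ in range(8)]
image[4][4] = 255
saliency = multiscale_saliency(image)
```

On this toy image the bright pixel receives by far the largest value, since it differs strongly from the mean of every window around it, while a uniform image yields zero everywhere.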