
A PROJECT REPORT ON

IMAGE SEGMENTATION USING REGION-BASED
OBJECT DETECTOR
A project report submitted in partial fulfilment of the requirements for the award of
the degree of

BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted by

K. SATYA (18JN5A0430)
T. SUSHMI (17JN1A0452)
M. REVATHI (17JN1A0460)
A. AKHILA (17JN1A0457)
A. GEETHA SUREKHA (17JN1A0467)
Under the esteemed guidance of
Smt. S. ANITHA, M.Tech
Assistant professor, Dept. of ECE

KAKINADA INSTITUTE OF ENGINEERING AND TECHNOLOGY for WOMEN
(Approved by AICTE, Affiliated to Jawaharlal Nehru Technological University
Kakinada, Yanam Road, Kornagi-533463)
(2017-2021)

CERTIFICATE
This is to certify that the thesis entitled “IMAGE SEGMENTATION USING
REGION-BASED OBJECT DETECTOR”, submitted by K. SATYA,
T. SUSHMI, M. REVATHI, A. AKHILA and A. GEETHA SUREKHA in
partial fulfilment of the requirements for the award of BACHELOR OF
TECHNOLOGY in ELECTRONICS AND COMMUNICATION
ENGINEERING from KAKINADA INSTITUTE OF ENGINEERING AND
TECHNOLOGY for WOMEN, affiliated to JNTU-KAKINADA, is a record of
bona fide work carried out by them under our guidance and supervision. The results
embodied in this thesis have not been submitted to any other university or institute for
the award of any degree.

Project Guide                          Head of the Department
Smt. S. Anitha, M. Tech,               Ms. P. Latha, M. Tech,
Department of ECE                      Department of ECE

EXTERNAL EXAMINER
ACKNOWLEDGEMENT

It gives us immense pleasure to acknowledge all those who helped us throughout in
making this project a great success.
With profound gratitude we thank Mr. Y RAMA KRISHNA, M. Tech, MBA, Principal,
Kakinada Institute of Engineering and Technology-Women, for his timely suggestions,
which helped us to complete this project work successfully.
Our sincere thanks and deep sense of gratitude go to Ms. P. Latha, M. Tech, Head of the
Department of ECE, for her valuable guidance in the successful completion of this project.
We take great pleasure in expressing our profound sense of gratitude to our project
guide Smt. S. Anitha, M. Tech, Assistant Professor in the ECE Department, for her valuable
guidance, comments, suggestions and encouragement throughout the course of this
project.

We are thankful to both the Teaching and Non-Teaching staff members of the ECE department
for their kind cooperation and all their help in bringing out this project work successfully.

OUR PROJECT MEMBERS

K. SATYA (18JN5A0430)
T. SUSHMI (17JN1A0452)
M. REVATHI (17JN1A0460)
A. AKHILA (17JN1A0457)
A. GEETHA SUREKHA (17JN1A0467)
DECLARATION

We hereby declare that the project work “SEMANTIC IMAGE
SEGMENTATION USING REGION-BASED OBJECT DETECTOR” submitted to
JNTU Kakinada is a record of original work done by us under the guidance of
Smt. S. Anitha, M. Tech, Asst. Professor in Electronics & Communication Engineering.
This project work is submitted in partial fulfilment of the requirements for the award of the
degree of Bachelor of Technology in Electronics & Communication Engineering. The
results embodied in this project report have not been submitted to any other University
or Institute for the award of any degree or diploma.

OUR PROJECT MEMBERS

K. SATYA (18JN5A0430)
T. SUSHMI (17JN1A0452)
M. REVATHI (17JN1A0460)
A. AKHILA (17JN1A0457)
A. GEETHA SUREKHA (17JN1A0467)
ABSTRACT

Semantic image segmentation has become one of the key applications in image
processing and computer vision, and is used in domains such as the medical field and
intelligent transportation. However, current state-of-the-art models use a separate
representation for each task, making joint inference clumsy and leaving the classification of
many parts of the scene ambiguous. In this work, we explore a simple semantic segmentation
approach using a region-based object detector, which needs only bounding box annotations. The
main idea is to use the object detector to classify region proposals and then to apply a saliency
detection method to segment the classified proposals.

TABLE OF CONTENTS

CONTENTS PAGE NO

CHAPTER 1: INTRODUCTION
1.1 Introduction 1
1.2 Convolution Networks 2
CHAPTER 2: LITERATURE SURVEY 4
CHAPTER 3: IMPLEMENTATION METHODS
3.1 Introduction 9
3.2 Segmentation Approaches 10
3.2.1 Region-Based Semantic Segmentation 10
3.2.2 R-CNN (Regions with CNN Feature) 11
3.2.3 Fully Convolutional Network-Based Semantic Segmentation 12
3.2.4 Weakly Supervised Semantic Segmentation 13
CHAPTER 4: INTRODUCTION TO IMAGE PROCESSING & MATLAB
4.1 Introduction to Image Processing 15
4.1.1 Region Proposal Generation 16
4.1.1 (a) Contour Map Generation 17
4.1.1 (b) Convolutional Encoder-Decoder Network 20
4.1.2 Object Detection 22
4.1.3 Object Segmentation 27
4.2 MATLAB 37
4.2.1 Introduction 37
4.2.2 MATLAB's Power of Computational Mathematics 37
4.2.3 Features of MATLAB 38
4.2.4 Uses of MATLAB 39
4.2.5 Environment Set Up 39
4.2.6 Understanding the MATLAB Environment 41
CHAPTER 5: RESULT AND DESCRIPTION
5.1 Flow Chart 44
5.2 Algorithm 45
5.3 Output 46
5.4 Advantages 50
5.5 Applications 51
CONCLUSION 52
FUTURE SCOPE 53
APPENDIX 54
REFERENCES 60

LIST OF FIGURES

S.NO Figure No Figure Name Page no

1 1.1 R-CNN Architecture 11
2 1.2 FCN Architecture 13
3 1.3 Weakly Supervised Segmentation 14
4 2.1 Proposed Block Diagram 15
5 2.2 Multiscale Combinatorial Grouping 17
6 2.3 Directions Of Pixels 18
7 2.4 Types Of Contour Pixels: (a) Absolute Direction, (b) Relative Direction, (c) Types Of Contour Pixels (I, O, IO) 19
8 2.5 Flowchart Of Convolutional Encoder-Decoder Network 21
9 2.6 5 x 5 Feature Map 23
10 2.7 New Feature Map From The Left To Detect The Top Left Corner Of An Object 24
11 2.8 9 Score Map 25
12 2.9 Top-Middle Object 26
13 2.10 ROI Pool 27
14 2.11 Saliency Detection 30
15 2.12 Example Of Smart Thumbnail Algorithm 31
16 2.13 Example Of Digital Image Processing 32
17 2.14 Example Of Digital Image 33
18 2.15 Example Of Developing A System That Scans Human Face And Opens Any Kind Of Lock 35
19 2.16 Example Of Object Rendering 36
20 3.1 MathWorks Installer 40
21 3.2 Installing Pause 40
22 3.3 MATLAB Desktop 41
23 3.4 Current Folder 41
24 3.5 Command Window 42
25 3.6 Work Shape 42
26 3.7 Command History 43
27 4.1 Mask-1 46
28 4.2 Mask-2 46
29 4.3 Mask-3 47
30 4.4 Mask-4 47
31 4.5 Contour Map 48
32 4.6 Input Image With Bounding Boxes 48
33 4.7 Final Mask 49
34 4.8 Saliency Map 49
35 4.9 Segmented Image 50

Image Segmentation using Region-based Object Detector

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

Semantic image segmentation, also called pixel-level classification, is the task of
clustering together the parts of an image that belong to the same object class (Thoma 2016).
Two other main image tasks are image-level classification and detection. Classification
assigns a single category to the whole image. Detection refers to object localization and
recognition. Image segmentation can be treated as pixel-level prediction because it classifies
each pixel into its category.

Moreover, there is a task named instance segmentation, which joins detection and
segmentation together. Object detection is one of the great challenges of computer vision, having
received continuous attention since the birth of the field. The most common modern approaches
scan the image for candidate objects and score each one. This is typified by the sliding-window
object detection approach, but is also true of most other detection schemes (such as
centroid-based methods or boundary edge methods).

The most successful approaches combine cues from inside the object boundary (local
features) with cues from outside the object (contextual cues). Recent works adopt a more
holistic approach by combining the output of multiple vision tasks and are reminiscent of some
of the earliest work in computer vision. However, these recent works use a different
representation for each subtask, forcing information sharing to be done through awkward feature
mappings.

KIETW-ECE Page 1

Another difficulty with these approaches is that the subtask representations can be
inconsistent. For example, a bounding-box based object detector includes many pixels within
each candidate detection window that are not part of the object itself. Furthermore, multiple
overlapping candidate detections contain many pixels in common. How these pixels should be
treated is ambiguous in such approaches. A model that uniquely identifies each pixel is not only
more elegant, but is also more likely to produce reliable results since it encodes a bias of the
real world (i.e., a visible pixel belongs to only one object).

Semantic segmentation is a very important topic in computer vision due to its crucial
contribution to image understanding. The task is to assign every pixel a specific category
label, such as person or car, so it can be considered a dense pixel classification problem. It
predicts the label, location and shape of each object, and is therefore also called object parsing
in some references. It has a broad range of potential applications, such as automatic driving
and robot sensing. Recently, great progress has been made in semantic image segmentation due
to the rise of deep learning. Specifically, deep convolutional neural networks (CNNs) are used
to extract rich hierarchical semantic features, the lack of which was a bottleneck for
traditional methods.

1.2 CONVOLUTION NETWORKS

CNNs are very effective for image classification. Encouraged by this, researchers
started to apply CNNs to dense prediction problems. In 2015, Long et al. first proposed an
end-to-end fully convolutional network (FCN) for semantic segmentation. However, the obtained
label map is very coarse, as can be seen in Fig. This is because multiple stages of convolution
and pooling strides reduce the final prediction typically by a factor of 32 in each dimension.
Such a low-resolution result loses much of the finer image structure.
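The factor of 32 can be checked with simple arithmetic. The sketch below is illustrative only: it assumes an idealized network with five stride-2 downsampling stages (as in a VGG-16 style backbone), no padding tricks, and a hypothetical 500 x 375 input. The function name `output_size` is ours, not part of any library.

```python
def output_size(size, num_stride2_stages):
    """Spatial size after repeated stride-2 downsampling (floor division)."""
    for _ in range(num_stride2_stages):
        size //= 2
    return size


# Five stride-2 stages give a total stride of 2**5 = 32.
height, width = 500, 375          # hypothetical input size
coarse_h = output_size(height, 5)
coarse_w = output_size(width, 5)
print(coarse_h, coarse_w)         # 500 x 375 input -> 15 x 11 label map
```

Each predicted label therefore covers roughly a 32 x 32 block of input pixels, which is why thin structures and object boundaries are lost.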


To overcome this, Noh et al. learn a multi-layer deconvolution network as an
up-sampling operation to increase the resolution of prediction maps. Chen et al. proposed
DeepLab, which employs atrous (or dilated) convolutions to account for larger receptive fields
without downscaling the image.
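To make the receptive-field claim concrete, the following sketch computes the receptive field of a stack of stride-1 convolutions with and without dilation. It uses the standard relation that a kernel of size k with dilation rate r spans k + (k - 1)(r - 1) pixels. The layer configurations are hypothetical examples, not the actual DeepLab settings.

```python
def effective_kernel(k, rate):
    """Span in pixels of a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation_rate) pairs.
    """
    rf = 1
    for k, rate in layers:
        rf += effective_kernel(k, rate) - 1
    return rf


plain = receptive_field([(3, 1), (3, 1), (3, 1)])    # three ordinary 3x3 layers
atrous = receptive_field([(3, 1), (3, 2), (3, 4)])   # same layers, rates 1, 2, 4
print(plain, atrous)                                 # 7 15
```

With the same number of weights and no pooling, the dilated stack sees a 15-pixel window instead of 7, so the feature map keeps its resolution while the context grows.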

Recently, the authors made further improvements and proposed DeepLab v3, which achieves
state-of-the-art performance and is therefore widely applied. Zheng et al. proposed a new type
of CNN that combines the strengths of CNNs and Conditional Random Fields (CRFs) to improve
accuracy. Because the fully-connected CRF is time consuming, Chen et al. replaced it with
bilateral filtering using the domain transform.

Recently, more powerful approaches have been proposed. With the development of the
Internet of Things, more and more image data are collected by various image and video sensors.
Before using image data for more complex computer vision tasks, we need to know what objects
are in the image and where they are located. Therefore, object detection has always been a hot
research direction in the field of computer vision; its purpose is to locate and classify objects
in images or videos. Object detection has been widely used in many fields, including intelligent
traffic and human pose estimation. Traditional algorithms solve the detection problem for
images by separating the foreground from the background and then manually extracting
foreground features for classification. Foreground extraction algorithms can be divided
into static and dynamic according to the state of the object. Static object detection
for images usually uses background subtraction, in which the foreground is the part where
the pixel value varies greatly relative to the background.
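A minimal sketch of background subtraction on grayscale images stored as nested lists is given below. The threshold value and the tiny test images are hypothetical, and a practical system would normally estimate the background model from many frames rather than use a single reference image.

```python
def foreground_mask(frame, background, threshold):
    """Mark a pixel as foreground (1) when it differs from the
    background model by more than `threshold`, else background (0)."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[10, 11, 12],
              [10, 12, 11]]
frame = [[10, 10, 200],
         [12, 180, 11]]
mask = foreground_mask(frame, background, threshold=30)
print(mask)   # [[0, 0, 1], [0, 1, 0]]
```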


CHAPTER 2

LITERATURE SURVEY

Lin et al. present a novel multi-path refinement network called RefineNet that uses all
the information available along the down-sampling process to facilitate high-resolution
prediction with the help of long-range residual connections.

Bertasius et al. introduced a simple yet efficient Convolutional Random Walk
Network to address the issue of poor boundary localization. Although many effective methods
have been explored, it is still very challenging to obtain high-resolution segmentation results,
especially near object boundaries.

Dai et al. use bounding box annotations as extra supervision for training convolutional
networks to segment semantic regions. Bounding box annotations can be obtained more easily
than masks; although they are less precise, their quantity may help improve segmentation
performance.

Similarly, Khoreva et al. proposed to recursively train a convnet such that outputs are
improved after each iteration by using bounding box annotations only. Another interesting work
is the scribble-supervised segmentation presented by Lin et al. Scribbles are widely used in
interactive image segmentation and are more user-friendly than bounding boxes. Bearman et al.
took a step towards stronger supervision for semantic segmentation by pointing. Other forms of
weak supervision have been explored as well, such as eye tracks and noisy web tags. All these
approaches require much less annotation effort during training, but their performance is still
far from that of fully supervised techniques.

Our method inherits features from the sliding-window object detector works, such as
Torralba et al. and Dalal and Triggs, and the multi-class image segmentation work of Shotton et
al. We further incorporate into our model many novel ideas for improving object detection via
scene context. The innovative works that inspire ours include predicting camera viewpoint for
estimating the real-world size of object candidates, relating “things” (objects) to nearby “stuff”
(regions), co-occurrence of object classes, and general scene “gist”. Recent works go beyond
simple appearance-based context and show that holistic scene understanding (both geometric
and more general) can significantly improve performance by combining related tasks. These
works use the output of one task (e.g., object detection) to provide features for other related tasks
(e.g., depth perception). While they are appealing in their simplicity, current models are not
tightly coupled and may result in incoherent outputs (e.g., the pixels in a bounding box identified
as “car” by the object detector may be labeled as “sky” by an image segmentation task). In our
method, all tasks use the same region-based representation, which forces consistency between
variables. Intuitively, this leads to more robust predictions. The decomposition of a scene into
regions to provide the basis for vision tasks exists in some scene parsing works.

Notably, Tu et al. describe an approach for identifying regions in the scene. Their
approach has only been shown to be effective on text and faces, leaving much of the image
unexplained. Sudderth et al. relate scenes, objects and parts in a single hierarchical framework,
but do not provide an exact segmentation of the image. Gould et al. provide a complete
description of the scene using dynamically evolving decompositions that explain every pixel
(both semantically and geometrically). However, the method cannot distinguish between
foreground objects and often leaves them segmented into multiple dissimilar pieces.
Our work builds on this approach with the aim of classifying objects.

Other works attempt to integrate tasks such as object detection and multi-class image
segmentation into a single CRF model. However, these models either use a different
representation for object and non-object regions or rely on a pixel-level representation. The
former does not enforce label consistency between object bounding boxes and the underlying
pixels, while the latter does not distinguish between adjacent objects of the same class. Recent
work by Gu et al. also uses regions for object detection instead of the traditional sliding-window
approach. However, unlike our method, they use a single over-segmentation of the image and
make the strong assumption that each segment represents a (probabilistically) recognizable
object part. Our method, on the other hand, assembles objects (and background regions) using
segments from multiple different over-segmentations. The multiple over-segmentations avoid
errors made by any one segmentation.

Furthermore, we incorporate background regions which allows us to eliminate large

portions of the image thereby reducing the number of component regions that need to be

considered for each object. Liu et al. use a non-parametric approach to image labeling by

warping a given image onto a large set of labeled images and then combining the results. This is

a very effective approach since it scales easily to a large number of classes. However, the

method does not attempt to understand the scene semantics. In particular, their method is unable

to break the scene into separate objects (e.g., a row of cars will be parsed as a single region) and

cannot capture combinations of classes not present in the training set. As a result, the approach

performs poorly on most foreground object classes.

In recent years, many algorithms have been proposed to address the problem of object
detection. Object detection algorithms based on deep learning can be divided into two-stage
and one-stage detection algorithms. A two-stage algorithm first generates region proposals
and then predicts the bounding box and category of each region proposal. Girshick et al.
proposed the classic regions with convolutional neural network (CNN) features (R-CNN) method,
which achieves excellent object detection accuracy by using a deep ConvNet to classify object
proposals, but it is very time-consuming. To solve this problem, Ren et al. proposed Faster
R-CNN, an upgraded version of R-CNN, which innovatively used a region proposal network (RPN)
to generate region proposals directly within the convolutional neural network and made the
whole detection framework end-to-end. He et al. proposed Mask R-CNN on the basis of Faster
R-CNN, which added a branch for the segmentation task and used the detection and segmentation
tasks together to extract image features and improve detection accuracy. He et al. proposed
spatial pyramid pooling networks (SPPNet) to generate fixed-length representations. Kong et al.
proposed HyperNet, which combines the generation of candidate regions with the detection task
to produce fewer candidate regions while ensuring a higher recall rate. Cai and Vasconcelos
proposed Cascade R-CNN to address the problems of overfitting and quality mismatch. One-stage
detection algorithms do not need to select region proposals, but use regression to directly
calculate the bounding box and object category, which further reduces the running time. Redmon
et al. proposed the you only look once (YOLO) algorithm to meet the requirements of real-time
detection, but its detection accuracy on small objects is not high.

Liu et al. proposed the single shot multibox detector (SSD) algorithm to predict
objects from multiple feature maps, which largely addressed the problem of small object
detection. Lin et al. proposed RetinaNet mainly to solve the extreme imbalance in one-stage
algorithms between positive and negative samples and between difficult and easy samples. Zhang
et al. proposed the RefineDet method, which absorbed the advantages of the two-stage algorithm
so that a one-stage detection algorithm can also approach the accuracy of a two-stage algorithm.
Liu et al. proposed RFBNet, which uses dilated (atrous) convolution to enlarge the receptive
field. Shen et al. proposed the deeply supervised object detector (DSOD), which trains detection
networks from scratch and also introduced the idea of DenseNet, greatly reducing the number of
parameters. Law and Deng proposed CornerNet to detect an object bounding box as a pair of
keypoints using a single convolutional neural network. To further improve on CornerNet, Duan
et al. proposed CenterNet to detect each object as a triplet of keypoints. Tian et al. proposed
the fully convolutional one-stage object detector (FCOS) to solve object detection in a
per-pixel prediction fashion.


CHAPTER 3

IMPLEMENTATION METHODS

3.1 INTRODUCTION

Image segmentation is useful in many applications. It can identify the regions of interest
in a scene or annotate the data. We categorize existing segmentation algorithms into
region-based segmentation, data clustering, and edge-based segmentation. Region-based
segmentation includes the seeded and unseeded region growing algorithms, JSEG, and the fast
scanning algorithm. All of them expand each region pixel by pixel based on pixel value or
quantized value, so that the pixels of each cluster are spatially close. Data clustering, in
contrast, operates on the whole image and considers the distance between data points; a
characteristic of data clustering is that the pixels of a cluster are not necessarily connected.
The basic methods of data clustering can be divided into hierarchical and partitional clustering.
Furthermore, we present an extension of data clustering called the mean shift algorithm, although
this algorithm belongs more properly to density estimation. The last category is edge-based
segmentation, which generally applies edge detection or the concept of an edge. The typical
example is the watershed algorithm, but it tends to over-segment, so the use of markers was
proposed to improve the watershed algorithm by smoothing and selecting markers. Finally, we
show some applications that apply segmentation as a pre-processing step.
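As an illustration of the region-based family, the sketch below implements a very simple seeded region growing on a grayscale image stored as nested lists. It is one possible variant, chosen here for brevity: it assumes 4-connectivity and compares each candidate pixel with the seed value (other variants compare with the running region mean). The tolerance and the test image are hypothetical.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a region from `seed` (row, col), adding 4-connected
    neighbours whose value is within `tol` of the seed value."""
    rows, cols = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and abs(image[nr][nc] - ref) <= tol:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


image = [[10, 11, 50],
         [12, 10, 52],
         [90, 11, 51]]
print(sorted(region_grow(image, seed=(0, 0), tol=3)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)]
```

Because pixels are added only through neighbours, the resulting region is always connected, which is the property that distinguishes region growing from data clustering.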


3.2 SEGMENTATION APPROACHES

A general semantic segmentation architecture can be broadly thought of as an encoder
network followed by a decoder network:

The encoder is usually a pre-trained classification network such as VGG or ResNet.

The task of the decoder is to semantically project the discriminative features (lower resolution)
learnt by the encoder onto the pixel space (higher resolution) to get a dense classification.

Unlike classification, where the end result of the very deep network is the only important
thing, semantic segmentation requires not only discrimination at the pixel level but also a
mechanism to project the discriminative features learnt at different stages of the encoder onto
the pixel space. Different approaches employ different decoding mechanisms. The main
approaches are explored in the following subsections.

3.2.1   REGION-BASED SEMANTIC SEGMENTATION

The region-based methods generally follow the “segmentation using recognition”

pipeline, which first extracts free-form regions from an image and describes them, followed by

region-based classification. At test time, the region-based predictions are transformed to pixel

predictions, usually by labeling a pixel according to the highest scoring region that contains it.

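The test-time conversion from region predictions to pixel predictions can be sketched as follows. The data layout is an assumption made for illustration: each region is a tuple of (set of pixel coordinates, class label, score), and pixels not covered by any region receive a hypothetical "background" label.

```python
def regions_to_pixel_labels(shape, regions, background="background"):
    """Label each pixel with the class of the highest-scoring region
    that contains it. regions: iterable of (pixels, label, score)."""
    rows, cols = shape
    labels = [[background] * cols for _ in range(rows)]
    best = [[float("-inf")] * cols for _ in range(rows)]
    for pixels, label, score in regions:
        for r, c in pixels:
            if score > best[r][c]:
                best[r][c] = score
                labels[r][c] = label
    return labels


regions = [
    ({(0, 0), (0, 1)}, "car", 0.6),
    ({(0, 1), (1, 1)}, "person", 0.9),
]
print(regions_to_pixel_labels((2, 2), regions))
# [['car', 'person'], ['background', 'person']]
```

Pixel (0, 1) lies in both regions and takes the label of the higher-scoring one, which is how overlapping proposals are resolved into a single label per pixel.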


3.2.2 R-CNN (REGIONS WITH CNN FEATURE)

R-CNN is one representative work among the region-based methods. It performs semantic
segmentation based on the object detection results. Specifically, R-CNN first uses selective
search to extract a large number of object proposals and then computes CNN features for each
of them.

Fig.1.1 R-CNN Architecture

Finally, it classifies each region using class-specific linear SVMs. Compared with
traditional CNN structures, which are mainly intended for image classification, R-CNN can
address more complicated tasks, such as object detection and image segmentation, and it has
become an important basis for both fields. Moreover, R-CNN can be built on top of any standard
CNN architecture, such as AlexNet, VGG, GoogLeNet, and ResNet.

For the image segmentation task, R-CNN extracts two types of features for each region,
the full region feature and the foreground feature, and it was found that concatenating them
as the region feature leads to better performance. R-CNN achieved significant
performance improvements by using highly discriminative CNN features. However, it
also suffers from several drawbacks for the segmentation task:


The feature is not compatible with the segmentation task.

The feature does not contain enough spatial information for precise boundary generation.

Generating segment-based proposals takes time and would greatly affect the final performance.

To address these bottlenecks, more recent methods have been proposed, including
SDS, Hypercolumns, and Mask R-CNN.

3.2.3 FULLY CONVOLUTIONAL NETWORK-BASED SEMANTIC SEGMENTATION

The original Fully Convolutional Network (FCN) learns a mapping from pixels to pixels,
without extracting region proposals. The FCN pipeline is an extension of the
classical CNN. The main idea is to make the classical CNN accept arbitrary-sized images as input.
The restriction of CNNs to accept and produce labels only for specific-sized inputs comes from
the fully-connected layers, which have a fixed input size.

In contrast, FCNs have only convolutional and pooling layers, which give them the
ability to make predictions on arbitrary-sized inputs. One issue in this specific FCN is that, by
propagating through several alternating convolutional and pooling layers, the resolution of the
output feature maps is downsampled.


Therefore, the direct predictions of FCN are typically in low resolution, resulting in

relatively fuzzy object boundaries.

Fig.1.2 FCN Architecture

A variety of more advanced FCN-based approaches have been proposed to address this

issue, including SegNet, DeepLab-CRF, and Dilated Convolutions.

3.2.4 WEAKLY SUPERVISED SEMANTIC SEGMENTATION

Most of the relevant methods in semantic segmentation rely on a large number of images

with pixel-wise segmentation masks. However, manually annotating these masks is quite time-

consuming, frustrating and commercially expensive.

Therefore, some weakly supervised methods have recently been proposed, which are
dedicated to performing semantic segmentation using annotated bounding boxes.


For example, BoxSup employed bounding box annotations as supervision to train
the network and iteratively improve the estimated masks for semantic segmentation.

Fig. 1.3 Weakly Supervised Semantic Segmentation

Simple Does It treated the weak supervision limitation as an issue of input label noise and

explored recursive training as a de-noising strategy.


CHAPTER 4

INTRODUCTION TO IMAGE PROCESSING AND MATLAB

4.1 INTRODUCTION TO IMAGE PROCESSING

The proposed method mainly consists of:

1. Region proposal generation,

2. Object detection, and

3. Object segmentation

Fig. 2.1 Proposed Block Diagram

This project proposes an approach that uses an object detector for semantic segmentation.
In detail, it includes region proposal generation, object detection, and object segmentation. We
first use a proposal generator to get object proposals and their corresponding masks. Then,
we use a region-based object detector to classify them and obtain their category labels. Finally,
we apply a saliency detection method to each object box to get the segmented results, using
the proposal masks as object seeds. The detailed pipeline is shown in the figure above.

4.1.1 REGION PROPOSAL GENERATION

Object proposals are very important mid-level representations, which provide
subsequent applications with a set of image regions in which objects might occur. Current
top-performing object detectors, such as Faster R-CNN and R-FCN, all use region proposals.
Almost all object proposal generation methods can be split into two kinds: grouping based and
sliding-window based. Approaches of the first kind can generate relatively accurate object
bounding boxes and masks at the same time, so this work focuses on this type. Experiments in the
literature show that MCG (multiscale combinatorial grouping) gets the best performance among all
low-level feature-based proposal generators. Segments in MCG are merged based on contour
strength. In order to boost the performance of object proposals, we use a powerful CNN-based
contour detection method (Convolutional Encoder-Decoder Network, CEDN) to replace the classic
gPb contour detector in MCG. In MCG, ultrametric contour maps are computed at multiple scales
and then aligned into a single hierarchical segmentation.

Multiscale Combinatorial Grouping

Consider a segmentation of the image into regions that partition its domainS={Si}i. A

segmentation hierarchy is a family of partitions {S∗, S1.., SL} such that: (1)S∗is the finest set of

super pixels, (2) SL is the complete domain, and (3) regions from coarse levels are unions of

regions from fine levels. A hierarchy where each levelSiis assigned a real-valued indexλican be

represented by a dendrogram, a region tree where the height of each node is its index.

KIETW-ECE Page 16
Image Segmentation using Region-based Object Detector

Furthermore, it can also be represented as an ultrametric contour map (UCM), an image

obtained by weighting the boundary of each pair of adjacent regions in the hierarchy by the

index at which they are merged.

Fig.2.2 Multiscale Combinatorial Grouping

This representation unifies the problems of contour detection and hierarchical image
segmentation: a threshold at level $\lambda_i$ in the UCM produces the segmentation $S^i$.
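As a small illustration of how thresholding a hierarchy yields a segmentation, the following Python sketch stores the dendrogram as a list of merges between the finest superpixels and applies every merge whose index does not exceed the threshold. The function name `segmentation_at`, the merge list and the index values are hypothetical; this is not the MCG implementation, and it works on region labels rather than on a UCM image.

```python
def segmentation_at(num_regions, merges, threshold):
    """Return one label per finest region after thresholding the hierarchy.

    merges is a list of (index, a, b) tuples: the groups containing the
    finest regions a and b are merged at height `index` of the dendrogram.
    """
    parent = list(range(num_regions))

    def find(r):
        # Follow parent links to the representative, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for index, a, b in sorted(merges):
        if index <= threshold:          # boundary weaker than the threshold
            parent[find(b)] = find(a)   # so the two groups become one region
    return [find(r) for r in range(num_regions)]


# Hypothetical hierarchy over four superpixels.
merges = [(0.2, 0, 1), (0.5, 2, 3), (0.9, 0, 2)]
fine = segmentation_at(4, merges, 0.0)     # four separate regions
middle = segmentation_at(4, merges, 0.6)   # two regions: {0, 1} and {2, 3}
coarse = segmentation_at(4, merges, 1.0)   # the complete domain
```

Raising the threshold can only merge regions, never split them, which is the nesting property required of a segmentation hierarchy.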

Aligning Segmentation Hierarchies

In order to leverage multi-scale information, our approach combines segmentation
hierarchies computed independently at multiple image resolutions. However, since subsampling
an image removes details and smooths away boundaries, the resulting UCMs are misaligned, as
illustrated in the second panel.

Hierarchy Alignment

We construct a multi-resolution pyramid with N scales by subsampling/supersampling the
original image and applying our single-scale segmenter. In order to preserve thin structures
and details, we declare as the set of possible boundary locations the finest superpixels at the
highest resolution.


Multiscale Hierarchy

After alignment, we have a fixed set of boundary locations, and N strengths for each of

them, coming from the different scales. We formulate this problem as binary boundary

classification and train a classifier that combines these N features into a single probability of

boundary estimation.
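The source does not say which classifier is trained. As one possible choice, the sketch below uses a logistic model to turn the N per-scale strengths of a boundary location into a single probability. The function name, the weights and the bias are hypothetical placeholders for learned parameters.

```python
import math


def boundary_probability(strengths, weights, bias):
    """Combine N per-scale boundary strengths into one probability.

    strengths: the N strengths of one boundary location, one per scale.
    weights, bias: parameters of a logistic model (assumed to be learned
    beforehand; the values used below are made up for illustration).
    """
    z = bias + sum(w * s for w, s in zip(weights, strengths))
    return 1.0 / (1.0 + math.exp(-z))


# Three scales with hypothetical parameters.
weights = [1.5, 2.0, 1.0]
bias = -2.0
weak = boundary_probability([0.1, 0.0, 0.2], weights, bias)
strong = boundary_probability([0.9, 0.8, 0.7], weights, bias)
```

A location that is strong at every scale receives a higher probability than one that is weak at every scale, provided the weights are positive.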

4.1.1.a: CONTOUR MAP GENERATION

The implemented heuristic uses the fact that branches are often only a few pixels in

length and occur towards the middle of contours to make the assumption that the set of two

possible endpoints that are the farthest apart from one another (in terms of contour pixels) are

the two actual endpoints for a sub-contour.

Fig: 2.3 Directions of pixels

Experimentation with this heuristic showed that it produced correct results in nearly

every “well behaved” map.


Running the segmented image through the thinning, Moore contour tracing, and endpoint-finding
algorithms yields each contour in a vectorized form that can then be processed further. As can be
seen in the figure, which shows the results of these steps on the previously shown segmented
image, the results from this step are quite good.

Fig: 2.4 Types of contour pixels. (a) Absolute direction; (b) relative direction; (c) types of contour
pixels: inner corner pixel (I), outer corner pixel (O) and inner-outer corner pixel (IO)

Contour Tracing Algorithms

Let I be a binary digital image with M × N pixels, where the coordinate of the top-leftmost pixel
is (0, 0) and that of the bottom-rightmost pixel is (M−1, N−1). In I, a pixel can be represented as
P = (x, y), x = 0, 1, 2, ..., M−1, y = 0, 1, 2, ..., N−1. Most contour-tracing algorithms use a
tracer T(P, d) with absolute directional information d ∈ {N, NE, E, SE, S, SW, W, NW}, and they
have the following basic sequence:

1. The tracer starts contour tracing at the contour of an object after it saves the starting point along with its initial direction.

2. The tracer determines the next contour point using its specific rule of following paths according to the adjacent pixels, then moves to that contour point and changes its absolute direction.

3. If the tracer reaches the start point, the trace procedure is terminated.

To determine the next contour point, which may be a contour pixel or a pixel corner, the tracer
detects the intensity of its adjacent pixel P_r and the new absolute direction d_r for P_r by using
relative direction information r ∈ {front, front-left, left, rear-left, rear, rear-right, right,
front-right}. For example, if the absolute direction of the current tracer T(P, d) is N, the left
direction of the tracer, d_Left, is W. Similarly, the left pixel of the tracer, P_Left, is (x−1, y).
Fig. 2.4(a) and (b) show the directional information of the tracer, and Fig. 2.4(c) shows the
different types of contour pixels. The contour pixels can be classified into four types, namely
straight line, inner corner pixel, outer corner pixel and inner-outer corner pixel. In Fig. 2.4(c),
"O" represents the outer corner, "I" represents the inner corner and "IO" represents the
inner-outer corner, according to the local pattern of the contour.

In this study, we focus on a contour-tracing algorithm that is suitable for cases that involve a
relatively small number of objects and require real-time tracing, such as augmented reality (AR),
mixed reality (MR) and image-based code recognition in small-scale images, e.g., in a mobile
computing environment. Hence, we first introduce and briefly describe the conventional
contour-tracing algorithms that are used in this environment and analyse their tracing accuracy
and characteristics.
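The directional bookkeeping of the tracer can be sketched in a few lines of Python. The table and function names below are hypothetical. The sketch assumes image coordinates in which x grows to the right and y grows downward, which matches the example in the text (facing N, the left pixel is (x−1, y)). It only covers the conversion from relative to absolute directions, not a complete tracing algorithm.

```python
# Absolute directions in clockwise order.
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

# Pixel step (dx, dy) for each absolute direction; y grows downward.
STEPS = {
    "N": (0, -1), "NE": (1, -1), "E": (1, 0), "SE": (1, 1),
    "S": (0, 1), "SW": (-1, 1), "W": (-1, 0), "NW": (-1, -1),
}

# Relative directions as clockwise eighth-turns away from "front".
RELATIVE = {
    "front": 0, "front-right": 1, "right": 2, "rear-right": 3,
    "rear": 4, "rear-left": 5, "left": 6, "front-left": 7,
}


def absolute_direction(d, r):
    """Absolute direction d_r seen by a tracer facing d, looking in relative direction r."""
    return DIRECTIONS[(DIRECTIONS.index(d) + RELATIVE[r]) % 8]


def adjacent_pixel(p, d, r):
    """Adjacent pixel P_r of a tracer T(P, d) in relative direction r."""
    dx, dy = STEPS[absolute_direction(d, r)]
    return (p[0] + dx, p[1] + dy)


# Example from the text: a tracer at (5, 5) facing N.
d_left = absolute_direction("N", "left")      # "W"
p_left = adjacent_pixel((5, 5), "N", "left")  # (4, 5)
```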

4.1.1.b: CONVOLUTIONAL ENCODER–DECODER NETWORK

A convolutional encoder–decoder network is a standard network used for tasks requiring

dense pixel-wise predictions like semantic segmentation, computing optical flow and disparity

maps, and contour detection. The encoder in the network computes progressively higher-level

abstract features as the receptive fields in the encoder increase with the depth of the encoder. The

spatial resolution of the feature maps is reduced progressively via a down-sampling operation,

whereas the decoder computes feature maps of progressively increasing resolution via un-

pooling or up-sampling. The network has the ability not only to model features like shape or

appearance of different classes but also to model long-range spatial relationships.


Different variations of the encoder–decoder network have been explored in the literature

for improved performance. Skip connections (Ronneberger et al., 2015) have been used to

recover the fine spatial details during reconstruction which get lost due to successive down-

sampling operations involved in the encoder. Addition of larger context information using

image-level features (Liu et al., 2015), recurrent connections (Pinheiro and Collobert, 2014;

Zheng et al., 2015), and larger convolutional kernels (Peng et al., 2017) has also significantly

improved the accuracy of semantic segmentation.

Fig.2.5 FLOWCHART OF CONVOLUTIONAL ENCODER–DECODER NETWORK

Other methods studied for improving semantic segmentation accuracy include hierarchical

supervision (Chen et al., 2016) and iterative concatenation of feature maps (Jégou et al., 2017).


Our proposed up-sampling idea was inspired by earlier work intended for unsupervised feature
learning. The fundamental aspect of the proposed encoder-decoder network is the decoding process,
which has numerous practical advantages: it enhances boundary delineation and minimizes the total
network size, which enables end-to-end training. The key benefit of such a design is an
encoder-decoder architecture that can be adapted with very little modification. The encoder
produces low-resolution feature maps for pixel-wise classification. The feature maps produced in
this process are sparse; they are later convolved with the decoder filters to generate detailed
feature maps.
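One common way to obtain sparse up-sampled feature maps is max-unpooling with stored pooling indices. The source does not confirm which up-sampling scheme is used, so the following Python sketch is an assumption for illustration only, and the function names are hypothetical. It shows a 2 × 2 max-pooling step that remembers where each maximum came from, and the matching unpooling step that puts the values back and leaves every other position at zero.

```python
def max_pool_2x2(fmap):
    """2 x 2 max pooling that also records the position of each maximum.

    fmap is a 2-D list with even height and width.
    Returns (pooled, indices), where indices[r][c] is the (row, col) in
    fmap of the value kept in pooled[r][c].
    """
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_2x2(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; the rest stays 0."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [[1, 2, 0, 9],
        [3, 4, 1, 0],
        [0, 0, 5, 6],
        [1, 0, 7, 8]]
pooled, indices = max_pool_2x2(fmap)         # pooled == [[4, 9], [1, 8]]
sparse = unpool_2x2(pooled, indices, 4, 4)   # only four non-zero entries
```

In a decoder, a sparse map like `sparse` would then be convolved with learned filters to fill in the zeros.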

4.1.2 OBJECT DETECTION

Recognizing objects and localizing them is the key to our approach. Recent progress shows that
region-based object detectors achieve state-of-the-art performance. These methods usually take an
image as input, extract region proposals, compute semantic features for each proposal using CNNs,
and classify each proposal to obtain its semantic label. With these labels, we only need to segment
each object to get the final semantic segmentation result. We could also obtain instance
segmentation results, but this is a more challenging task than semantic segmentation and is beyond
the scope of this work. R-FCN (Region-based Fully Convolutional Networks) is a recent baseline in
object detection; it is efficient because it uses an FCN and powerful because it uses Residual
Networks (ResNets) for feature extraction.

Region-Based Fully Convolutional Networks (R-FCN)

R-CNN based detectors, such as Fast R-CNN or Faster R-CNN, perform object detection in two stages:

1. Generate region proposals (ROIs).

2. Make classification and localization (bounding box) predictions from the ROIs.

Fast R-CNN computes the feature maps from the whole image once. It then derives the region
proposals (ROIs) from the feature maps directly. No further feature extraction is needed for each
ROI, which cuts down the processing significantly because there are about 2000 ROIs. Following the
same logic, R-FCN improves speed by reducing the amount of work needed for each ROI. The
region-based feature maps are independent of the ROIs and can be computed outside each ROI. The
remaining work per ROI is much simpler, and therefore R-FCN is faster than Fast R-CNN or Faster
R-CNN.

Fig. 2.6 5 x 5 feature map

Let’s get into the details and consider a 5 × 5 feature map M with a square object inside.

We divide the square object equally into 3 × 3 regions. Now, we create a new feature map from

M to detect the top left (TL) corner of the square only. The new feature map looks like the one

on the right below. Only the yellow grid cell [2, 2] is activated.



Fig. 2.7 New feature map from the left to detect the top left corner of an object

Since we divide the square into 9 parts (top-left TL, top-middle TM, top-right TR, center-left
CL, …, bottom-right BR), we create 9 feature maps, each detecting the corresponding region of
the object.


These feature maps are called position-sensitive score maps because each map detects

(scores) a sub-region of the object.

Generate 9 score maps

Let’s say the dotted red rectangle below is the ROI proposed. We divide it into 3 × 3 regions

and ask how likely each region contains the corresponding part of the object. For example, how

likely the top-left ROI region contains the left eye. We store the results into a 3 × 3 vote array in

the right diagram below.

Fig. 2.8 9 SCORE MAP

Apply ROI onto the feature maps to output a 3 x 3 array.

This process to map score maps and ROIs to the vote array is called position-sensitive ROI-pool

which is very similar to the ROI pool in the Fast R-CNN.

For the diagram below:

1. We take the top-left ROI region and map it to the top-left score map (top middle diagram).

2. We compute the average score of the top-left score map bounded by the top-left ROI region (blue rectangle). About 40% of the area inside the blue rectangle has 0 activation and 60% has 100% activation, i.e. 0.6 on average. So the likelihood that we have detected the top-left part of the object is 0.6.

3. We store the result (0.6) in array[0][0].

4. We repeat this with the top-middle ROI region, now using the top-middle score map. The result is computed as 0.55 and stored in array[0][1]. This value indicates the likelihood that we have detected the top-middle part of the object.

Fig. 2.9 Top-Middle Object

Overlay a portion of the ROI onto the corresponding score map to calculate V[i][j]

After calculating all the values for the position-sensitive ROI pool, the class score is the

average of all its elements.
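The voting procedure described above can be sketched in Python as follows. This is a simplified illustration, not the R-FCN implementation: it assumes that the ROI is aligned to the feature-map grid and that its height and width are divisible by 3, and the function names are hypothetical.

```python
def position_sensitive_roi_pool(score_maps, roi, k=3):
    """Build the k x k vote array for one class.

    score_maps[i][j] is the 2-D score map that detects sub-region (i, j)
    of the object (i = 0 is the top row, j = 0 is the left column).
    roi = (top, left, height, width) in feature-map cells; in this sketch
    height and width must be divisible by k.
    """
    top, left, height, width = roi
    bin_h, bin_w = height // k, width // k
    votes = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            score_map = score_maps[i][j]    # the map that matches this bin
            cells = [score_map[r][c]
                     for r in range(top + i * bin_h, top + (i + 1) * bin_h)
                     for c in range(left + j * bin_w, left + (j + 1) * bin_w)]
            votes[i][j] = sum(cells) / len(cells)   # average score in the bin
    return votes


def class_score(votes):
    """The class score is the average of all elements of the vote array."""
    flat = [v for row in votes for v in row]
    return sum(flat) / len(flat)


# Nine all-zero 3 x 15 score maps; 60% of the top-left bin is then activated.
score_maps = [[[[0.0] * 15 for _ in range(3)] for _ in range(3)] for _ in range(3)]
score_maps[0][0][0][0:3] = [1.0, 1.0, 1.0]
votes = position_sensitive_roi_pool(score_maps, roi=(0, 0, 3, 15))
score = class_score(votes)
```

Here `votes[0][0]` comes out as 0.6, mirroring the top-left example in the text, because three of the five cells in that bin are fully activated.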


Suppose we have C classes to detect. We expand this to C + 1 classes so that we include a new
class for the background (non-object).

Fig. 2.10 ROI pool

Each class has its own 3 × 3 set of score maps, giving a total of (C+1) × 3 × 3 score maps.
Using its own set of score maps, we predict a class score for each class. Then we apply a softmax
on those scores to compute the probability for each class.
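The final step, turning the C + 1 class scores into probabilities, is a standard softmax. A minimal Python sketch follows; the scores in the example are made-up values for a background class and two object classes.

```python
import math


def softmax(scores):
    """Convert a list of class scores into probabilities that sum to 1."""
    highest = max(scores)                 # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores: index 0 is the background, 1 and 2 are object classes.
class_scores = [0.10, 0.65, 0.25]
probabilities = softmax(class_scores)
predicted = probabilities.index(max(probabilities))   # class with the highest score
```

The softmax preserves the ordering of the scores, so the class with the highest score also has the highest probability.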

4.1.3 OBJECT SEGMENTATION

The main problem is how to classify the overlapping parts among several objects with the same
semantic label. The simplest way to segment each detected object is to output its corresponding
mask as the segmentation result. However, these masks are not accurate enough; they usually miss
some parts of the object. To address this, we introduce a saliency detection method to refine these
masks. A saliency detection approach detects all the salient objects in the form of a saliency map.

In computer vision, a saliency map is an image that shows each pixel's unique quality. The goal
of a saliency map is to simplify and/or change the representation of an image into something that
is more meaningful and easier to analyze. For example, if a pixel has a high grey level or another
unique color quality in a color image, that pixel's quality will show clearly in the saliency map.

Saliency estimation may be viewed as an instance of image segmentation. In computer vision, image
segmentation is the process of partitioning a digital image into multiple segments (sets of pixels,
also known as superpixels). The goal of segmentation is to simplify and/or change the
representation of an image into something that is more meaningful and easier to analyze. Image
segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.
More precisely, image segmentation is the process of assigning a label to every pixel in an image
such that pixels with the same label share certain characteristics.

First, we calculate the distance of each pixel to the rest of the pixels in the same frame:

$$\mathrm{SALS}(I_k) = \sum_{i=1}^{N} |I_k - I_i|$$

where $I_i$ is the value of pixel $i$, in the range [0, 255]. The following equation is the expanded
form of this equation:

$$\mathrm{SALS}(I_k) = |I_k - I_1| + |I_k - I_2| + \cdots + |I_k - I_N|$$

where $N$ is the total number of pixels in the current frame. We can then restructure the formula
by grouping the terms that have the same value of $I$:

$$\mathrm{SALS}(I_k) = \sum_{n=0}^{255} F_n \times |I_k - I_n|$$

where $F_n$ is the frequency of $I_n$, and $n$ ranges over [0, 255]. The frequencies are expressed
in the form of a histogram, and computing the histogram takes $O(N)$ time.
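The histogram form of the saliency formula can be written directly in Python. The sketch below is a plain, unoptimized illustration for a grey-level image stored as a 2-D list; the function name is hypothetical, and the code is not the MATLAB implementation used in the project.

```python
def saliency_map(image):
    """Histogram-based saliency for a grey-level image.

    image is a 2-D list of integers in [0, 255]. For each pixel value I_k
    the result is SALS(I_k) = sum over n of F_n * |I_k - I_n|, where F_n is
    the number of pixels with grey level n.
    """
    # Build the histogram F_n in one pass over the N pixels.
    histogram = [0] * 256
    for row in image:
        for value in row:
            histogram[value] += 1

    # Saliency of every possible grey level: 256 x 256 operations,
    # independent of the image size.
    saliency_of_level = [
        sum(histogram[n] * abs(k - n) for n in range(256))
        for k in range(256)
    ]

    # Look up the saliency of each pixel from its grey level.
    return [[saliency_of_level[value] for value in row] for row in image]


# Three dark pixels and one bright pixel: the bright pixel stands out.
example = saliency_map([[0, 0], [0, 255]])   # [[255, 255], [255, 765]]
```

Grouping by grey level avoids comparing every pixel with every other pixel, which would take a number of operations proportional to N squared.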


The distance between a superpixel pair (i, j) is defined in terms of π = {π(0), ..., π(K)}, the
shortest path in the set; R(x, y), the boundary response at pixel (x, y); and l(i, j).

What is Saliency Detection

Saliency is what stands out to you and how you are able to quickly focus on the most

relevant parts of what you see. In neuroscience, saliency is described as an attention mechanism

in organisms to narrow down to the important parts of what they see.


In UX design, saliency is a feedback loop for understanding which parts of a design are useful
and which are not. Designers use the information they gather from usability and eye-tracking
studies to design better interfaces. Advertisers are well aware that many people do not have long
attention spans, so they try to catch the user's eye at a single glance. Saliency detection methods
are used to design better ads and posters.

Fig. 2.11 Saliency Detection

Saliency detection, essentially, can be used in any area in which you are trying to automate the
process of understanding what stands out in an image.


Why Saliency Detection

We use saliency detection to make our algorithms smarter. One example of this would be

the Smart Thumbnail Algorithm.

Fig. 2.12 Example of Smart Thumbnail Algorithm

This microservice uses the Saliency Detector algorithm to get information about the

important parts of an image. Using Saliency Detection will make your app/service smarter by

detecting the relevant (salient) parts in your images automatically. You can use this information

to improve your service, and make your app smarter!

DIGITAL IMAGE PROCESSING

Digital image processing (DIP) deals with the manipulation of digital images through a digital
computer. It is a subfield of signals and systems but focuses particularly on images. DIP focuses
on developing a computer system that is able to perform processing on an image. The input of that
system is a digital image; the system processes that image using efficient algorithms and gives an
image as output. The most common example is Adobe Photoshop, one of the most widely used
applications for processing digital images.

Fig. 2.13 Example of Digital image processing

In the above figure, an image has been captured by a camera and has been sent to a

digital system to remove all the other details, and just focus on the water drop by zooming it in

such a way that the quality of the image remains the same. The digital image processing deals with

developing a digital system that performs operations on a digital image.

What is an Image

An image is nothing more than a two-dimensional signal. It is defined by the mathematical
function f(x, y), where x and y are the horizontal and vertical coordinates. The value of f(x, y)
at any point gives the pixel value at that point of the image. The above figure is an example of a
digital image that you are now viewing on your computer screen. But actually, this image is nothing
but a two-dimensional array of numbers ranging between 0 and 255.


128 30 123

232 123 321

123 77 89

80 255 255

Fig. 2.14 Example of Digital Image

Each number represents the value of the function f(x, y) at one point. In this case, the values
128, 30 and 123 each represent an individual pixel value. The dimensions of the picture are the
dimensions of this two-dimensional array.
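The idea of an image as a function f(x, y) over a two-dimensional array can be made concrete with a short Python sketch. The pixel values below are made up for illustration, and the helper names `f` and `dimensions` are hypothetical. The sketch assumes that x is the column index and y is the row index.

```python
# A small grey-level image: 4 rows by 3 columns, values in [0, 255].
image = [
    [128, 30, 123],
    [232, 123, 200],
    [123, 77, 89],
    [80, 255, 255],
]


def f(image, x, y):
    """Pixel value at horizontal position x (column) and vertical position y (row)."""
    return image[y][x]


def dimensions(image):
    """Return (rows, columns), i.e. the height and width of the picture."""
    return len(image), len(image[0])


first_row = [f(image, x, 0) for x in range(3)]   # [128, 30, 123]
rows, cols = dimensions(image)                   # 4 rows, 3 columns
```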

Relationship between a digital image and a signal

If the image is a two-dimensional array, then what does it have to do with a signal? To
understand that, we first need to understand what a signal is.


Signal:

In the physical world, any quantity measurable through time, over space or over any higher
dimension can be taken as a signal. A signal is a mathematical function, and it conveys some
information.

A signal can be one-dimensional, two-dimensional or higher-dimensional. A one-dimensional signal
is a signal that is measured over time. A common example is a voice signal.

Two-dimensional signals are those that are measured over some other physical quantities. An
example of a two-dimensional signal is a digital image.

Relationship

Anything that conveys information or broadcasts a message between two observers in the physical
world is a signal. That includes speech (the human voice) and images. When we speak, our voice is
converted to a sound wave (a signal) that is transmitted over time to the person we are speaking
to. Likewise, acquiring an image with a digital camera involves the transfer of a signal from one
part of the system to another.

How a digital image is formed

Capturing an image with a camera is a physical process. Sunlight is used as the source of energy,
and a sensor array is used for the acquisition of the image. When the sunlight falls upon the
object, the amount of light reflected by that object is sensed by the sensors, and a continuous
voltage signal is generated from the sensed data. In order to create a digital image, we need to
convert this data into digital form. This involves sampling and quantization. Sampling and
quantization result in a two-dimensional array or matrix of numbers, which is nothing but a digital
image.
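Sampling and quantization can be illustrated with a short Python sketch. The continuous scene is modelled as a hypothetical function that returns a voltage between 0 and `v_max` for any position; sampling evaluates it on a regular grid, and quantization maps each voltage to one of 256 integer levels. The function name and parameters are assumptions for illustration and do not describe any particular camera.

```python
def digitize(scene, rows, cols, levels=256, v_max=1.0):
    """Sample a continuous scene on a rows x cols grid and quantize it.

    scene(x, y) returns a voltage in [0, v_max] for x and y in [0, 1).
    The result is a 2-D list of integers in [0, levels - 1].
    """
    step = v_max / levels                   # width of one quantization level
    image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            voltage = scene(c / cols, r / rows)        # sampling
            level = min(levels - 1, int(voltage / step))   # quantization
            row.append(level)
        image.append(row)
    return image


# A horizontal brightness ramp: the voltage equals the x position.
ramp = digitize(lambda x, y: x, rows=2, cols=4)
# ramp == [[0, 64, 128, 192], [0, 64, 128, 192]]
```

A larger grid gives finer sampling, and more levels give finer quantization; 256 levels correspond to the 0 to 255 range used for pixel values in this chapter.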

Overlapping fields

Machine/Computer vision

Machine vision or computer vision deals with developing a system in which the input is

an image and the output is some information.

For example: Developing a system that scans human face and opens any kind of lock.

This system would look something like this.

Fig. 2.15 Example of Developing a system that scans human face and opens any kind of lock

Computer graphics

Computer graphics deals with the formation of images from object models, rather than images
captured by some device. For example: object rendering, i.e. generating an image from an object
model. Such a system would look something like this.

Fig.2.16 Example of Object rendering.

Artificial intelligence

Artificial intelligence is more or less the study of putting human intelligence into machines.
Artificial intelligence has many applications in image processing. For example: developing
computer-aided diagnosis systems that help doctors interpret X-ray, MRI and other images, and then
highlighting conspicuous sections to be examined by the doctor.

Signal processing

Signal processing is an umbrella, and image processing lies under it. The light reflected by an
object in the physical (3D) world passes through the lens of the camera and becomes a 2D signal,
which results in image formation. This image is then digitized using methods of signal processing,
and the resulting digital image is manipulated in digital image processing.


4.2 MATLAB

4.2.1: INTRODUCTION

MATLAB is a programming language developed by MathWorks. It started out as a matrix programming
language in which linear algebra programming was simple. It can be run both in interactive
sessions and as a batch job.

It allows matrix manipulation; plotting of functions and data; implementation of algorithms;
creation of user interfaces; interfacing with programs written in other languages, including C,
C++, Java, and FORTRAN; data analysis; algorithm development; and creation of models and
applications.

It has numerous built-in commands and math functions that help you in mathematical

calculations, generating plots, and performing numerical methods.

4.2.2 MATLAB'S POWER OF COMPUTATIONAL MATHEMATICS

MATLAB is used in every facet of computational mathematics. The following are some of the
mathematical calculations for which it is most commonly used −

Dealing with Matrices and Arrays

2-D and 3-D Plotting and graphics


Linear Algebra

Algebraic Equations

Non-linear Functions

Statistics

Data Analysis

Calculus and Differential Equations

Numerical Calculations

Integration

Transforms

Curve Fitting

Various other special functions

4.2.3 FEATURES OF MATLAB

Following are the basic features of MATLAB −

It is a high-level language for numerical computation, visualization and application development.

It also provides an interactive environment for iterative exploration, design and problem solving.

It provides vast library of mathematical functions for linear algebra, statistics, Fourier analysis,

filtering, optimization, numerical integration and solving ordinary differential equations.


It provides built-in graphics for visualizing data and tools for creating custom plots.

MATLAB's programming interface provides development tools for improving code quality and
maintainability and for maximizing performance.

It provides tools for building applications with custom graphical interfaces.

It provides functions for integrating MATLAB based algorithms with external applications and

languages such as C, Java, .NET and Microsoft Excel.

4.2.4 USES OF MATLAB

MATLAB is widely used as a computational tool in science and engineering

encompassing the fields of physics, chemistry, math and all engineering streams. It is used in a

range of applications including −

Signal Processing and Communications

Image and Video Processing

Control Systems

Test and Measurement

Computational Finance

Computational Biology

4.2.5 Environment Setup

Setting up the MATLAB environment is a matter of a few clicks. The installer can be downloaded
from the MathWorks website.


MathWorks provides the licensed product, a trial version and a student version as well.

You need to log into the site and wait a little for their approval.

After downloading the installer, the software can be installed with a few clicks.

Fig.3.1 MathWorks Installer

Fig. 3.2 Installing Pause


4.2.6 Understanding the MATLAB Environment

MATLAB development IDE can be launched from the icon created on the desktop. The

main working window in MATLAB is called the desktop. When MATLAB is started, the

desktop appears in its default layout −

Fig. 3.3 MATLAB desktop

The desktop has the following panels −

Current Folder − This panel allows you to access the project folders and files.

Fig. 3.4 Current Folder


Command Window − This is the main area where commands can be entered at the command

line. It is indicated by the command prompt (>>).

Fig. 3.5 Command Window

Workspace − The workspace shows all the variables created and/or imported from files.

Fig. 3.6 Workspace


Command History − This panel shows commands that were entered at the command line and lets you
rerun them.

Fig. 3.7 Command History


CHAPTER 5

RESULT AND DESCRIPTION

5.1 FLOW CHART

Flow chart blocks: Input Image, CEDN, Contour, SCG, R-FCN, Masks, Contour Saliency, Segmentation result.


5.2 Algorithm:

First, the input image is passed to the Convolutional Encoder-Decoder Network (CEDN). The CEDN is
a standard network for tasks that require dense pixel-wise predictions, such as semantic
segmentation, computation of optical flow and disparity maps, and contour detection. Different
variations of the encoder-decoder network have been explored in the literature for improved
performance.

Next, the image is processed into a contour map. Contour detection is an important segmentation
technique that separates an image by boundary or region. At the same time, SCG is used to convert
objects into masks. The masks are derived from the contours as part of image detection, splitting
the image into regions as required. During mask generation, a pixel is assigned to a mask only when
its value satisfies the required condition.

R-FCN applies bounding boxes to produce an accurate image contour. The segmentation results are
obtained after the contour saliency step, in which the image is separated along the gaps between
collinear lines. All of this is implemented in MATLAB code.


5.3 OUTPUT:

Fig. 4.1 Mask-1

Fig.4.2 Mask-2


Fig. 4.3 Mask-3

Fig.4.4 Mask-4


Fig. 4.5 Contour Map

Fig. 4.6 Input Image With Bounding Boxes


Fig. 4.7 Final Mask

Fig. 4.8 Saliency Map


Fig. 4.9 Segmented Image

5.4 ADVANTAGES

1. It is less time-consuming.

2. It is easy to construct compared to the previous method.

3. It produces sufficiently accurate segmentation masks.

4. It is not expensive to obtain.

5. It uses bounding boxes, which make the segmentation process simple and accurate.


5.5 APPLICATIONS

Semantic image segmentation, which has become one of the key applications in the image
processing and computer vision domain, has been used in multiple domains, such as:

1. Medical area for segmenting tumour size.

2. Medical area for segmenting wound size.

3. Robotics for segmenting bombs.

4. Robotics for segmenting water.

5. Robotics for segmenting hills area.

6. Intelligent transportation for segmenting individual vehicles for counting.


CONCLUSION

Compared with the traditional image semantic segmentation method, the method based

on the convolutional neural network is simple and the segmentation effect is better than the

traditional image semantic segmentation method. Model fusion helps to achieve high accuracy of

small objects while still achieving high global accuracy.

Image semantic segmentation is a key technology in the field of image processing and computer
vision. It is an important part of how computers understand image content. The quality of
semantic segmentation plays a crucial role in subsequent tasks such as image understanding, scene
analysis and target tracking.

Therefore, it is of great practical significance to study an effective image semantic

segmentation algorithm. With the continuous development of deep learning, the high accuracy

brought by neural networks has been widely studied and applied in many scenes such a simage

recognition and semantic segmentation. Comparedwith the traditional semantic segmentation

method based on region feature extraction, the image features acquired by the deep

convolutional neural network method havestronger representation ability, so the algorithm has

better effect. The basic idea of semantic segmentation based on deep convolutional neural

network is to extract the semantic features of each pixel in the image by using neural network,

then classify and identify the pixels according to these features, so as to obtain the segmentation

image containing semantic information. Therefore, the core of this method is how to improve the

recognition accuracy of pixels on the network.
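The per-pixel classification step described above can be sketched as follows. This is a minimal illustration, not our trained network: the score array and the reference label map are made-up values, and in a real system the scores would come from the final layer of the network. Each pixel is assigned the class with the highest score, and the pixel recognition accuracy is the fraction of pixels whose label matches the reference.

```matlab
% Hypothetical per-pixel class scores for a 2 x 3 image and 3 classes.
% scores(r, c, k) is the score of class k at pixel (r, c).
scores = zeros(2, 3, 3);
scores(:, :, 1) = [0.7 0.2 0.1; 0.6 0.1 0.3];
scores(:, :, 2) = [0.2 0.5 0.3; 0.3 0.8 0.1];
scores(:, :, 3) = [0.1 0.3 0.6; 0.1 0.1 0.6];

% Assign each pixel the class with the highest score (argmax over dimension 3).
[~, labelMap] = max(scores, [], 3);

% Pixel accuracy against a reference label map (illustrative values).
reference = [1 2 3; 1 2 1];
pixelAccuracy = mean(labelMap(:) == reference(:));

disp(labelMap);
fprintf('Pixel accuracy: %.3f\n', pixelAccuracy);
```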


FUTURE SCOPE

1. Our project does not provide instance-level ground truth.

"Ground truth" refers to information collected on location. It allows image data to be related to real features and materials on the ground.

Ground truth also helps with atmospheric correction. Satellite images are captured through the atmosphere, so they can be distorted by atmospheric absorption. Ground truth therefore helps to identify objects in satellite photos reliably.

More generally, "ground truth" means a set of measurements that is known to be much more accurate than the measurements from the system being tested.

For example, suppose you are testing a stereo vision system to see how well it can estimate 3D positions. The "ground truth" might be the positions given by a laser rangefinder that is known to be much more accurate than the camera system.

2. The project can be extended with end-to-end encryption to secure the image-processing results, and the performance of the object detection can be improved. We expect these improvements in future work on the project.
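If instance ground truth were available (item 1), a predicted segmentation mask could be scored against it. The sketch below uses intersection over union (IoU), a commonly used overlap measure. It is an illustration under assumed data, not a result of our project: both masks are synthetic, and the variable names are chosen only for this example.

```matlab
% Synthetic ground-truth and predicted masks (illustrative data only).
groundTruth = false(6, 6);
groundTruth(2:5, 2:5) = true;
predicted = false(6, 6);
predicted(3:6, 3:6) = true;

% Pixels labelled as object in both masks, and in at least one mask.
intersection = nnz(groundTruth & predicted);
unionArea = nnz(groundTruth | predicted);

% IoU is 1 for a perfect match and 0 for no overlap.
% Two empty masks are treated as a perfect match to avoid division by zero.
if unionArea == 0
    iou = 1;
else
    iou = intersection / unionArea;
end

fprintf('IoU = %.3f\n', iou);
```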


APPENDIX
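% Reset the command window, figures and workspace.
clc
close all
clear all

% Select the input image.
[fn, pn] = uigetfile('*.*', 'select ip image');

tic

hi = imread(fullfile(pn, fn));

% Rows and columns only; the third output absorbs the colour channels,
% so co is not multiplied by the number of channels.
[ro, co, ~] = size(hi);

% 1. Region proposal generation

imgl = hi;
% MCG starts
% scg is an external helper function that is not listed in this appendix.
im  = double(imgl);
im2 = double(imresize(imgl, [512 512]));
im3 = double(imresize(imgl, [825 825]));

masked_image  = scg(im, 1);
masked_image2 = scg(im2, 1);
masked_image3 = scg(im3, 1);

% Show the masked image at each of the three scales.
figure, imshow(uint8(masked_image .* im));
title('Aligned Hierarchy 1');
figure, imshow(uint8(masked_image2 .* im2));
title('Aligned Hierarchy 2');
figure, imshow(uint8(masked_image3 .* im3));
title('Aligned Hierarchy 3');

% Bring the three masked images back to the input size and combine them.
mh1 = imresize(uint8(masked_image  .* im),  [ro co]);
mh2 = imresize(uint8(masked_image2 .* im2), [ro co]);
mh3 = imresize(uint8(masked_image3 .* im3), [ro co]);

mh = mh1 + mh2 + mh3;   % uint8 addition saturates at 255

figure, imshow(mh);
title('Region proposal generation opj')

% MCG ends

% CEDN starts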


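im1 = hi;
if size(im1, 3) == 3                 % rgb2gray accepts only RGB input
    im1 = rgb2gray(im1);
end
im1 = medfilt2(im1, [3 3]);          % suppress noise before edge detection
BW = edge(im1, 'sobel');
[imx, imy] = size(BW);
% 5x5 mask with a 3x3 block of ones: thickens the detected edges.
msk = [0 0 0 0 0;
       0 1 1 1 0;
       0 1 1 1 0;
       0 1 1 1 0;
       0 0 0 0 0];

% 'same' keeps B the size of BW, so label coordinates match n1 below.
B = conv2(double(BW), msk, 'same');
L = bwlabel(B, 8);
mx = max(L(:));                      % number of labelled regions

op = hi;
% The second and third passes repeat the first; their results are not used later.
B2 = conv2(double(BW), msk, 'same');
L2 = bwlabel(B2, 8);
mx2 = max(L2(:));

B3 = conv2(double(BW), msk, 'same');
L3 = bwlabel(B3, 8);
mx3 = max(L3(:));

% Mark the pixels of region 17 (a hard-coded label) in n1.
n1 = zeros(imx, imy);
n1(L == 17) = 255;

figure, imshow(B);
title('contour map');

% CEDN ends

imagen = op;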


if size(imagen, 3) == 3   % RGB image
    imagen = rgb2gray(imagen);
end
threshold = graythresh(imagen);        % Otsu threshold
imagen = ~im2bw(imagen, threshold);    % dark objects become foreground
imagen = bwareaopen(imagen, 30);       % remove regions smaller than 30 pixels
pause(1)
figure,
imshow(~imagen);
title('INPUT IMAGE WITH Bounding Boxes')

[L, Ne] = bwlabel(imagen);             % Ne is the number of labelled regions

propied = regionprops(L, 'BoundingBox');
hold on
% Plot one bounding box per labelled region.
for n = 1:numel(propied)
    rectangle('Position', propied(n).BoundingBox, 'EdgeColor', 'g', 'LineWidth', 2)
end
hold off
pause(1)

% Create the output folder for the trained networks if it does not exist.
if ~exist('networks', 'dir')
    mkdir('networks');
end

% Training data: tab-delimited text files with one header row.
inputs    = dlmread('Inputs1.txt',  '\t', 1, 0);
targets   = dlmread('Targets1.txt', '\t', 1, 0);
inputs2   = dlmread('Inputs2.txt',  '\t', 1, 0);
targets16 = dlmread('Targets2.txt', '\t', 1, 0);
% The network functions expect one sample per column.
inputs = inputs';
targets = targets';
inputs2 = inputs2';
targets16 = targets16';

trainFcn = 'trainlm';   % Levenberg-Marquardt backpropagation

for i = 1:2   % vary the number of hidden-layer neurons (here 1 to 2)

    hiddenLayerSize = i;                       % number of hidden-layer neurons
    net = fitnet(hiddenLayerSize, trainFcn);   % create a fitting network
    net.divideParam.trainRatio = 70/100;       % use 70% of the data for training
    net.divideParam.valRatio = 15/100;         % 15% for validation
    net.divideParam.testRatio = 15/100;        % 15% for testing


    [net, tr] = train(net, inputs, targets);   % train the network

    outputs = net(inputs(:, tr.testInd));      % simulate the 15% test data
    outputs2016 = net(inputs2);                % simulate the second data set
    rmse15(i) = sqrt(mean((outputs - targets(:, tr.testInd)).^2));

    r15(i) = regression(targets(:, tr.testInd), outputs);
    r2016(i) = regression(targets16, outputs2016);
    save(fullfile('networks', ['net' num2str(i)]), 'net');
end

% Saliency is computed on the combined region-proposal image.
img = mh;
dim = size(img);
width = dim(2); height = dim(1);
md = min(width, height);   % minimum dimension

cform = makecform('srgb2lab');
lab = applycform(img, cform);
l = double(lab(:,:,1));
a = double(lab(:,:,2));
b = double(lab(:,:,3));
sm = zeros(height, width);
% Half-widths of the three neighbourhoods used for the saliency map.
off1 = int32(md/2); off2 = int32(md/4); off3 = int32(md/8);

I = imgl;
I = imresize(I, [256,256]);

I = imadjust(I, stretchlim(I));   % contrast stretch

I_Otsu = im2bw(I, graythresh(I));
% rgb2hsi is not a built-in MATLAB function; a helper on the path is assumed.
I_HIS = rgb2hsi(I);

cform = makecform('srgb2lab');
lab_he = applycform(I, cform);

ab = double(lab_he(:,:,2:3));       % a* and b* channels only
nrows = size(ab,1);
ncols = size(ab,2);
ab = reshape(ab, nrows*ncols, 2);   % one row per pixel
nColors = 3;

% contoursailencymerg is a project helper that is not listed in this appendix.
[cid, cce] = contoursailencymerg(ab, nColors);


pixel_labels = reshape(cid, nrows, ncols);
segmented_images = cell(1, 3);
rgb_label = repmat(pixel_labels, [1, 1, 3]);

% Keep only the pixels of cluster k in each segmented image.
for k = 1:nColors
    colors = I;
    colors(rgb_label ~= k) = 0;
    segmented_images{k} = colors;
end

% Saliency: squared Lab distance between each pixel and the mean of its
% neighbourhood, summed over three neighbourhood sizes.
for j = 1:height
    y11 = max(1,j-off1); y12 = min(j+off1,height);
    y21 = max(1,j-off2); y22 = min(j+off2,height);
    y31 = max(1,j-off3); y32 = min(j+off3,height);
    for k = 1:width
        x11 = max(1,k-off1); x12 = min(k+off1,width);
        x21 = max(1,k-off2); x22 = min(k+off2,width);
        x31 = max(1,k-off3); x32 = min(k+off3,width);
        lm1 = mean2(l(y11:y12,x11:x12));
        am1 = mean2(a(y11:y12,x11:x12));
        bm1 = mean2(b(y11:y12,x11:x12));
        lm2 = mean2(l(y21:y22,x21:x22));
        am2 = mean2(a(y21:y22,x21:x22));
        bm2 = mean2(b(y21:y22,x21:x22));
        lm3 = mean2(l(y31:y32,x31:x32));
        am3 = mean2(a(y31:y32,x31:x32));
        bm3 = mean2(b(y31:y32,x31:x32));

        cv1 = (l(j,k)-lm1).^2 + (a(j,k)-am1).^2 + (b(j,k)-bm1).^2;
        cv2 = (l(j,k)-lm2).^2 + (a(j,k)-am2).^2 + (b(j,k)-bm2).^2;
        cv3 = (l(j,k)-lm3).^2 + (a(j,k)-am3).^2 + (b(j,k)-bm3).^2;
        sm(j,k) = cv1 + cv2 + cv3;
    end
end

figure, imshow(img);

figure, imshow(sm, []);

title('saliency map');

% Rebuild the per-cluster images before display.
for k = 1:nColors
    colors = I;
    colors(rgb_label ~= k) = 0;
    segmented_images{k} = colors;


end

figure, subplot(3,1,1);imshow(segmented_images{1});title('Segment 1');


subplot(3,1,2);imshow(segmented_images{2});title('Segment 2');
subplot(3,1,3);imshow(segmented_images{3});title('Segment 3');
set(gcf, 'Position', get(0,'Screensize'));
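The listing above calls contoursailencymerg, which is not included in this appendix. Its inputs and outputs (a*b* samples with one row per pixel, nColors clusters, and an index vector that is reshaped into pixel_labels) are consistent with a clustering step. The sketch below is therefore a hypothetical stand-in, offered as an assumption and not as the actual helper. It uses kmeans from the Statistics and Machine Learning Toolbox on synthetic colour samples.

```matlab
% Hypothetical stand-in for the unlisted helper contoursailencymerg:
% plain k-means on (a*, b*) colour samples. This is an assumption, not the
% project's actual implementation.

rng(0);   % fix the random seed so the clustering is repeatable

% Synthetic colour samples around three well-separated centres (illustrative data).
centres = [20 20; -20 -20; 20 -20];
ab = [];
for c = 1:size(centres, 1)
    ab = [ab; repmat(centres(c, :), 50, 1) + randn(50, 2)]; %#ok<AGROW>
end
nColors = 3;

% cid: cluster index of each sample; cce: cluster centres, one row per cluster.
[cid, cce] = kmeans(ab, nColors, 'Distance', 'sqeuclidean', 'Replicates', 3);

% In the appendix, cid would then be reshaped to the image size:
% pixel_labels = reshape(cid, nrows, ncols);
fprintf('Cluster sizes: %s\n', mat2str(accumarray(cid, 1).'));
```

If the real helper also merges clusters using the contour or saliency maps, as its name may suggest, this stand-in does not reproduce that behaviour.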


