
DISCRIMINATIVE FEATURES BASED ON FAR-OFF DETECTING IMAGES USING

LIFTED WAVELET TRANSFORMER (LWT)

ABSTRACT:

Recently, various patch-based approaches have emerged for the classification and indexing of
high and very high resolution (VHR) multi-spectral images. This comes as a consequence of the
most important particularity of multi-spectral data: objects are represented using several spectral
bands that equally influence the classification process. We consider single-layer and deep
convolutional networks for remote sensing data analysis. Direct application of supervised
(shallow or deep) convolutional networks to multi- and hyper-spectral imagery is very
challenging given the high input data dimensionality and the relatively small amount of labeled
data available for the classification of aerial scenes, land-use classification in very high
resolution (VHR) imagery, or land-cover classification from multi- and hyper-spectral images.
The proposed algorithm clearly outperforms standard Regions of Interest (ROI) methods and the
lifted wavelet transformer (LWT), as well as current state-of-the-art algorithms for aerial
classification, while being extremely computationally efficient at learning representations of the
data. Results show that single-layer convolutional networks can extract powerful discriminative
features only when the receptive field accounts for neighboring pixels, and are preferred when
the classification requires high-resolution, detailed results. The learned representations are also
comparable across sensors, facilitating the exploration of previous and newly launched satellite
missions.

INTRODUCTION

Image compression is now essential for applications such as transmission and storage in
databases. It is a technique used to compress data so as to reduce storage requirements and
transmission time; in other words, image compression is the application of data compression to
digital images. The objective is to reduce the redundancy of the image data in order to store or
transmit the data in an efficient form. Compression reduces the cost of storage and increases the
speed of transmission, minimizing the size in bytes of a graphics file without degrading the
quality of the image. There are two types of image compression: lossy and lossless [18]. The
lossy type aims to reduce the bits required for storing or transmitting an image without much
regard for image fidelity, while the lossless type focuses on preserving the quality of the
compressed image so that it is the same as the original image.
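To make the two families concrete, the short sketch below (a minimal illustration assuming Pillow is installed; the input file name scene.png is hypothetical) saves the same image once losslessly as PNG and once lossily as JPEG, then compares file sizes.

    import os
    from PIL import Image  # Pillow

    img = Image.open("scene.png").convert("RGB")   # hypothetical input image

    img.save("lossless.png")                       # lossless: pixels preserved exactly
    img.save("lossy.jpg", quality=75)              # lossy: smaller file, some detail discarded

    for name in ("lossless.png", "lossy.jpg"):
        print(name, os.path.getsize(name), "bytes")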

The main objective of super resolution imaging is to enhance the quality of multiple
lower resolution images. A super resolution image is constructed from raw images. An image
with improved resolution is always desirable for various applications, such as satellite and
medical imaging, to enhance the qualitative features of the images. In this paper, Super
Resolution Image Reconstruction (SRIR) is proposed for improving the resolution of lower
resolution images. The proposed approach is described as follows. Initially, some low resolution
images of the same scene, which are usually translated, rotated and blurred, are used to form a
super resolution image. Then, the image registration operation translates, orients, scales and
rotates the images to match the source image. Next, the Lifting Wavelet Transform (LWT) with
Daubechies-4 coefficients is applied to the color components of each image, owing to its lower
memory allocation compared to other wavelet techniques. Further, the Set Partitioning in
Hierarchical Trees (SPIHT) technique is applied for image compression, as it offers lossless
compression, fast encoding/decoding, and an adaptive nature. The three low resolution images
are fused by a spatial image fusion method. Noise is removed by the dual-tree Discrete Wavelet
Transform (DWT) and blurring is reduced by blind deconvolution. Finally, the samples are
interpolated to the original sampling grid to obtain a super resolution image. The structural
similarity of each intermediate image is compared to the source image to verify high structural
similarity by objective analysis. The Gabor transform is also implemented for image
enhancement and edge detection.
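As a minimal sketch of the wavelet step, the following applies PyWavelets' Daubechies-4 ("db4") 2-D DWT to each color component. Note that pywt realizes db4 through filter banks rather than an explicit lifting scheme, so this is a stand-in for the LWT stage; the input file name is hypothetical.

    import numpy as np
    import pywt
    from PIL import Image

    rgb = np.asarray(Image.open("low_res_frame.png").convert("RGB"), dtype=float)

    # Decompose each color component separately, as in the proposed pipeline
    coeffs = [pywt.dwt2(rgb[:, :, c], "db4") for c in range(3)]

    # Each entry is (cA, (cH, cV, cD)); verify perfect reconstruction on one channel
    rec = pywt.idwt2(coeffs[0], "db4")
    print(np.allclose(rec[: rgb.shape[0], : rgb.shape[1]], rgb[:, :, 0]))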

With the exponential growth of the EO image data collections obtained from satellite and
aerial sensors, a lot of effort has been made in recent decades for scene understanding and
analysis.

Most image analysis approaches use feature extraction techniques and classification
algorithms to automatically group input data by similarity. Being generally inspired by the
human visual system, which is specialized in detecting specific image properties such as texture,
color and shape, these methods usually require human-based data annotation, either for the
training or the validation process.

Even though part of the effort made in EO data classification and understanding is based
on multimedia image processing techniques, there are attempts at using statistical text modeling
approaches, such as the author-topic model and the author-genre-topic model. These methods use
latent Dirichlet allocation to treat the topic mixture parameters as variables drawn from a
Dirichlet distribution. Other works present new techniques based on libraries of pre-trained part
detectors used for mid-level visual element discovery in VHR remote sensing images. In
parallel, a lot of effort has been made to develop better texture, color and shape feature
extraction techniques for both pixel- and patch-based multispectral image analysis.

Widely used for image interpretation, segmentation, classification or change detection,
the texture component can be defined as an arrangement of pixels, together with the spatial
dependency between them, in the image. Even though there are many implementations, most
texture analysis applications use techniques based on the gray level co-occurrence matrix,
wavelet transforms, Gauss-Markov random fields and Gabor filtering.

Using only spectral information, color features are very easy to compute in comparison
with texture and shape features, and are used on a large scale in scene classification and content
based image retrieval applications. Some of the color features most used in remote sensing
image analysis are color histograms and color moments.

PROBLEM DESCRIPTION

Remote Sensing

In remote sensing, sensors mounted on satellites, or multi-spectral scanners mounted on
aircraft, capture pictures of the Earth's surface. These pictures are processed after being
transmitted to the Earth station. The techniques used to interpret the resulting objects and regions
are applied in flood control, city planning, resource mobilization, agricultural production
monitoring, etc.

Image sharpening and restoration

Distributional clustering has been used to cluster images into groups. A new
information-theoretic divisive algorithm for word clustering has also been proposed and applied
to text classification.

Cluster features:

 A special distance metric

 A cluster hierarchy used to choose the most relevant attributes

 Classifiers that improve on their original performance accuracy

Moving Object Tracking

Moving object tracking enables the measurement of motion parameters and the
acquisition of a visual record of the moving object. The different approaches to tracking an
object are:

 Motion based Tracking


 Recognition based Tracking

Measurement of Pattern

The algorithm involves the computation of SU values for T-Relevance and F-Correlation,
which has linear complexity in the number of instances in a given data set. The algorithm also
has linear time complexity in the number of features.
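Assuming SU here denotes the standard symmetric uncertainty, SU(X, Y) = 2·IG(X | Y) / (H(X) + H(Y)), the following is a small sketch of its computation for two discrete variables:

    import numpy as np
    from collections import Counter

    def entropy(values):
        # Shannon entropy (bits) of a discrete sample
        counts = np.array(list(Counter(values).values()), dtype=float)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def symmetric_uncertainty(x, y):
        # SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))
        hx, hy = entropy(x), entropy(y)
        hxy = entropy(list(zip(x, y)))      # joint entropy
        gain = hx + hy - hxy                # information gain IG(X | Y)
        return 2.0 * gain / (hx + hy) if hx + hy > 0 else 0.0

    print(symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (fully correlated)
    print(symmetric_uncertainty([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0 (independent)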

Image Recognition

A multispectral image consists of several bands of data. For visual display, each band of
the image may be displayed one band at a time as a grey scale image, or in combinations of three
bands at a time as a color composite image. Interpretation of a multispectral color composite
image requires knowledge of the spectral reflectance signatures of the targets in the scene.

CHAPTER-2

LITERATURE SURVEY

Ship detection holds the key for a wide array of applications, such as naval defense,
situation assessment, traffic surveillance, maritime rescue, and fishery management. For decades,
major achievements in this area have focused on synthetic aperture radar (SAR) images, which
are less influenced by time and weather. However, the resolutions of SAR images are low and
the targets lack texture and color features. Nonmetallic ships might not be visible, and the
capacity for ship wake detection is limited [1].
With the rapid development of earth observation technology, the optical images from
Unmanned Airborne Vehicles (UAVs) and satellites offer more detailed information and more
obvious geometric structure. Compared with SAR technology, they are more intuitive and easier
to understand. Owing to these advantages, we take ships in optical remote sensing images as the
research targets in this paper. However, plenty of difficulties remain [2]. To achieve high speed
and suppression ability and to reduce false alarms, a new and effective multi-level
discrimination method is designed based on improved entropy and pixel distribution, which is
robust against the interference introduced by islands, coastlines, clouds, and shadows.
A two-dimensional (2-D) edge-adaptive lifting structure, similar to the Daubechies
5/3 wavelet, is presented. The 2-D prediction filter predicts the value of the next polyphase
component according to an edge orientation estimator of the image. Consequently, the prediction
domain is allowed to rotate 45° in regions with a diagonal gradient [3].
The gradient estimator is computationally inexpensive, with an additional cost of only six
subtractions per lifting instruction, and no multiplications are required. The interpolation strategy
gives a good approximation of a possibly missing color sensor output, so it improves both the
mean-square error and the subjective quality of the acquired color image in CCD imaging
systems. Ideally we would like to approximate an image with a small number of parameters; the
wavelet transform provides such an efficient representation. If the input signal is shifted in time
or space, the wavelet coefficients of the decimated DWT will change. The separable 2-D DWT
efficiently detects horizontal and vertical edges, but if the edges lie at an acute angle,
unwanted checkerboard artifacts appear [4].
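For reference, below is a plain (non-adaptive) 1-D version of the 5/3 lifting scheme that these edge-adaptive structures extend. This is a sketch assuming an even-length signal with periodic boundary handling; the adaptive variants replace the predict step with the orientation-dependent filter described above.

    import numpy as np

    def lifting_53_forward(x):
        # One level of the 1-D 5/3 (LeGall) lifting transform
        x = np.asarray(x, dtype=float)
        even, odd = x[0::2], x[1::2]
        # Predict step: detail = odd sample minus the mean of its even neighbors
        d = odd - 0.5 * (even + np.roll(even, -1))
        # Update step: approximation = even sample plus a quarter of neighboring details
        s = even + 0.25 * (np.roll(d, 1) + d)
        return s, d

    def lifting_53_inverse(s, d):
        # Undo the update, then undo the predict, then interleave the samples
        even = s - 0.25 * (np.roll(d, 1) + d)
        odd = d + 0.5 * (even + np.roll(even, -1))
        x = np.empty(2 * len(s))
        x[0::2], x[1::2] = even, odd
        return x

    x = np.random.rand(16)
    s, d = lifting_53_forward(x)
    print(np.allclose(lifting_53_inverse(s, d), x))  # True: perfect reconstruction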

A novel 2-D wavelet transform scheme of adaptive directional lifting (ADL) is proposed
for image coding. Instead of alternately applying horizontal and vertical lifting, as in present
practice, ADL performs lifting-based prediction in local windows in the direction of high pixel
correlation. Hence, it adapts far better to the image orientation features in local windows. The
ADL transform is achieved with existing 1-D wavelets and is seamlessly integrated into the
global wavelet transform. The predicting and updating signals of ADL can be derived even at the
fractional-pixel precision level to achieve high directional resolution, while still maintaining
perfect reconstruction [5].
A rate-distortion optimized directional segmentation scheme is also proposed to form and
code a hierarchical image partition adapting to local features. Experimental results show that the
proposed ADL-based image coding technique outperforms JPEG 2000 in both PSNR and visual
quality, with improvements of up to 2.0 dB on images with rich orientation features. The wavelet-
based JPEG 2000 international standard for still image compression not only obtains superior
compression performance over the DCT-based old JPEG standard, but also offers scalability
advantages in reconstruction quality and spatial resolution that are desirable features for many
consumer and network applications. High angular resolution in prediction is achieved by the use
of fractional pixels in the prediction and update operations. The fractional pixels can be
calculated by any existing interpolation method [6].

A novel saliency detection model is introduced that utilizes low-level features obtained
from the wavelet transform domain. First, the wavelet transform is employed to create multi-
scale feature maps that can represent different features, from edges to texture. Then, a
computational model derives the saliency map from these features. The proposed model aims to
modulate local contrast at a location with its global saliency, computed based on the likelihood
of the features, and it considers local center-surround differences and global contrast in the final
saliency map. Experimental evaluation shows promising results, with the proposed model
outperforming the relevant state-of-the-art saliency detection models. Since the surrounding
environment includes an excessive amount of information, the visual attention mechanism
enables a reduction of redundant data, which benefits perception during the selective attention
process [7].

The predicted and updated samples are always in integer pixel positions. Blocks with
different directions are processed contiguously, which may cause boundary effects at the block
boundaries, and distortions appear in the low-low subimages across decomposition scales [8].

Traditional approaches for detecting visually salient regions or targets in remote sensing images
are inaccurate and prohibitively computationally complex. In this letter, a fast, efficient region-
of-interest extraction method based on frequency domain analysis and salient region detection
(FDA-SRD) is proposed. First, the HSI transform is used to preprocess the remote sensing image
from RGB space to HSI space. Second, a frequency domain analysis strategy based on
quaternion Fourier transforms was employed to rapidly generate the saliency map. Finally, the
salient regions are described by an adaptive threshold segmentation algorithm based on Gaussian
Pyramids. Compared with existing models, the new algorithm is computationally more efficient
and provides more visually accurate detection results [9].

The object and background are separated, that is, the foreground and background are
separated, which makes it easy to determine the features of the object via an iterative and joint
optimization scheme. The background is treated as noise in most cases, and a single level does
not provide the object alone. A large image database containing tens of thousands of carefully
labelled images must be collected. Supervised training is required, and the learned transform
matrix is somewhat biased toward the training dataset; therefore it suffers from limited
adaptability [10].

METHODOLOGY

System Architecture

Algorithm 1 (Mean Shift based edge saliency). Input: an image and the multiple levels m.

1: Smooth the image using the different parameters λ;

2: Obtain the multiple edge saliency maps;

3: Merge the edge saliency maps and smooth the result using a Gaussian filter.

Output: the global edge saliency S_edge.

Filtering technique. Input: an image I.

1: Calculate the local color saliency;

2: Calculate the local textural saliency;

3: Calculate the global color saliency;

4: Calculate the global edge saliency using Algorithm 1;

5: Combine the above four saliencies: S_all = Σ_{i=1}^{4} w̃_i · S_i;

6: Refine the final saliency: S_final = S_all · F_ci.

Output: the final visual saliency.
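A minimal sketch of steps 5 and 6, assuming the four saliency maps have already been computed and the weights w̃_i are given (equal weights are used here purely for illustration):

    import numpy as np

    def normalize(m):
        # Rescale a saliency map to [0, 1]
        m = m.astype(float)
        return (m - m.min()) / (m.max() - m.min() + 1e-12)

    def combine_saliency(maps, weights=None, focus=None):
        # S_all = sum_i w_i * S_i, optionally refined by a focus map F_ci
        weights = weights or [1.0 / len(maps)] * len(maps)
        s_all = sum(w * normalize(m) for w, m in zip(weights, maps))
        return s_all * focus if focus is not None else s_all

    # Four hypothetical 64x64 maps: local color/texture, global color/edge saliency
    maps = [np.random.rand(64, 64) for _ in range(4)]
    s_final = combine_saliency(maps)
    print(s_final.shape)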


Mask Post processing:


Due to the ND-LWT's high sensitivity to rich texture and bright spots, some scattered and
incomplete regions are highlighted as ROIs. However, not only do these region fragments make
nearly no improvement to the quality of the reconstructed image, they can also unfavorably
influence the compression ratio. Hence, for the benefit of compression, it is necessary to get rid
of the ROI fragments and keep the main ROIs through mask post-processing.
Multi-bit-plane Alternating Shift:

After coefficient quantization, ROI coding is conducted with multi-bit-plane alternating
shift (MAS). MAS treats only the several highest bit planes, which carry the greatest values, as
the Most Significant Bit planes (MSB). The following several bit planes are the General
Significant Bit planes (GSB), and the Least Significant Bit planes (LSB) are at the bottom of the
binary number. These three bit-plane types act differently in ROI coding. The MSB planes
contain the most significant information of the ROI. The GSB planes adjust the difference in
reconstruction quality between the ROI and the background, so that the significant information
of the background is recovered soon after the MSB planes are decoded, while still assuring the
recovery of the ROI's general significant information.
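The full MAS scheme alternates bit planes among the MSB, GSB and LSB tiers during coding. As a simplified sketch of the underlying idea only, the following up-shifts quantized ROI coefficients by s bit planes so that their significant bits are emitted earlier:

    import numpy as np

    def shift_roi_coefficients(coeffs, roi_mask, s=4):
        # Scale quantized ROI wavelet coefficients up by s bit planes
        coeffs = coeffs.astype(np.int64)
        return np.where(roi_mask, coeffs << s, coeffs)

    coeffs = np.random.randint(0, 32, size=(8, 8))
    roi_mask = np.zeros((8, 8), dtype=bool)
    roi_mask[2:6, 2:6] = True            # hypothetical ROI block
    print(shift_roi_coefficients(coeffs, roi_mask, s=4))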

LITERATURE SURVEY (CONTINUED)

The exponential growth of Earth observation (EO) image data collections obtained from satellite
and aerial sensors has, in the last decades, driven a lot of effort in scene understanding and
analysis. Most image analysis approaches use feature extraction techniques and classification
algorithms to automatically group input data by similarity. Being generally inspired by the
human visual system, which is specialized in detecting specific image properties such as texture,
color, and shape, these methods usually require human-based data annotation, either for the
training or the validation process. Even though part of the effort made in EO data classification
and understanding is based on multimedia image processing techniques, there are attempts at
using statistical text modeling approaches, such as the author-topic model and the author-genre-
topic model. These methods use latent Dirichlet allocation to treat the topic mixture parameters
as variables drawn from a Dirichlet distribution. Other works present new techniques based on
libraries of pre-trained part detectors used for mid-level visual element discovery in very high
resolution (VHR) remote sensing images. In parallel, a lot of effort has been made to develop
better texture, color, and shape feature extraction techniques for both pixel- and patch-based
multispectral image analyses.

Widely used for image interpretation, segmentation, classification, or change detection, the
texture component can be defined as an arrangement of pixels, together with the spatial
dependence between them, in the image. Even though there are many implementations, most
texture analysis applications use techniques based on the gray-level co-occurrence matrix,
wavelet transforms, Gauss-Markov random fields, and Gabor filtering. Using only spectral
information, color features are very easy to compute in comparison with texture and shape
features and are used on a large scale in scene classification and content-based image retrieval
applications. Some of the color features most used in remote sensing image analysis are color
histograms and color moments.

Local feature descriptors are another very important category of features. Like most of the
available feature extraction methods for EO data, these local discriminative features were first
developed for multimedia image processing. The most popular and widely used techniques are
the scale-invariant feature transform, the speeded-up robust features detector, the rotation-
invariant feature transform, the census transform histogram, and local binary patterns. Evolving
texture analysis and local feature extraction techniques have led the way to the bag-of-words
(BoW) method. Even though BoW was initially used for video search, many methods derived
from it can solve problems like image classification, image retrieval, and object recognition. In
the remote sensing community, this technique has recently been introduced for image
annotation, object classification, target detection, and land use classification, and it has already
proven its discrimination power in image classification. In the BoW framework, there are
several ways to generate the visual codebook. K-means is the most common clustering
procedure; however, there are also attempts at using random dictionaries.

It is known that, in the case of multispectral EO image processing, objects are represented using
several spectral bands that equally influence the classification process. In this letter, we present
an evaluation of state-of-the-art texture and spectral feature extraction methods, such as Gabor
features, spectral histograms, and methods based on pure spectral information, in a patch-based
approach, to demonstrate their usability in the case of multispectral EO image data analysis and
understanding. Relying on these classic approaches, we introduce new feature extraction
techniques that depict both spectral and structural information. The proposed methods can
provide similar accuracy with enhanced computational speed.

FEATURE EXTRACTION

In this section, we briefly introduce how the image descriptors are computed. Our goal is to
cover a wide range of feature extraction methods that are sensitive to both structural information
and spectral information.

A. Gabor Features

Based on a wavelet transform, the multiscale Gabor filter is among the most used texture
descriptors, being described in the MPEG-7 standard as the homogeneous texture descriptor. The
Gabor representation has been proven optimal in the sense of minimizing the joint 2-D
uncertainty in space and frequency, being well suited for texture detection and classification.
Different parameter setups are used for Gabor filter computation, but for best results most
authors use 2-6 frequencies and 2-8 orientations. In our approach, we compute the Gabor filter
bank for θ = 6 orientations and ϕ = 4 scales. This means that we have to filter each spectral band
for every parameter combination. For each computed patch, we extract the mean and standard
deviation and keep them as the Gabor features (G). The Gabor feature vector computed for a
multispectral image patch with nb bands will have size θ×ϕ×nb×2.
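A minimal sketch of this Gabor feature using scikit-image's gabor filter; the four frequencies chosen here are illustrative, not prescribed values.

    import numpy as np
    from skimage.filters import gabor

    def gabor_features(patch, n_orient=6, frequencies=(0.05, 0.1, 0.2, 0.4)):
        # Mean/std of Gabor responses per band, orientation and scale.
        # patch: (H, W, nb) multispectral patch. Output length: theta*phi*nb*2.
        feats = []
        for b in range(patch.shape[2]):
            band = patch[:, :, b].astype(float)
            for k in range(n_orient):
                theta = k * np.pi / n_orient
                for f in frequencies:
                    real, imag = gabor(band, frequency=f, theta=theta)
                    mag = np.hypot(real, imag)
                    feats += [mag.mean(), mag.std()]
        return np.array(feats)

    patch = np.random.rand(32, 32, 4)   # hypothetical 4-band patch
    print(gabor_features(patch).shape)  # (192,) = 6 * 4 * 4 * 2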

B. Spectral Histogram Features

The spectral histogram is one of the most frequently used basic descriptors characterizing the
spectrum distribution in an image. Inspired by the color histogram, we extend this terminology to
the high spectral resolution of a multispectral image. A generic spectral histogram (H) descriptor
should be able to capture the spectral value distribution for image search and retrieval
applications with reasonable accuracy. For this simple but efficient image descriptor, we
compute, for each of the nb bands of a multispectral image patch, a spectral histogram with
hb = 64 histogram bins, a choice motivated by dimensionality reduction of the feature vector.
The nb computed histogram vectors are then merged into the spectral histogram feature vector of
size hb×nb.
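A sketch of the spectral histogram descriptor with hb = 64 bins per band, assuming reflectance values scaled to [0, 1]:

    import numpy as np

    def spectral_histogram(patch, hb=64, value_range=(0.0, 1.0)):
        # Concatenate a hb-bin histogram per band; output length hb * nb
        feats = []
        for b in range(patch.shape[2]):
            hist, _ = np.histogram(patch[:, :, b], bins=hb, range=value_range)
            feats.append(hist / hist.sum())   # normalize by patch size
        return np.concatenate(feats)

    patch = np.random.rand(32, 32, 4)         # hypothetical 4-band patch
    print(spectral_histogram(patch).shape)    # (256,) = 64 * 4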

C. Concatenated Gabor-Histogram Features

In order to improve the performance of the existing feature extraction methods, we propose
joining Gabor features computed on a pure texture band with the spectral histogram features
computed for all of the spectral bands. We generate the texture band as the average of all the
spectral bands available in the multispectral image. Fig. 3 shows, on the left side, the band
correlations between two of the visible bands, while the right side illustrates the correlation
between the visible and infrared bands of a WorldView-2 multispectral image. As we can easily
observe, the visible bands are highly correlated with each other and strongly uncorrelated with
the infrared bands. This suggests that using the infrared information to compute the image
features will help us develop more efficient image descriptors that represent the analyzed
regions more accurately. This new feature extraction method, based on both Gabor and spectral
histogram features (GH), computes for a multispectral image patch a feature vector of size
2×θ×ϕ + hb×nb elements, where θ represents the number of orientations and ϕ the number of
frequencies of the Gabor filter.
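Reusing the two sketches above (gabor_features applied to a single averaged texture band and spectral_histogram applied to all bands), a GH descriptor sketch:

    import numpy as np

    def gabor_histogram_features(patch):
        # GH descriptor of size 2*theta*phi + hb*nb (texture band = band average)
        texture_band = patch.mean(axis=2, keepdims=True)   # average of all bands
        g = gabor_features(texture_band)                   # 2 * 6 * 4 values
        h = spectral_histogram(patch)                      # 64 * nb values
        return np.concatenate([g, h])

    patch = np.random.rand(32, 32, 4)
    print(gabor_histogram_features(patch).shape)           # (304,) = 48 + 256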

D. Spectral Indices

Using all of the multispectral attributes, the spectral indices are a special category of image
features that can be applied to multispectral images only. Taking into account the radiance values
for each band and all of the possible (b1−b2)/(b1+b2) band ratios, where b1 and b2 refer to
different band combinations, the number of computed spectral attributes is nb(nb + 1)/2, where
nb is the number of bands in the multispectral image. This leads to a very fast and easy-to-
compute feature descriptor for multispectral images. For our patch-based feature extraction
employing the spectral indices computation, we modified the descriptor to use the same number
of features as the original one. In our patch-based approach, we compute the feature vector of
size nb(nb + 1)/2 as the mean of all spectral index values within the analyzed patch.
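A sketch of the patch-level spectral index descriptor, assuming the nb(nb + 1)/2 attributes consist of the nb per-band means plus the nb(nb − 1)/2 pairwise normalized ratios:

    import numpy as np
    from itertools import combinations

    def spectral_indices(patch):
        # Patch mean of each band plus mean (b1 - b2)/(b1 + b2) per band pair.
        # Output length: nb + nb*(nb - 1)/2 = nb*(nb + 1)/2
        nb = patch.shape[2]
        feats = [patch[:, :, b].mean() for b in range(nb)]
        for b1, b2 in combinations(range(nb), 2):
            ratio = (patch[:, :, b1] - patch[:, :, b2]) / (
                patch[:, :, b1] + patch[:, :, b2] + 1e-12)
            feats.append(ratio.mean())
        return np.array(feats)

    patch = np.random.rand(16, 16, 8)        # hypothetical 8-band patch
    print(spectral_indices(patch).shape)     # (36,) = 8 * 9 / 2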

E. BoW Framework

In the BoW model, a vector quantization of the spectral descriptors of an image against a visual
codebook is performed. Depending on the features used for codebook generation, different
classification results may be obtained; we also used the radiance values for each pixel, with a
dictionary size of 100 words. The codebook was generated using k-means clustering on 10% of
the computed features. Naturally, the size of the feature vector for a patch-based BoW is equal to
the number of distinct words generated.
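A minimal BoW sketch with a 100-word codebook learned by k-means on a 10% sample of the descriptors, as described above; scikit-learn is assumed and the descriptor array is synthetic.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    descriptors = rng.random((5000, 36))          # hypothetical patch descriptors

    # Codebook: k-means on a 10% sample of the computed features
    sample = descriptors[rng.choice(len(descriptors), size=500, replace=False)]
    codebook = KMeans(n_clusters=100, n_init=10, random_state=0).fit(sample)

    def bow_histogram(patch_descriptors, codebook, n_words=100):
        # Vector-quantize descriptors against the codebook and count words
        words = codebook.predict(patch_descriptors)
        hist = np.bincount(words, minlength=n_words).astype(float)
        return hist / hist.sum()

    print(bow_histogram(descriptors[:200], codebook).shape)  # (100,)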
TITLE: A 2-D Orientation-Adaptive Prediction Filter in Lifting Structures for Image Coding.

AUTHORS: Ömer N. Gerek, A. Enis Çetin.

DESCRIPTION:

A two-dimensional (2-D) edge-adaptive lifting structure, similar to the Daubechies
5/3 wavelet, is presented. The 2-D prediction filter predicts the value of the next polyphase
component according to an edge orientation estimator of the image. Consequently, the prediction
domain is allowed to rotate 45° in regions with a diagonal gradient. The gradient estimator is
computationally inexpensive, with an additional cost of only six subtractions per lifting
instruction, and no multiplications are required.
ADVANTAGES:

 The interpolation strategy gives a good approximation of a possibly missing color sensor
output, so it improves both the mean-square-error and the subjective quality of the
acquired color image in CCD imaging systems.
 Ideally we would like to approximate an image with a small number of parameters; the
wavelet transform provides such an efficient representation.

DISADVANTAGES:

 If the input signal is shifted in time or space, the wavelet coefficients of the decimated
DWT will change.
 The separable 2-D DWT efficiently detects horizontal and vertical edges; however, if the
edges lie at an acute angle, unwanted checkerboard artifacts appear.
TITLE: Adaptive Directional Lifting-Based Wavelet Transform for Image Coding.

AUTHORS: Wenpeng Ding, Feng Wu, Xiaolin Wu, Shipeng Li, and Houqiang Li.

DESCRIPTION:

A novel 2-D wavelet transform scheme of adaptive directional lifting (ADL) is proposed
for image coding. Instead of alternately applying horizontal and vertical lifting, as in present
practice, ADL performs lifting-based prediction in local windows in the direction of high pixel
correlation. Hence, it adapts far better to the image orientation features in local windows. The
ADL transform is achieved with existing 1-D wavelets and is seamlessly integrated into the
global wavelet transform. The predicting and updating signals of ADL can be derived even at the
fractional-pixel precision level to achieve high directional resolution, while still maintaining
perfect reconstruction. A rate-distortion optimized directional segmentation scheme is also
proposed to form and code a hierarchical image partition adapting to local features. Experimental
results show that the proposed ADL-based image coding technique outperforms JPEG 2000 in
both PSNR and visual quality, with improvements of up to 2.0 dB on images with rich
orientation features.
ADVANTAGES:

 The wavelet-based JPEG 2000 international standard for still image compression not only
obtains superior compression performance over the DCT-based old JPEG standard, but
also offers scalability advantages in reconstruction quality and spatial resolution that are
desirable features for many consumer and network applications.
 High angular resolution in prediction is achieved by the use of fractional pixels in
prediction and update operations. The fractional pixels can be calculated by any existing
interpolation method.

DISADVANTAGES:

 The problem of subpixel interpolation to facilitate spatial prediction in arbitrary angle is


also discussed.
 The interpolation used in the ADL wavelet transform is always performed in either the
horizontal or vertical direction.
TITLE: A Saliency Detection Model Using Low-Level Features Based on Wavelet Transform.
AUTHORS: Nevrez İmamoğlu, Weisi Lin, and Yuming Fang.

DESCRIPTION:
A novel saliency detection model is introduced that utilizes low-level features obtained
from the wavelet transform domain. First, the wavelet transform is employed to create multi-
scale feature maps that can represent different features, from edges to texture. Then, a
computational model derives the saliency map from these features. The proposed model aims to
modulate local contrast at a location with its global saliency, computed based on the likelihood
of the features, and it considers local center-surround differences and global contrast in the final
saliency map. Experimental evaluation shows promising results, with the proposed model
outperforming the relevant state-of-the-art saliency detection models.

ADVANTAGES:
 Since the surrounding environment includes an excessive amount of information, visual
attention mechanism enables a reduction of the redundant data which benefits perception
during the selective attention process.
 The predicted and updated samples are always in integer pixel positions.

DISADVANTAGES:
 Image blocks with different directions are continuously processed, which may cause
boundary effects in the block boundaries.
 Distortions in low–low sub images across decomposition scales.
TITLE: Region-of-Interest Extraction Based on Frequency Domain Analysis and Salient Region
Detection for Remote Sensing Image.
AUTHORS: Libao Zhang and Kaina Yang
DESCRIPTION:
Traditional approaches for detecting visually salient regions or targets in remote sensing
images are inaccurate and prohibitively computationally complex. In this letter, a fast, efficient
region-of-interest extraction method based on frequency domain analysis and salient region
detection (FDA-SRD) is proposed. First, the HSI transform is used to preprocess the remote
sensing image from RGB space to HSI space. Second, a frequency domain analysis strategy
based on quaternion Fourier transforms was employed to rapidly generate the saliency map.
Finally, the salient regions are described by an adaptive threshold segmentation algorithm based
on Gaussian Pyramids. Compared with existing models, the new algorithm is computationally
more efficient and provides more visually accurate detection results.

ADVANTAGES:
 The object and background are separated, that is, the foreground and background are
separated, which makes it easy to determine the features of the object.
 An iterative and joint optimization scheme is used.

DISADVANTAGES:
 The background is treated as noise in most cases, and a single level does not provide the
object alone. A large image database containing tens of thousands of carefully labelled
images must be collected.
 Supervised training is required, and the learned transform matrix is somewhat biased
toward the training dataset; therefore it suffers from limited adaptability.

TITLE:Global and Local Saliency Analysis for the Extraction of Residential Areas in High-
Spatial-Resolution Remote Sensing Image.
AUTHORS: Libao Zhang, Aoxue Li, Zhongjun Zhang, and Kaina Yang.
DESCRIPTION:
Residential areas extracted from a remote sensing image must meet three requirements:
well-defined boundaries, a uniformly highlighted residential area, and no background
redundancy within the residential areas. Driven by these requirements, a global and local
saliency analysis (GLSA) model is proposed for the extraction of residential areas in high-
spatial-resolution remote sensing images. A global saliency map based on the quaternion Fourier
transform (QFT) and a global saliency map based on the adaptive directional enhancement
lifting wavelet transform (ADE-LWT) are generated along with a local saliency map, all of
which are fused into a main saliency map based on their complementarities. In order to analyze
the correlation among spectra in the remote sensing image, the phase spectrum information of
the QFT is used on the multispectral images to produce one global saliency map. To acquire the
texture and edge features of different scales and orientations, the coefficients acquired by
ADE-LWT are used to construct the other global saliency map. To discard redundant
backgrounds, the amplitude spectrum of the Fourier transform and the spatial relations among
patches are introduced into the panchromatic image to generate the local saliency map.
Experimental results indicate that the GLSA model can better define the boundaries of
residential areas and achieve more complete residential areas than current methods.
ADVANTAGES:
 To the best of our knowledge, this is the first attempt to combine global salM3LBP
features and local CLM (eSIFT) features to achieve a fused representation for image
scene classification. The two types of features together can effectively mitigate the
respective shortcomings of global and local features.
 The framework is unified in a simple and effective way, which benefits image scene
classification.

DISADVANTAGES:
 Deep CNNs have an intrinsic limitation due to the complicated pretraining process to
adjust parameters.
 If the input signal is shifted in time or space, the wavelet coefficients of the decimated
DWT will change.
CHAPTER-3

Digital Image Processing

Digital image processing is the use of computer algorithms to perform image


processing on digital images. As a subcategory or field of digital signal processing, digital image
processing has many advantages over analog image processing. It allows a much wider range of
algorithms to be applied to the input data and can avoid problems such as the build-up of noise
and signal distortion during processing. Since images are defined over two dimensions (perhaps
more), digital image processing may be modeled in the form of multidimensional systems. Digital
image processing allows the use of much more complex algorithms, and hence, can offer both
more sophisticated performance at simple tasks, and the implementation of methods which
would be impossible by analog means.

In particular, digital image processing is the only practical technology for:

 Classification
 Feature extraction
 Pattern recognition
 Projection
 Multi-scale signal analysis
Some techniques which are used in digital image processing include:

 Pixelation
 Linear filtering
 Principal components analysis
 Independent component analysis
 Hidden Markov models
 Anisotropic diffusion
 Partial differential equations
 Self-organizing maps
 Neural networks, Wavelets

Image Segmentation

In computer vision, image segmentation is the process of partitioning a digital


image into multiple segments (sets of pixels, also known as superpixels). The goal of
segmentation is to simplify and/or change the representation of an image into something that is
more meaningful and easier to analyze. Image segmentation is typically used to locate objects
and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process
of assigning a label to every pixel in an image such that pixels with the same label share certain
characteristics.

The result of image segmentation is a set of segments that collectively cover the entire
image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a
region are similar with respect to some characteristic or computed property, such as color,
intensity, or texture. Adjacent regions are significantly different with respect to the same
characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting
contours after image segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms like Marching cubes.

Thresholding

The simplest method of image segmentation is called the thresholding method. This
method is based on a clip-level (or threshold value) to turn a gray-scale image into a binary
image. There is also balanced histogram thresholding.

The key to this method is to select the threshold value (or values, when multiple levels are
selected). Several popular methods are used in industry, including the maximum entropy
method, Otsu's method (maximum variance), and k-means clustering. Recently, methods have
been developed for thresholding computed tomography (CT) images. The key idea is that,
unlike in Otsu's method, the thresholds are derived from the radiographs instead of the
(reconstructed) image.
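A short example of threshold segmentation with Otsu's method, assuming scikit-image is available (its bundled camera image is used as test data):

    from skimage import data
    from skimage.filters import threshold_otsu

    gray = data.camera()                 # built-in grayscale test image
    t = threshold_otsu(gray)             # threshold maximizing between-class variance
    binary = gray > t                    # clip-level turns grayscale into binary
    print(t, binary.mean())              # threshold value, foreground fraction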

Clustering

The K-means algorithm is an iterative technique that is used to partition an


image into K clusters. The basic algorithm is

1. Pick K cluster centers, either randomly or based on some heuristic

2. Assign each pixel in the image to the cluster that minimizes the distance between the
pixel and the cluster center

3. Re-compute the cluster centers by averaging all of the pixels in the cluster

4. Repeat steps 2 and 3 until convergence is attained (i.e. no pixels change clusters)

In this case, distance is the squared or absolute difference between a pixel and a cluster center.
The difference is typically based on pixel color, intensity, texture, and location, or a weighted
combination of these factors. K can be selected manually, randomly, or by a heuristic. This
algorithm is guaranteed to converge, but it may not return the optimal solution. The quality of the
solution depends on the initial set of clusters and the value of K.
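A sketch of the four steps above applied to a color image, clustering on pixel color only (pixel location could be appended as extra features); scikit-learn and scikit-image are assumed.

    import numpy as np
    from sklearn.cluster import KMeans
    from skimage import data

    img = data.astronaut().astype(float)             # built-in RGB test image
    pixels = img.reshape(-1, 3)                      # one row per pixel

    # Steps 1-4: pick centers, assign pixels, recompute centers, repeat to convergence
    km = KMeans(n_clusters=5, n_init=4, random_state=0).fit(pixels)
    labels = km.labels_.reshape(img.shape[:2])       # K-cluster segmentation map
    print(np.unique(labels))                         # [0 1 2 3 4]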

Compression-based Methods

Compression-based methods postulate that the optimal segmentation is the one that
minimizes, over all possible segmentations, the coding length of the data. The connection
between these two concepts is that segmentation tries to find patterns in an image, and any
regularity in the image can be used to compress it. The method describes each segment by its
texture and boundary shape. Each of these components is modeled by a probability distribution
function, and its coding length is computed as follows:

1. The boundary encoding leverages the fact that regions in natural images tend to have a
smooth contour. This prior is used by Huffman coding to encode the difference chain
code of the contours in an image. Thus, the smoother a boundary is, the shorter coding
length it attains.

2. Texture is encoded by lossy compression in a way similar to minimum description


length (MDL) principle, but here the length of the data given the model is approximated
by the number of samples times the entropy of the model. The texture in each region is
modeled by a multivariate normal distribution whose entropy has closed form
expression. An interesting property of this model is that the estimated entropy bounds the
true entropy of the data from above. This is because among all distributions with a given
mean and covariance, normal distribution has the largest entropy. Thus, the true coding
length cannot be more than what the algorithm tries to minimize.

For any given segmentation of an image, this scheme yields the number of bits required to
encode that image based on the given segmentation. Thus, among all possible segmentations of
an image, the goal is to find the segmentation which produces the shortest coding length. This
can be achieved by a simple agglomerative clustering method. The distortion in the lossy
compression determines the coarseness of the segmentation and its optimal value may differ for
each image. This parameter can be estimated heuristically from the contrast of textures in an
image. For example, when the textures in an image are similar, such as in camouflage images,
stronger sensitivity and thus lower quantization is required.
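A small sketch of the texture term described above: the coding length of a region is approximated as the number of samples times the entropy of a fitted multivariate normal, H = ½ log₂((2πe)^d |Σ|). The sample arrays are synthetic.

    import numpy as np

    def texture_coding_length(samples):
        # Approximate bits to encode region texture: n * entropy of fitted Gaussian
        n, d = samples.shape
        cov = np.cov(samples, rowvar=False) + 1e-9 * np.eye(d)  # regularized covariance
        sign, logdet = np.linalg.slogdet(cov)
        entropy_bits = 0.5 * (d * np.log2(2 * np.pi * np.e) + logdet / np.log(2))
        return n * entropy_bits

    smooth = np.random.randn(1000, 3) * 0.1      # low-variance "texture"
    busy = np.random.randn(1000, 3) * 5.0        # high-variance "texture"
    print(texture_coding_length(smooth) < texture_coding_length(busy))  # True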

Histogram Based Methods

Histogram-based methods are very efficient when compared to other image segmentation
methods because they typically require only one pass through the pixels. In this technique, a
histogram is computed from all of the pixels in the image, and the peaks and valleys in the
histogram are used to locate the clusters in the image. Color or intensity can be used as the
measure.

A refinement of this technique is to recursively apply the histogram-seeking method to


clusters in the image in order to divide them into smaller clusters. This is repeated with smaller
and smaller clusters until no more clusters are formed.

One disadvantage of the histogram-seeking method is that it may be difficult to identify


significant peaks and valleys in the image.

Histogram-based approaches can also be quickly adapted to apply over multiple frames,
while maintaining their single-pass efficiency. The histogram can be built in multiple fashions
when multiple frames are considered. The same approach that is taken with one frame can be
applied to multiple frames, and after the results are merged, peaks and valleys that were
previously difficult to identify are more likely to be distinguishable. The histogram can also be
applied on a per-pixel basis, where the resulting information is used to determine the most
frequent color for the pixel location. This approach segments based on active objects and a static
environment, resulting in a different type of segmentation useful in video tracking.
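A sketch of the peak-and-valley idea on a grayscale histogram, using SciPy to locate valleys between modes and treat them as segmentation thresholds; scikit-image's bundled camera image serves as test data.

    import numpy as np
    from scipy.signal import find_peaks
    from scipy.ndimage import gaussian_filter1d
    from skimage import data

    gray = data.camera()
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    smooth = gaussian_filter1d(hist.astype(float), sigma=4)  # suppress noisy peaks

    valleys, _ = find_peaks(-smooth)        # valleys are peaks of the negated histogram
    print(valleys)                          # candidate thresholds between clusters
    labels = np.digitize(gray, valleys)     # assign each pixel to a histogram cluster
    print(np.unique(labels))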

Edge Detection:

Edge detection is a well-developed field on its own within image processing. Region
boundaries and edges are closely related, since there is often a sharp adjustment in intensity at
the region boundaries. Edge detection techniques have therefore been used as the base of another
segmentation technique.

The edges identified by edge detection are often disconnected. To segment an object from an
image however, one needs closed region boundaries. The desired edges are the boundaries
between such objects or spatial-taxons.

Spatial-taxons are information granules consisting of a crisp pixel region, stationed at
abstraction levels within a hierarchical nested scene architecture. They are similar to the Gestalt
psychological designation of figure-ground, but are extended to include foreground, object
groups, objects, and salient object parts. Edge detection methods can be applied to the spatial-
taxon region in the same manner they would be applied to a silhouette. This method is
particularly useful when the disconnected edge is part of an illusory contour.

Segmentation methods can also be applied to edges obtained from edge detectors. Lindeberg
and Li developed an integrated method that segments edges into straight and curved edge
segments for parts-based object recognition, based on a minimum description length (MDL)
criterion that was optimized by a split-and-merge-like method with candidate breakpoints
obtained from complementary junction cues to obtain more likely points at which to consider
partitions into different segments.

Region Growing Methods

Region-growing methods mainly rely on the assumption that neighboring pixels
within one region have similar values. The common procedure is to compare one pixel with its
neighbors. If a similarity criterion is satisfied, the pixel can be set to belong to the same cluster
as one or more of its neighbors. The selection of the similarity criterion is significant, and the
results are influenced by noise in all instances.

The method of Statistical Region Merging (SRM) starts by building a graph of pixels using
4-connectedness, with edges weighted by the absolute value of the intensity difference. Initially
each pixel forms a single-pixel region. SRM then sorts those edges in a priority queue and
decides whether or not to merge the current regions belonging to the edge pixels using a
statistical predicate.

One region-growing method is the seeded region growing method. This method takes a
set of seeds as input along with the image. The seeds mark each of the objects to be segmented.
The regions are iteratively grown by comparing all unallocated neighboring pixels to the regions.
The difference between a pixel's intensity value and the region's mean, δ, is used as a measure of
similarity. The pixel with the smallest difference measured this way is allocated to the respective
region. This process continues until all pixels are allocated to a region. Because seeded region
growing requires seeds as additional input, the segmentation results are dependent on the choice
of seeds, and noise in the image can cause the seeds to be poorly placed.
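A compact sketch of seeded region growing on a grayscale image: neighbors wait in a priority queue ordered by |intensity − region mean|, so the smallest-difference pixel is always allocated next. As a simplification, the difference is computed against the region mean at push time rather than being recomputed at every step.

    import heapq
    import numpy as np

    def seeded_region_growing(img, seeds):
        # img: 2-D float array; seeds: list of ((row, col), region_id)
        labels = np.zeros(img.shape, dtype=int)          # 0 = unallocated
        sums, counts, heap = {}, {}, []
        for (r, c), rid in seeds:
            labels[r, c] = rid
            sums[rid] = sums.get(rid, 0.0) + img[r, c]
            counts[rid] = counts.get(rid, 0) + 1
        def push_neighbors(r, c, rid):
            mean = sums[rid] / counts[rid]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1] and labels[rr, cc] == 0:
                    heapq.heappush(heap, (abs(img[rr, cc] - mean), rr, cc, rid))
        for (r, c), rid in seeds:
            push_neighbors(r, c, rid)
        while heap:
            _, r, c, rid = heapq.heappop(heap)
            if labels[r, c]:                             # already allocated
                continue
            labels[r, c] = rid                           # allocate to closest region
            sums[rid] += img[r, c]; counts[rid] += 1     # update the region mean
            push_neighbors(r, c, rid)
        return labels

    img = np.zeros((20, 20)); img[:, 10:] = 1.0          # two flat regions
    print(np.unique(seeded_region_growing(img, [((5, 2), 1), ((5, 17), 2)])))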
Another region-growing method is the unseeded region growing method. It is a
modified algorithm that does not require explicit seeds. It starts with a single region, whose
initial pixel does not significantly influence the final segmentation. At each iteration it
considers the neighboring pixels in the same way as seeded region growing. It differs from
seeded region growing in that if the minimum δ is less than a predefined threshold T, then the
pixel is added to the respective region; if not, the pixel is considered significantly different from
all current regions, and a new region is created with this pixel.

One variant of this technique, proposed by Haralick and Shapiro (1985), is based on
pixel intensities. The mean and scatter of the region and the intensity of the candidate pixel are
used to compute a test statistic. If the test statistic is sufficiently small, the pixel is added to the
region, and the region's mean and scatter are recomputed. Otherwise, the pixel is rejected and is
used to form a new region.

A special region-growing method is called λ-connected segmentation (see also lambda-
connectedness). It is based on pixel intensities and neighborhood-linking paths. A degree of
connectivity (connectedness) is calculated based on a path that is formed by pixels. For a
certain value of λ, two pixels are called λ-connected if there is a path linking those two pixels
and the connectedness of this path is at least λ. λ-connectedness is an equivalence relation.

Split-and-merge segmentation is based on a quadtree partition of an image; it is sometimes
called quadtree segmentation.

This method starts at the root of the tree, which represents the whole image. If it is found
non-uniform (not homogeneous), then it is split into four son-squares (the splitting process), and
so on. Conversely, if four son-squares are homogeneous, they can be merged as several
connected components (the merging process). Each node in the tree is a segmented node. This
process continues recursively until no further splits or merges are possible. When a special data
structure is involved in the implementation of the algorithm, its time complexity can reach
O(n log n), an optimal variant of the method.
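A sketch of the splitting half of the method: a square is recursively split into four son-squares until each is homogeneous (here, low intensity variance). The merge pass over homogeneous siblings is omitted for brevity, and the variance threshold is an illustrative choice.

    import numpy as np

    def quadtree_split(img, r, c, size, var_thresh=25.0, leaves=None):
        # Recursively split img[r:r+size, c:c+size] while it is non-uniform
        if leaves is None:
            leaves = []
        block = img[r:r + size, c:c + size]
        if size <= 2 or block.var() <= var_thresh:       # homogeneous: stop splitting
            leaves.append((r, c, size))
            return leaves
        half = size // 2                                  # split into four son-squares
        for dr in (0, half):
            for dc in (0, half):
                quadtree_split(img, r + dr, c + dc, half, var_thresh, leaves)
        return leaves

    img = np.zeros((64, 64)); img[16:48, 16:48] = 255.0  # bright square on dark ground
    print(len(quadtree_split(img, 0, 0, 64)))            # number of leaf blocks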

Graph Partitioning Methods


Graph partitioning methods can effectively be used for image segmentation. In these
methods, the image is modeled as a weighted, undirected graph. Usually a pixel or a group of
pixels is associated with each node, and edge weights define the (dis)similarity between
neighboring pixels. The graph (image) is then partitioned according to a criterion designed to
model "good" clusters. Each partition of the nodes (pixels) output by these algorithms is
considered an object segment in the image. Some popular algorithms of this category are
normalized cuts, random walker, minimum cut, isoperimetric partitioning, minimum spanning
tree-based segmentation, and segmentation-based object categorization.

Superpixel Method

Many existing algorithms in computer vision use the pixel-grid as the underlying
representation. For example, stochastic models of images, such as Markov random fields, are
often defined on this regular grid. Or, face detection is typically done by matching stored
templates to every fixed-size (say, 50x50) window in the image.

The pixel-grid, however, is not a natural representation of visual scenes. It is rather an


"artifact" of a digital imaging process. It would be more natural, and presumably more efficient,
to work with perceptually meaningful entities obtained from a low-level grouping process. For
example, we can apply the Normalized Cuts algorithm to partition an image into, say, 500
segments (what we call superpixels).

Such a superpixel map has many desired properties:

 It is computationally efficient: it reduces the complexity of images from hundreds of


thousands of pixels to only a few hundred superpixels.

 It is also representationally efficient: pairwise constraints between units, previously
defined only for adjacent pixels on the pixel-grid, can now model much longer-range
interactions between superpixels.

 The superpixels are perceptually meaningful: each superpixel is a perceptually
consistent unit, i.e. all pixels in a superpixel are most likely uniform in, say, color and
texture.
 It is near-complete: because superpixels are results of an oversegmentation, most
structures in the image are conserved. There is very little loss in moving from the
pixel-grid to the superpixel map.

It is actually not novel to use superpixels or atomic regions to speed up later-stage visual
processing; the idea has been around the community for a while. What we have done is: (1)
to empirically validate the completeness of superpixel maps; and (2) to apply it to solve
challenging vision problems such as finding people in static images.

Superpixels from the Normalized Cuts

The Normalized Cuts algorithm is a classical region segmentation algorithm developed at
Berkeley, which uses spectral clustering to exploit pairwise brightness, color and texture
affinities between pixels. We apply Normalized Cuts to oversegment images and obtain
superpixels. In our experiments, to enforce locality, we use only local connections in the
pairwise affinity matrix.
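Normalized Cuts itself is computationally heavy; as an accessible stand-in (a different algorithm from the Normalized-Cuts superpixels described above), the following oversegments an image into roughly 500 superpixels with scikit-image's SLIC:

    import numpy as np
    from skimage import data
    from skimage.segmentation import slic

    img = data.astronaut()                        # built-in RGB test image
    segments = slic(img, n_segments=500, compactness=10, start_label=1)
    print(np.unique(segments).size, "superpixels")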

FEATURE EXTRACTION IMAGE SEGMENTATION

[Figure: feature extraction and image segmentation workflow; image search over RGB and gray
images, validation of the search results, and application-level image observation.]
CHAPTER-4

IMPLEMENTATION

MODULES
 HSI transformation

 Saliency Map

 Threshold Segmentation

 Binarization

 Masking

HSI Transformation:

The HSI color space is very important and attractive color model for image processing
applications because it represents color s similarly how the human eye senses colors. The HSI
color model represents every color with three components: hue ( H ), saturation ( S ), intensity (
I ). To formula that converts from RGB to HSI or back is more complicated than with other
color models.

HSI common in computer vision applications, attempts to balance the advantages and
disadvantages of the other two systems. While typically consistent, these definitions are not
standardized, and the abbreviations are colloquially interchangeable for any of these three or
several other related cylindrical models. Note also that while "hue" in HSL and HSV refers to the
same attribute, their definitions of "saturation" differ dramatically. (For technical definitions of
these terms,

Both of these representations are used widely in computer graphics, but both are also criticized
for not adequately separating color-making attributes, and for their lack of perceptual uniformity.
This means that the color displayed on one monitor for a given HSV value is unlikely to exactly
match the color seen on another monitor unless the two are precisely adjusted to absolute color
spaces.

Other, more computationally intensive models, such as CIELAB or CIECAM02 are said to better
achieve the goal of accurate and uniform color display, but their adoption has been slow. HSL
and HSV were widely adopted as a standard alternative to RGB in the early days of color
computers due to their low processing time requirements, and their similarity to traditional
artist's color theory. Even in the case of digital artists, who generally come to recognize the flaws
of HSL/HSV systems fairly quickly, it is simpler to learn to work around the flaws of a familiar
system of color representation than to relearn their entire way of thinking about color by
adapting to the less intuitive RGB system of color mixing. Thus, in spite of their flaws, HSL and
HSV have proven difficult to replace.
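As a sketch of one common RGB-to-HSI conversion (the arccos form of hue; several variants exist in the literature), for RGB values in [0, 1]:

    import numpy as np

    def rgb_to_hsi(r, g, b):
        # Convert one RGB triple in [0, 1] to (H in degrees, S, I)
        i = (r + g + b) / 3.0                               # intensity
        m = min(r, g, b)
        s = 0.0 if i == 0 else 1.0 - m / i                  # saturation
        num = 0.5 * ((r - g) + (r - b))
        den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
        h = np.degrees(np.arccos(np.clip(num / den, -1, 1)))
        if b > g:                                           # reflect for the lower half-plane
            h = 360.0 - h
        return h, s, i

    print(rgb_to_hsi(1.0, 0.0, 0.0))   # pure red: (0.0, 1.0, 0.333...)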

Hue and chroma

In each of our models, we calculate both hue and what this article will call chroma, after Joblove
and Greenberg, in the same way; that is, the hue of a color has the same numerical value in all
of these models, as does its chroma. If we take our tilted RGB cube and project it onto the
"chromaticity plane" perpendicular to the neutral axis, our projection takes the shape of a
hexagon, with red, yellow, green, cyan, blue, and magenta at its corners. Hue is roughly the
angle of the vector to a point in the projection, with red at 0°, while chroma is roughly the
distance of the point from the origin.

More precisely, both hue and chroma in this model are defined with respect to the hexagonal
shape of the projection. The chroma is the proportion of the distance from the origin to the edge
of the hexagon. In the lower part of the diagram to the right, this is the ratio of lengths OP/OP′,
or alternately the ratio of the radii of the two hexagons. This ratio is the difference between the
largest and smallest values among R, G, or B in a color. To make our definitions easier to write,
we’ll define these maximum and minimum component values as M and m, respectively.

To understand why chroma can be written as M − m, notice that any neutral color, with R = G =
B, projects onto the origin and so has 0 chroma. Thus if we add or subtract the same amount
from all three of R, G, and B, we move vertically within our tilted cube, and do not change the
projection. Therefore, the two colors (R, G, B) and (R − m, G − m, B − m) project on the same
point, and have the same chroma. The chroma of a color with one of its components equal to
zero (m = 0) is simply the maximum of the other two components. This chroma is M in the
particular case of a color with a zero component, and M − m in general.

The hue is the proportion of the distance around the edge of the hexagon which passes through
the projected point, originally measured on the range [0, 1) but now typically measured in
degrees [0°, 360°). For points which project onto the origin in the chromaticity plane (i.e., grays),
hue is undefined. Mathematically, this definition of hue is written piecewise:

H′ = undefined, if C = 0
H′ = ((G − B)/C) mod 6, if M = R
H′ = (B − R)/C + 2, if M = G
H′ = (R − G)/C + 4, if M = B
H = 60° × H′

Sometimes, neutral colors (i.e. with C = 0) are assigned a hue of 0° for convenience of
representation.

These definitions amount to a geometric warping of hexagons into circles: each side of the
hexagon is mapped linearly onto a 60° arc of the circle. After such a transformation, hue is
precisely the angle around the origin and chroma the distance from the origin: the angle and
magnitude of the vector pointing to a color.

Sometimes, for image analysis applications, this hexagon-to-circle transformation is skipped, and
hue and chroma (we'll denote these H2 and C2) are defined by the usual cartesian-to-polar
coordinate transformations (fig. 11). The easiest way to derive those is via a pair of cartesian
chromaticity coordinates, which we'll call α and β:

α = (2R − G − B)/2
β = (√3/2)(G − B)
H2 = atan2(β, α)
C2 = √(α² + β²)
Notice that these two definitions of hue (H and H2) nearly coincide, with a maximum difference
between them for any color of about 1.12° – which occurs at twelve particular hues, for instance
H = 13.38°, H2 = 12.26° – and with H = H2 for every multiple of 30°. The two definitions of
chroma (C and C2) differ more substantially: they are equal at the corners of our hexagon, but at
points halfway between two corners, such as H = H2 = 30°, we have C = 1, but C2 = √¾ ≈ 0.866,
a difference of about 13.4%.
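
To make the two parameterizations concrete, here is a minimal Python sketch (the function names are ours, not from any cited implementation) that computes the hexagon-based pair (H, C) and the polar pair (H2, C2) for RGB components in [0, 1]:

```python
import math

def hue_chroma(r, g, b):
    """Hexagon-based hue H (degrees, or None for grays) and chroma C = M - m."""
    M, m = max(r, g, b), min(r, g, b)
    C = M - m
    if C == 0:
        return None, 0.0               # neutral color: hue undefined
    if M == r:
        H = 60 * (((g - b) / C) % 6)
    elif M == g:
        H = 60 * ((b - r) / C + 2)
    else:
        H = 60 * ((r - g) / C + 4)
    return H, C

def hue_chroma_polar(r, g, b):
    """Circle-based H2 and C2 from the Cartesian chromaticity pair (alpha, beta)."""
    alpha = (2 * r - g - b) / 2
    beta = math.sqrt(3) / 2 * (g - b)
    H2 = math.degrees(math.atan2(beta, alpha)) % 360
    C2 = math.hypot(alpha, beta)
    return H2, C2

# Halfway between two hexagon corners (H = H2 = 30°): C = 1 but C2 = sqrt(3)/2
print(hue_chroma(1.0, 0.5, 0.0))        # (30.0, 1.0)
print(hue_chroma_polar(1.0, 0.5, 0.0))  # (~30.0, ~0.866)
```

The printed pair reproduces the roughly 13.4% chroma discrepancy between C and C2 noted above.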

Saturation

If we encode colors in a hue/lightness/chroma or hue/value/chroma model (using the definitions
from the previous two sections), not all combinations of lightness (or value) and chroma are
meaningful: that is, half of the colors we can describe using H ∈ [0°, 360°), C ∈ [0, 1], and V ∈
[0, 1] fall outside the RGB gamut (the gray parts of the slices in figure 14). The creators of these
models considered this a problem for some uses. For example, in a color selection interface with
two of the dimensions in a rectangle and the third on a slider, half of that rectangle is unused
space. Now imagine we have a slider for lightness: the user’s intent when adjusting this slider is
potentially ambiguous. How should the software deal with out-of-gamut colors? Or, conversely,
if the user has selected a dark purple that is as colorful as possible and then shifts the lightness
slider upward, what should be done: would the user prefer a lighter purple that is still as colorful
as possible for the given hue and lightness, or a lighter purple of exactly the same chroma as the
original color?

Saliency map

The saliency map was designed as input to the control mechanism for covert selective attention.
Features that contribute to the attentive selection of a stimulus (color, orientation, movement,
etc.) are combined into one single topographically oriented map. The saliency map integrates the
normalized information from the individual feature maps into one global measure of
conspicuity: a topographically arranged map that represents the visual saliency of a
corresponding visual scene.

ARCHITECTURE DIAGRAM:

[Figure: architecture diagram — input image → preprocess → HSI conversion → saliency map → threshold segmentation → binarization → mask → ROI result.]

FLOW DIAGRAM:

[Figure: flow diagram — original image → preprocess → ROI extraction → masking → enroll/verify → result.]

System Architecture

[Figure: system architecture — remote sensing data (multi-spectral and hyper-spectral images) feed computer-vision feature extraction and segmentation; classification produces a labeled image and segmentation a segmented image, while dimensionality reduction and sparse representation features drive adaptive denoising to yield denoised estimates. Application time complexity is indicated per stage.]
REMOTE SENSING:-

Sensors mounted on remote sensing satellites, or multi-spectral scanners mounted on aircraft,
capture pictures of the Earth’s surface. These pictures are processed after being transmitted to
the Earth station. Techniques for interpreting the resulting objects and regions are used in flood
control, city planning, resource mobilization, agricultural production monitoring, etc.

IMAGE RECOGNITION:-

A multi-spectral image consists of several bands of data. For visual display, each band of
the image may be displayed one band at a time as a grey-scale image, or three bands at a time as
a color composite image. Interpretation of a multi-spectral color composite image requires
knowledge of the spectral reflectance signatures of the targets in the scene.

MEASUREMENT OF PATTERN:-

The algorithm involves the computation of symmetric uncertainty (SU) values for TR relevance
and F-Correlation, which has linear complexity in terms of the number of instances in a given
data set. The algorithm also has linear time complexity in terms of the number of features.
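
As a rough illustration of the measurement, here is a minimal sketch of symmetric uncertainty for discrete features, assuming the standard definition SU(X, Y) = 2·IG(X|Y)/(H(X) + H(Y)) used in correlation-based feature selection; the helper names are ours:

```python
import numpy as np

def entropy(x):
    """Shannon entropy (bits) of a discrete-valued 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # Joint entropy from the paired samples, then mutual information
    _, counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    hxy = float(-(p * np.log2(p)).sum())
    return 2 * (hx + hy - hxy) / (hx + hy)
```

Evaluating SU once per feature pair is what yields the linear complexity in the number of instances and features noted above.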
Automatic ROI Extraction

1) Saliency Map Generation: The saliency map is generated basically under the framework
proposed in our previous work, with one distinct difference in the prediction direction. In our
unified directional wavelet-based pipeline, the prediction direction θN in ND-LWT equals θT ±
π/2, where θT is the prediction direction in TD-LWT obtained by a quadtree strategy.

2) Mask Postprocessing: Due to the ND-LWT’s high sensitivity to rich texture and bright spots,
some scattered and incomplete regions are highlighted as ROIs. However, not only do these
region fragments make nearly no improvement to the quality of the reconstructed image, they
can also unfavorably influence the compression ratio. Hence, for the benefit of compression, it is
necessary to get rid of the ROI fragments and keep the main ROIs through mask postprocessing.
In total, three steps are involved:
1) Sort all the extracted ROI regions in descending order by size: S(R1) ≥ S(R2) ≥ S(R3) ≥ . . . ≥ S(RN).
2) Initiate the main ROI set with M = {R1}.
3) In this descending size order, keep adding Ri to M if S(Ri) × 2 ≥ S(Ri−1), i = 2, 3, . . . , N.
Otherwise, stop iterating and obtain M = {R1, R2, R3, . . . , Ri−1}.
Observe that the mask postprocessing is main-ROI-size-adapted (see the sketch below): the
benchmark is first the size of the largest ROI region, then the size of the second, third, and
fourth largest ROI regions in turn, rather than some area generated by a fixed ratio of the size
of the whole image. In the latter situation, the fixed ratio would be quite critical in deciding the
main ROI set, and it is actually hard to fix an ideal ratio that suits an arbitrary type or size of
main ROI. On the contrary, the size of the largest ROI region readily offers a flexible and
reliable reference.
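
A compact sketch of this size-adaptive selection rule (our own illustration, with regions given as (label, size) pairs) could read:

```python
def select_main_rois(regions):
    """Sort ROI regions by size and keep R_i while 2 * S(R_i) >= S(R_{i-1})."""
    ordered = sorted(regions, key=lambda r: r[1], reverse=True)
    main = [ordered[0]]                    # step 2: M = {R1}
    for prev, cur in zip(ordered, ordered[1:]):
        if 2 * cur[1] >= prev[1]:          # step 3: size must not drop below half
            main.append(cur)
        else:
            break                          # first large size drop ends the main set
    return main

# Sizes 1000, 600, 550, 120: the 120-pixel fragment is discarded
print(select_main_rois([("a", 1000), ("b", 600), ("c", 550), ("d", 120)]))
```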
Tangent Directional Lifting Wavelet Transform

TD-LWT is introduced to improve compression efficiency by reducing the energy in high-
frequency subbands. Like ND-LWT, TD-LWT exploits the rich orientation and texture in remote
sensing images, and TD-LWT shares the same wavelet transformation process with ND-LWT
except for the last step, the optimal prediction direction selection, i.e., θN = θT ± π/2. As such,
they take orthogonal directions during prediction, which results in completely opposite
processing effects: ND-LWT enhances, and TD-LWT weakens, the energy of the high-frequency
subbands relative to the traditional LWT.
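
For reference, here is a minimal sketch of the non-directional 1-D 5/3 lifting step that both ND-LWT and TD-LWT build on (periodic boundary handling via np.roll is used for brevity; the standard transform uses symmetric extension):

```python
import numpy as np

def lwt53_1d(x):
    """One level of the 1-D 5/3 lifting wavelet transform (even-length input)."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    # Predict: high-pass d[n] = x[2n+1] - (x[2n] + x[2n+2]) / 2
    d = odd - 0.5 * (even + np.roll(even, -1))
    # Update: low-pass c[n] = x[2n] + (d[n-1] + d[n]) / 4
    c = even + 0.25 * (np.roll(d, 1) + d)
    return c, d
```

The directional variants replace the vertical/horizontal neighbors in the predict step with neighbors along θN or θT, interpolated at fractional positions when tan θ is not an integer.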

ROI Coding
1) ROI Mask Generation: After the ROI is defined, an ROI mask needs to be derived, indicating
the set of coefficients that are required for up-to-lossless ROI reconstruction. We assume that the
image is first decomposed by a 1-D lifting transform in the vertical direction. Based on wavelet
theory, high-frequency subband coefficients d[∗, n] and low-frequency subband coefficients
c[∗, n] take turns along the row. Suppose the popular 5/3-tap biorthogonal wavelet filter is
implemented; according to the inverse TD-LWT, c[m, n], d[m − tan θT, n − 1], and
d[m + tan θT, n] along the direction θT, which correspond to the red blocks in Fig. 5(a), are
required for the reconstruction of x[m, 2n]. Similarly, the coefficients in the green blocks are
required for the reconstruction of x[m, 2n + 1]. Therefore, in total, eight coefficients should be
included in the ROI mask for the reconstruction of x[m, 2n] and x[m, 2n + 1]. In particular, if
tan θT is not an integer, the coefficients used by the sinc interpolation technique to calculate the
values at fractional pixel locations should also be added to the ROI mask. Under the traditional
LWT scheme, where θT is equivalent to 0, only five coefficients in the vertical direction are
needed to reconstruct x[m, 2n] and x[m, 2n + 1], as represented by the orange strips.
2) Multibitplane Alternating Shift: After coefficient quantization, ROI coding is conducted with
MAS. MAS considers only the several highest bit positions, which carry the greatest values, as
the most significant bitplanes (MSB). The following several bitplanes are the general significant
bitplanes (GSB), and the least significant bitplanes (LSB) are at the bottom of the binary
number. These three bitplane types act differently in ROI coding. The MSB contain the most
significant information of the ROI. The GSB adjust the difference in reconstruction effect
between ROI and background, allowing the significant information of the background to be
recovered soon after the MSB are accomplished, while assuring the recovery of the ROI’s
general significant information.
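
As a simplified illustration of bitplane shifting, the sketch below applies a single fixed shift in the style of JPEG2000's Maxshift; MAS instead alternates the GSB planes between ROI and background, which this sketch does not attempt:

```python
import numpy as np

def shift_roi_bitplanes(coeffs, roi_mask, shift_bits):
    """Raise ROI coefficient magnitudes by `shift_bits` bitplanes so their
    most significant bitplanes are encoded before the background's."""
    out = coeffs.astype(np.int64).copy()
    mag = np.abs(out[roi_mask]) << shift_bits   # shift magnitudes only
    out[roi_mask] = np.sign(out[roi_mask]) * mag
    return out
```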

Algorithm 1 (Mean Shift-Based Global Edge Saliency). Input: an image and the multiple levels m.

1: Smooth the image using the different parameters λ;

2: Obtain the multiple edge saliency maps;

3: Merge the edge saliency maps and smooth the result using a Gaussian filter.

Output: global edge saliency Sge.

Filtering Technique. Input: an image I.

1: Calculate the local color saliency;

2: Calculate the local textural saliency;

3: Calculate the global color saliency;

4: Calculate the global edge saliency using Algorithm 1;

5: Combine the above four saliencies, i.e., Sall = Σ_{i=1}^{4} w̃i · Si;

6: Refine the final saliency, i.e., Sfinal = Sall · Fci.

Output: final visual saliency Sfinal. A toy combination of the four maps is sketched below.
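
This is a minimal numpy sketch of steps 5 and 6; the weights, the four saliency maps, and the refinement factor Fci are all assumed inputs here:

```python
import numpy as np

def combine_saliency(maps, weights, f_ci):
    """S_all = sum_{i=1}^{4} w_i * S_i, then S_final = S_all * F_ci."""
    s_all = sum(w * s for w, s in zip(weights, maps))
    s_all = s_all / s_all.max()     # normalize to [0, 1] (assumes a nonzero map)
    return s_all * f_ci             # refinement step
```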
Proposed system

The extraction of the region of interest (ROI) is followed by a comparison, which is divided into
the following phases. Phase one: the ROI is extracted interactively from the image, dividing the
image into two regions, because radiologists are interested in the relevant areas needed to
perform a correct diagnosis. Phase two: the original image, the ROI, and the background are
compressed with different compression algorithms (SPIHT, JPEG2000, and Adaptive SPIHT) to
evaluate which yields the highest quality after reconstruction. Phase three: the case of images
containing multiple ROIs is analyzed, where priorities are set by default in order to recover the
ROIs at higher quality than the rest of the image, the background. Compression in the last phase
is carried out by implementing JPEG2000 on the ROI because quality after reconstruction is of
utmost importance here. All the compression techniques implemented are wavelet based. Fig. 5
depicts the general methodology implemented in the research work.
Extracting the region

An important characteristic of all medical images is that they can easily be classified into two
areas. One area is the body part that is the subject of diagnosis in the image; the other is the
background, with less important information. This work proposes selecting more than one region
of interest, according to the priority of the information required, so the first step is to segment
the image into two regions. The suggested approach is to select the region of interest by hand
and then superimpose the selected pixel matrix on an m×n matrix of zeroes, where m and n are
the numbers of horizontal and vertical pixels in the image, respectively; the background is left
with zero values while the selected region keeps its original pixels, as sketched below.
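
A minimal sketch of this masking step, assuming the boolean mask comes from the hand selection:

```python
import numpy as np

def extract_roi(image, mask):
    """Superimpose the selected pixels onto an m x n matrix of zeroes."""
    roi = np.zeros_like(image)
    roi[mask] = image[mask]   # background stays zero
    return roi
```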

Segmenting the Image Data

After choosing the ROI, the image is divided into multiple ROIs and a non-ROI. The chosen
ROIs are not restricted to square or rectangular regions; they can be of any arbitrary shape. The
ROIs need to be encoded with different priorities: an ROI’s priority determines its importance in
the image, and the ROI with higher priority is compressed at a higher bit-rate than the rest of the
ROIs. The non-ROI usually has a lower priority and appears in the final part of the whole image
bit stream. The ROIs are sent directly to the chosen encoder, while the non-ROI coefficients are
sent to another encoder chosen by the user.

ROI lossless and lossy compression

Because the extracted breast region contains important diagnostic information, it needs to be
compressed losslessly. A set of mammograms is tested with three major compression
algorithms: SPIHT, JPEG2000, and Adaptive SPIHT.
Multiple arbitrary shape ROI compression

This section introduces an ROI coding method that is able to prioritize multiple ROIs at
different priorities while guaranteeing lossy-to-lossless coding. Region of interest (ROI) coding
is a prominent feature of some image coding systems, aimed at prioritizing specific areas of the
image through the construction of a code stream that, decoded at increasing bit-rates, recovers
the ROI first and at higher quality than the rest of the image. JPEG2000 provides lossy-to-lossless
compression and ROI coding, which are especially relevant to the medical community. Table 1
gives the compression ratio comparisons with more than one ROI.

1. Analysis of Parameter k

The first set of experiments is designed to characterize the effect of the parameter k on the
boundary detection results and to provide an intuitive interpretation of the tuning of this
parameter. Fig. 6 shows the boundary detection results on a synthetic image for different values
of k, which result in different values of the weighting term ω(φ, k), as the number of contours
adjacent to C increases with k. There is clearly a trade-off between the value of k and the
strength of the energy terms, i.e., the area and length terms.

2. Feature Extraction

 Intensities in a single MRI: univariate classification.

 Feature vector from a single MRI: multivariate classification, e.g., [I(x,y,z), f(N(x,y,z)), g(N(x,y,z))],
where N is a neighbourhood around (x,y,z), f is the distribution (entropy) of I in the
neighbourhood, and g is the average of I in the neighbourhood; alternatively, f and g may
specify edge or boundary information. (A sketch of these neighbourhood features follows this list.)

 Intensities in multiple MRIs with different contrast: multivariate (multi-spectral) classification.
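
Here is a sketch of the single-MRI multivariate case, computing the per-voxel vector [I, f(N), g(N)] with entropy and mean over a small neighbourhood; the choice of 16 histogram bins is ours, purely illustrative:

```python
import numpy as np
from scipy.ndimage import generic_filter, uniform_filter

def local_entropy(values):
    """Entropy of the intensity distribution inside one neighbourhood."""
    hist, _ = np.histogram(values, bins=16)
    p = hist[hist > 0] / values.size
    return -(p * np.log2(p)).sum()

def voxel_features(I, size=3):
    """Per-voxel feature vector [I(x), f(N(x)), g(N(x))]."""
    f = generic_filter(I.astype(float), local_entropy, size=size)  # entropy f
    g = uniform_filter(I.astype(float), size=size)                 # mean g
    return np.stack([I.astype(float), f, g], axis=-1)
```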
3. Frameworks for Snakes

Deformable models are curves or surfaces defined within an image domain that can move
under the influence of internal forces, which are defined within the curve or surface itself, and
external forces, which are computed from the image data.

 A higher level process or a user initializes any curve close to the object boundary.
 The snake then starts deforming and moving towards the desired object boundary.
 In the end it completely “shrink-wraps” around the object.

4. Image Segmentation Level Set

 A limitation of active contours based on parametric curves of the form f(s) (snakes, B-snakes)
is that it is challenging to change the topology of the curve as it evolves.
 If the shape changes dramatically, curve reparameterization may also be required.
 An alternative representation for such closed contours is to use level sets (LS). LS evolve to
fit and track objects of interest by modifying the underlying embedding function instead of the
curve function f(s). A minimal evolution step is sketched below.
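
This sketch shows one explicit update for the level-set evolution ∂φ/∂t = F·|∇φ|; the speed field F is assumed given, and reinitialization of φ is omitted:

```python
import numpy as np

def level_set_step(phi, speed, dt=0.1):
    """One explicit Euler step: phi <- phi + dt * F * |grad phi|."""
    gy, gx = np.gradient(phi)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    return phi + dt * speed * grad_mag   # zero level set moves along its normal
```

Because the contour is the zero level set of φ, splits and merges happen automatically as φ evolves, which is exactly the topological flexibility that parametric snakes lack.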

5. Comparisons to Region-Based Active Contours

The fourth set of experiments compares our method to Kimmel’s method on synthetic
images and real medical images. Recall that Kimmel’s method is a region-based method. It is
important to mention that region-based methods are usually based on the Mumford–Shah
functional. For example, the method of active contours without edges (the Chan–Vese model)
solves the piecewise constant Mumford–Shah model but restricts the solution to be piecewise
constant with only two constants.

Conclusion

Considering this paradigm of image understanding, the development of efficient feature
extraction methods for multispectral data analysis and classification is not an easy task, and no
general approaches for efficient classification of satellite images are provided. According to the
experimental results, common texture and color descriptors can be adapted and successfully used
for multispectral image analysis. Moreover, by combining texture and spectral features, we can
obtain more powerful descriptors. We have also achieved important results by using spectral
index descriptors, which are very fast and easy to compute. Even though the most suitable
image descriptors for multispectral image analysis prove to be the ones based on the BoW
framework, the classical ones provide similar results with a shorter computation time. Moreover,
the Gabor-histogram descriptor, which captures both texture and spectral information, leads to
similar average accuracy rates. The goal of this letter is to provide enhanced feature extraction
methods that can be used on multispectral images for better data understanding and
classification.

• We introduced deep learning for supervised feature extraction from remote sensing images.
The proposed approach consists of a convolutional network trained with an unsupervised
algorithm.

• The algorithm trains the network parameters to learn hierarchical sparse representations of the
input images that can be fed to a simple classifier, integrating the multispectral and spatial-
dominated features together to construct a joint spectral–spatial classification framework.

• Our experimental results suggest that deeper features generally lead to higher classification
accuracies, though an overly deep structure can act adversely.

REFERENCES:

[1] L. Zhang, A. Li, Z. Zhang, and K. Yang, “Global and local saliency analysis for the
extraction of residential areas in high-spatial-resolution remote sensing image,” IEEE Trans.
Geosci. Remote Sens., vol. 54, no. 7, pp. 3750–3763, Jul. 2016.

[2] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene
analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.

[3] L. Zhang and K. Yang, “Region-of-interest extraction based on frequency domain analysis
and salient region detection for remote sensing image,” IEEE Geosci. Remote Sens. Lett., vol. 11,
no. 5, pp. 916–920, May 2014.

[4] N. Imamoglu, W. Lin, and Y. Fang, “A saliency detection model using low-level features
based on wavelet transform,” IEEE Trans. Multimedia, vol. 15, no. 1, pp. 96–105, Jan. 2013.

[5] L. Zhang, J. Chen, and B. Qiu, “Region of interest extraction in remote sensing images by
saliency analysis with the normal directional lifting wavelet transform,” Neurocomputing,
vol. 179, pp. 186–201, Feb. 2016.

[6] O. N. Gerek and A. E. Cetin, “A 2-D orientation-adaptive prediction filter in lifting structures
for image coding,” IEEE Trans. Image Process., vol. 15, no. 1, pp. 106–111, Jan. 2006.

[7] W. Ding, F. Wu, X. Wu, S. Li, and H. Li, “Adaptive directional lifting-based wavelet
transform for image coding,” IEEE Trans. Image Process., vol. 16, no. 2, pp. 416–427, Feb. 2007.

[8] A. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG 2000 still image compression
standard,” IEEE Signal Process. Mag., vol. 18, no. 5, pp. 36–58, Sep. 2001.

[9] Overview of JPEG 2000, 2000. [Online]. Available: https://jpeg.org/jpeg2000/
