
International Journal of Computer Science

and Business Informatics


(IJCSBI.ORG)

ISSN: 1694-2507 (Print)


VOL 13, NO 1 | MAY 2014
ISSN: 1694-2108 (Online)

Table of Contents VOL 13, NO 1 MAY 2014

A Novel Facial Recognition Method using Discrete Wavelet Transform Multiresolution Pyramid .......... 1
G. Preethi

Enhancing Energy Efficiency in WSN using Energy Potential and Energy Balancing Concepts ................. 9
Sheetalrani R. Kawale

DNS: Dynamic Network Selection Scheme for Vertical Handover in Heterogeneous Wireless Networks
.................................................................................................................................................................... 19
M. Deva Priya, D. Prithviraj and Dr. M. L Valarmathi

Implementation of Image based Flower Classification System ................................................................ 35


Tanvi Kulkarni and Nilesh. J. Uke

A Survey on Knowledge Analytics of Text from Social Media .................................................................. 45


Dr. J. Akilandeswari and K. Rajalakshm

Progression of String Matching Practices in Web Mining - A Survey ..................................................... 62


Kaladevi A. C. and Nivetha S. M.

Virtualizing the Inter Communication of Clouds ............................................................................... 72


Subho Roy Chowdhury, Sambit Kumar Patel, Ankita Vinod Mandekar and G. Usha Devi

Tracing the Adversaries using Packet Marking and Packet Logging ....................................................... 86
A. Santhosh and Dr. J. Senthil Kumar

An Improved Energy Efficient Clustering Algorithm for Non Availability of Spectrum in Cognitive Radio
Users ....................................................................................................................................................... 101
V. Shunmuga Sundaram and Dr S. J. K Jagadeesh Kumar

A Novel Facial Recognition Method


using Discrete Wavelet Transform
Multiresolution Pyramid
G. Preethi
PG Scholar, Department of CSE,
Chendhuran College of Engineering & Technology,
Pudukkottai 622507, India

ABSTRACT
The need for facial recognition methods is growing nowadays, as a large number of
applications depend on them. When implementing facial recognition methods, the cost of
data storage and data transmission plays a vital role, so facial recognition methods usually
require image compression techniques to fulfill their requirements. Our paper is based on the
discrete wavelet transform multiresolution pyramid. Various resolutions of the original
image, with different image qualities, can be obtained without employing any image
compression techniques. Principal Component Analysis is used to measure facial recognition
performance at the various resolutions of the image. Facial images for testing are selected
from the standard FERET database. Experimental results show that low resolution facial
images perform nearly as well as the higher resolution images. So, instead of using all the
available wavelet coefficients, the minimum number of coefficients representing a lower
resolution can be used, and there is no need for image compression.
Keywords
Principal component analysis, discrete cosine transform, discrete wavelet transform,
support vector machine.

1. INTRODUCTION
Facial recognition methods are used to identify or verify an individual using
the facial images already enrolled in a database. The general categories of
facial recognition are holistic, feature-based, template-based and part-based
methods. Among them, the holistic method takes the whole face region as
input and utilizes its statistical moments. The basic and most commonly used
holistic methods are based on Principal Component Analysis (PCA) [1].
Facial recognition methods are used in a large number of applications such as
e-visas, e-passports, entry control in organizations, criminal identification,
forensic science, and authentication on smart phones and laptops. As the
number of facial images to be stored grows, so do the data storage and image
transmission costs. To reduce both, image compression algorithms are
utilized.

ISSN: 1694-2108 | Vol. 13, No. 1. MAY 2014 1


More efficient image compression can be achieved with transform based
methods than with pixel based methods. Transform coding transforms the
given image from the spatial domain to a transform domain where efficient
compression can be carried out. Since the transformation is a linear process,
there is no loss of information and the number of coefficients equals the
number of pixels. As most of the image's energy is concentrated in a few
large magnitude coefficients, the remaining very small magnitude
coefficients can be coarsely quantized or even ignored during encoding
without much effect on the quality of the reconstructed image.
The available mathematical transforms are Karhunen-Loeve (KLT),
Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and
Discrete Wavelet Transform (DWT) [2]. Among them, DCT is utilized in
major applications such as JPEG and MPEG. DWT is now replacing DCT
owing to its superior quality and varied decoding options. Transforms that
operate on the whole image instead of image blocks avoid blocking
artifacts at low bit rates. DWT decomposes the source signal into
non-overlapping and contiguous frequency ranges called sub bands. The
source sequence is fed to a bank of band pass filters which are contiguous
and cover the full frequency range. This set of output signals are the sub
band signals and can be recombined without degradation to produce the
original signal [3] [4]. Fig.1 shows how a signal is separated into sub bands
using band pass filters.

[Figure: analysis filter bank. The input x(n) is split by a high pass filter h1
and a low pass filter h0, each followed by downsampling by 2; the low pass
branch is split again to form the next level.]

Figure 1. Sub band decomposition of a signal
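As a rough illustration of the analysis/synthesis idea shown in Fig. 1, the sketch below implements a single-level Haar filter bank in NumPy. The Haar pair is chosen only for brevity (this paper uses the longer CDF 9/7 filters), and the signal values are arbitrary.

```python
import numpy as np

def haar_analysis(x):
    # Split an even-length signal into approximation (low pass) and
    # detail (high pass) sub bands, each half the input length.
    x = np.asarray(x, dtype=float).reshape(-1, 2)
    cA = (x[:, 0] + x[:, 1]) / np.sqrt(2)   # low pass + downsample by 2
    cD = (x[:, 0] - x[:, 1]) / np.sqrt(2)   # high pass + downsample by 2
    return cA, cD

def haar_synthesis(cA, cD):
    # Recombine the sub bands; the reconstruction is exact,
    # illustrating that sub band decomposition loses no information.
    out = np.empty(2 * len(cA))
    out[0::2] = (cA + cD) / np.sqrt(2)
    out[1::2] = (cA - cD) / np.sqrt(2)
    return out

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0])
cA, cD = haar_analysis(x)
assert np.allclose(haar_synthesis(cA, cD), x)  # sub bands recombine losslessly
```

Feeding the low pass output `cA` back into `haar_analysis` gives the next decomposition level, exactly as in the filter tree above.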


When transforming a two dimensional digital image with the band pass
(low pass and high pass) filters, a first transform along the horizontal axis
and a second along the vertical axis are required to decompose the image
into sub bands. The resulting four sub bands of a one level decomposition
are named LL, LH, HL and HH. LL, LH, HL and HH represent
lowest frequencies, vertical high frequencies (horizontal edges), horizontal
high frequencies (vertical edges) and high frequencies in both directions (the
corners), respectively. Fig. 2 shows the various sub bands separated by a three
level dyadic DWT [5].

Figure 2. Sub bands separated by a three level dyadic DWT.


The multiresolution property [6] of DWT enables the user to have variable
resolutions of the transformed image. While reconstructing the image, for a
3 level transformation, four resolutions (0 to 3) are possible. The LL3 sub
band can reconstruct 0th resolution, LL3, HL3, LH3 and HH3 sub-bands
can reconstruct 1st resolution, LL3, HL3, LH3, HH3, HL2, LH2 and HH2
sub-bands can reconstruct 2nd resolution and LL3, HL3, LH3, HH3, HL2,
LH2, HH2, LL1, HL1, LH1 and HH1 sub-bands can reconstruct the third
resolution.
When an image of dimension 128 x 128 pixels is transformed by DWT for 3
levels, the LH1, HL1, LL1 and HH1 will have a dimension of 64 x 64
pixels. LH2, HL2, LL2 and HH2 are of 32 x 32 pixels and LH3, HL3, LL3
and HH3 will have a dimension of 16 x 16 pixels. Hence the resolution 0
requires 256 (16 x 16) wavelet coefficients, 1 requires 1024 (32 x 32)
wavelet coefficients, 2 needs 4096 (64 x 64) wavelet coefficients and 3
requires the whole 16384 (128 x 128) wavelet coefficients. With this
multiresolution feature of the DWT, we propose a novel facial recognition
method where the available resolutions of the facial image are used instead
of the whole image.
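The coefficient counts quoted above follow directly from the halving of each image dimension per decomposition level, as a one-line sketch shows (sizes as in the text):

```python
# Coefficient counts per reconstructed resolution for a 3-level DWT of a
# 128 x 128 image: resolution r is rebuilt from the top (levels - r) levels,
# so its side length is size >> (levels - r).
size, levels = 128, 3
coeffs_needed = {r: (size >> (levels - r)) ** 2 for r in range(levels + 1)}
assert coeffs_needed == {0: 256, 1: 1024, 2: 4096, 3: 16384}
```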

2. MATERIALS AND METHODS
We briefly describe the FERET database, PCA and the performance
measure, Recognition Rate, here.
2.1 Database
The FERET database is a standard database for testing facial recognition
algorithms. It was collected by the Defense Advanced Research Projects
Agency (DARPA) and the National Institute of Standards and Technology
(NIST) of the United States of America (USA) from 1993 to 1997 [7]. The
total collection amounts to 14,051 grayscale facial images. The images are
categorized by nature into the groups Fa, Fb, Fc, Dup I and Dup II, with
1196, 1195, 194, 722 and 234 images respectively. Moon and Phillips [8]
have analysed the computational and performance aspects of PCA based
face recognition using the FERET database.
2.2 Image Types
There are three types of images: Gallery images are the collection of facial
images from known individuals, which forms the search dataset. Probe
images are the collection of facial images of unknown persons to be
identified or verified by matching against the gallery images. Training
images are a random collection of facial images from all the available
categories; they are used to train the PCA algorithm for facial recognition.
2.3 Principal Component Analysis
PCA is an applied linear algebra tool used for dimensionality reduction of a
given data set; it decorrelates the second-order statistics of the data. A 2-D
facial image with r rows and c columns is converted into a single one-dimensional
vector of r x c elements by joining all the rows one after another. For M training
images, there are M such vectors. A mean centered image is calculated by
subtracting the mean image from each vector. Eigen vectors are computed
from the covariance matrix of the mean centered images.
The basis vectors which represent the maximum variance direction from the
original image are selected as feature vectors. These feature vectors are
named as Eigen faces or face space. It is not necessary that the number of
feature vectors should be equal to the number of training images. Every
image in the gallery image set is projected into the face space and the
weights are stored in the memory. The face to be probed is also projected
into the face space. The distance between the projected probe image
weights and every projected gallery image weight is computed. The gallery
image having the shortest distance will be treated as the recognized face.
Many PCA based face recognition methods are available. Hybrid versions
of PCA and other methods like Gabor wavelets [9], Support Vector
Machine (SVM) Classifiers [10], etc. are used for face recognition.
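A minimal NumPy sketch of the eigenface pipeline described above, using random stand-in images; the dimensions, the number of training images, and the choice of k basis vectors are illustrative and do not reproduce the paper's 501-image / 200-vector setup:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins: M training images of r x c pixels, flattened row by row.
M, r, c = 20, 8, 8
train = rng.random((M, r * c))

mean_face = train.mean(axis=0)
A = train - mean_face                      # mean-centred training images

# Right singular vectors of A are the covariance eigenvectors (up to scale);
# SVD is an equivalent, numerically stable way to obtain them.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 5                                      # keep only the top-k basis vectors
face_space = Vt[:k]                        # the "Eigen faces"

def project(img):
    # Weights of an image in the face space.
    return face_space @ (img - mean_face)

gallery = rng.random((10, r * c))
gallery_w = np.array([project(g) for g in gallery])

probe = gallery[3] + 0.01 * rng.random(r * c)       # noisy copy of gallery image 3
d = np.abs(gallery_w - project(probe)).sum(axis=1)  # L1 distances to each gallery image
assert int(np.argmin(d)) == 3              # shortest distance -> recognised face
```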

2.4 Distance Measure
The distance measures are used to compare the similarity between the probe
and gallery images. The distance measure used in our work is L1. Let x
and y be two vectors of size n and d the distance between x and y. The L1
(City-Block or Manhattan) distance is defined as the sum of the absolute
differences between the two vectors:

d(x, y) = |x1 - y1| + |x2 - y2| + ... + |xn - yn|
2.5 Performance Measure - Recognition Rate (RR)


We adopted the performance measure from Delac et al. [11]. The
recognition rate is defined as the ratio between the number of probe images
recognized correctly and the total number of probe images used for
recognition. Both the gallery and probe images are projected into the face
space and the individual similarity scores of the probe images are calculated.
The distance measure is used to find the gallery image with the highest
similarity to the probe image. If the identified gallery image matches the
probe image exactly, it is declared correctly identified. For example, out of
1000 probe images, if 786 are correctly identified then the RR is
786/1000 = 78.6%.
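The RR computation, including the 786/1000 example from the text, can be sketched as follows (the function name is ours):

```python
def recognition_rate(predicted_ids, true_ids):
    # RR (%) = correctly recognised probes / total probes * 100
    correct = sum(p == t for p, t in zip(predicted_ids, true_ids))
    return 100.0 * correct / len(true_ids)

# 786 of 1000 probes identified correctly, as in the text's example.
assert recognition_rate([1] * 786 + [0] * 214, [1] * 1000) == 78.6
```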

3. PROPOSED METHOD
Facial image sets of Fa, Fb, Fc, Dup I and Dup II from FERET database are
normalized as per the ISO/IEC 19794-5 standard for facial image data using
the algorithm of Somasundaram and Palaniappan [12]. From the resultant
images of the normalization method, the facial features region (area
covering eyes, nose, mouth) is segmented to the dimension of 128 x 128
pixels. Few of the test images are shown in Fig.3.

Figure 3. Few segmented test images from FERET database


Every segmented facial image is de-noised using median filter and the
intensity values are equalized using histogram equalization. These images
are transformed using DWT with Cohen-Daubechies-Feauveau 9/7
(CDF9/7) filter for 3 levels. The wavelet coefficients of LL3 (16 x 16) are
used for the reconstruction of resolution 0. Wavelet coefficients of LH3,

HL3, LL3 and HH3 (32 x 32) are used for the reconstruction of resolution 1.
All the wavelet coefficients except LH1, HL1 and HH1 (64 x 64) are used to
reconstruct resolution 2. Whole wavelet coefficients representing all the
levels (128 x 128) are used to reconstruct resolution 3. Fig.4 shows the
various resolutions available.

Resolution 0 Resolution 1 Resolution 2 Resolution 3


Figure 4. Various resolutions available for a 3 level DWT decomposition
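A hedged sketch of the preprocessing steps above (median filtering and histogram equalization) in pure NumPy; the input is a random stand-in for a segmented 128 x 128 crop, and the 3 x 3 window is an assumed kernel size, as the paper does not state one. For the transform step itself, PyWavelets' 'bior4.4' wavelet is commonly used as the CDF 9/7 filter pair.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(128, 128)).astype(np.uint8)  # stand-in face crop

# 3x3 median filter (simple edge-padded implementation) for de-noising.
pad = np.pad(img, 1, mode='edge')
windows = np.stack([pad[i:i + 128, j:j + 128] for i in range(3) for j in range(3)])
denoised = np.median(windows, axis=0).astype(np.uint8)

# Histogram equalisation: map intensities through the normalised CDF.
hist = np.bincount(denoised.ravel(), minlength=256)
cdf = hist.cumsum() / denoised.size
equalised = (255 * cdf[denoised]).astype(np.uint8)

assert equalised.shape == (128, 128)
```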
The FERET image set Fa is used as gallery image set. Sets Fb, Fc, Dup I
and Dup II are used as probe image sets. A training set of 501 images from
FERET data set obtained from the CSU Face Identification Evaluation
System of Colorado State University is used in our experiment. Among
these training images 80% are from gallery images and 20% from Dup I
images. While performing PCA on the training set, it generates 500 Eigen
vectors. Among these 500 Eigen vectors only the top 200 Eigen vectors
(40% of the total Eigen vectors) are selected as basis vectors. These basis
vectors are used with PCA algorithm to generate the PCA face space
(WPCA).
We performed two types of experiments where in the first experiment the
training and gallery images are of resolution 3 and only the probe images
are varied from resolution 3 to resolution 0. For the second experiment all
the gallery and probe images are varied from resolution 3 to 0. These two
experiments are carried out for each of the probe sets Fb, Fc, Dup I and
Dup II. Initially, the face spaces are generated with PCA from the training
images at every resolution. While carrying out the experiments, the gallery
and probe images are projected onto the respective face space as required.
The L1 distance measure is used to find the similarity scores of the gallery
images.
4. RESULTS AND DISCUSSION
The FERET facial images are transformed with the DWT using Matlab
(Version 7). The PCA face space generation, the projection of gallery and
probe images, and the similarity score computation are also carried out
using Matlab programs. For every experiment, the recognition rates are
individually calculated for every probe set using all the resolution levels.

4.1 Experiment 1
The recognition rates of the probe image sets Fb, Fc, Dup I and Dup II for
the resolution levels 3, 2, 1 and 0 with the gallery and training images of
resolution 3 are given in Table 1.
Table 1. Recognition rate for resolution 3 training and gallery images
Image Type Recognition Rate (%)
Res-3 Res-2 Res-1 Res-0
Fb 86.78 86.78 86.61 81.92
Fc 38.66 37.63 32.47 25.77
Dup I 41.83 41.69 40.58 35.73
Dup II 19.66 19.23 18.80 14.96

For the Fb image set, the RR at resolutions 3, 2 and 1 is more or less equal,
while at resolution 0 it drops noticeably. For the Fc image set, the RR drops
significantly across all resolution levels from 3 to 0. The RRs of the Dup I
and Dup II image sets resemble those of Fc. As an overall observation, the
RR drops as the resolution decreases.
4.2 Experiment 2
The recognition rates of the probe image sets Fb, Fc, Dup I and Dup II for
the resolution levels 3, 2, 1 and 0 with the gallery and training images of the
same resolution level are given in Table 2.
Table 2. Recognition rate for all the resolutions
Image Type Recognition Rate (%)
Res-3 Res-2 Res-1 Res-0
Fb 86.78 88.03 88.77 88.87
Fc 38.66 41.75 41.24 42.27
Dup I 41.83 42.11 41.13 40.44
Dup II 19.66 20.09 19.52 18.80
When the training, gallery and probe image sets all belong to the same
resolution, the results are better than in the first experiment. For the Fb
image set, the RR increases steadily from resolution 3 to resolution 0; the
RRs of resolutions 2 to 0 exceed that of resolution 3 by at least 1.25%. The
RR of Fc shows a clear difference between resolution 3 and the others:
resolution 0 even exceeds resolution 3 by more than 3.5%. For the image
sets Dup I and Dup II, the RR increases from resolution 3 to 2, but falls
below resolution 3 at resolutions 1 and 0.
Based on the results of the above two experiments, it is evident that the
facial recognition rates at the lower resolutions nearly equal those at the
higher resolution. So, instead of using all the wavelet coefficients, a minimum

number of coefficients that still gives a high recognition rate can be used,
without any image compression.

5. CONCLUSIONS
Our proposed method is a facial recognition algorithm based on the
resolution scalability of the DWT, using PCA. The lower resolution images
require a very low bit rate compared to the higher resolution images, yet
they give a recognition rate more or less equal to that of the higher
resolution images. This saves transmission time and data storage. Our
method can fulfill the requirements of basic facial recognition with low
resolution images.

REFERENCES
[1] Turk, M.A., and Pentland, A.P. Face Recognition using Eigenfaces, IEEE Conference
on Computer Vision and Pattern Recognition, (1991), 586-591.
[2] Salomon, D. Data Compression: The Complete Reference, Second Edition, Springer-
Verlag, 2000.
[3] Robi Polikar, The Wavelet Tutorial, http://users.rowan.edu/~polikar/WAVELETS
[4] Wavelet Theory, Department of Cybernetics, http://cyber.felk.cvut.cz.
[5] Pearlman, W. A., and Said, A. Digital Signal Compression: Principles and Practice,
Cambridge University Press, 2011.
[6] Mallat, S. G. A Theory for Multiresolution Signal Decomposition: The Wavelet
Representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11,
7 (1989), 674-693.
[7] Grayscale FERET Database. http://www.itl.nist.gov/iad/humanid/feret/
[8] Moon, H., and Phillips, P.J. Computational and Performance Aspects of PCA-based
Face Recognition Algorithms, Perception, 30 (2001), 303-321.
[9] Cho, H., Roberts, R., Jung, B., Choi, O., and Moon, S. An Efficient Hybrid Face
Recognition Algorithm Using PCA and GABOR Wavelets. International Journal of
Advanced Robotic Systems, 11, 59 (2014), 1-8.
[10] Xu, W., and Lee, E. J. Face Recognition Using Wavelets Transform and 2D PCA by
SVM Classifier, International Journal of Multimedia and Ubiquitous Engineering, 9, 3
(2014), 281-290.
[11] Delac, K., Grgic, M., and Grgic, S. Face recognition in JPEG and JPEG2000
Compressed Domain, Image and Vision Computing, 27 (2009), 1108-1120.
[12] Somasundaram, K., and Palaniappan, N. Personal ID Image Normalization using
ISO/IEC 19794-5 Standards for Facial Recognition Improvement, Communications in
Computer and Information Science Series, Springer-Verlag, 283 (2012), 429-438.

This paper may be cited as:


Preethi, G. 2014. A Novel Facial Recognition Method using Discrete
Wavelet Transform Multiresolution Pyramid. International Journal of
Computer Science and Business Informatics, Vol. 13, No. 1, pp. 1-8.


Enhancing Energy Efficiency in


WSN using Energy Potential and
Energy Balancing Concepts
Sheetalrani R. Kawale
Assistant Professor, Department of Computer Science
Karnataka State Womens University, Bijapur

ABSTRACT
There are many different energy aware routing protocols proposed in the literature; most of
them focus only on energy efficiency by finding the optimal path that minimizes energy
consumption. These protocols should aim not only for energy efficiency but also for
balanced energy consumption. In this work, an energy balanced data gathering routing
algorithm is developed using the concept of potential from classical physics [16]. Our
scheme, called the energy balanced routing protocol, forwards data packets toward the
sink through dense energy areas so as to protect the nodes with relatively low residual
energy. The idea is to construct three independent virtual potential fields in terms of depth,
energy density and residual energy. The depth field establishes a basic routing paradigm
that moves packets towards the sink. The energy density field ensures that packets are
always forwarded along high energy areas. Finally, the residual energy field protects the
low energy nodes. An energy-efficient routing protocol tries to extend the network lifetime
by minimizing energy consumption, whereas an energy balanced with efficiency routing
protocol prolongs the network lifetime through uniform and efficient energy consumption.

Keywords
Sensor networks, energy efficient routing, potential fields, low energy nodes.

1. INTRODUCTION
Recent developments in wireless technology have enabled the development
of low power, multifunctional sensor nodes that are small in size and
communicate over short distances. These tiny sensor nodes, which consist
of sensing, data processing and communicating components, leverage the
idea of sensor networks. A sensor network is composed of a large number
of sensor nodes that are densely deployed either inside the phenomenon or
very close to it. The positions of these sensor nodes can easily be engineered
to be either fixed at a particular location or allowed a certain amount of
mobility in a predefined area [24][25].

2. BACKGROUND STUDY
The sensing or monitoring of quantities such as temperature and humidity
constitutes one of the two main tasks of each sensor; the other is packet
forwarding using the equipped wireless technology. However data is
transmitted, the network must provide a way of transporting information
from the different sensors to wherever that information is needed. Sensor
networks can be deployed in a wide variety of application domains such as
military intelligence, commercial inventory tracking and agricultural
monitoring [22][23][24].
Each node stores the identity of one or more nodes through which it heard
an announcement that another group exists. That node may itself have heard
the information second-hand, so every node within a group ends up with
a next-hop path to every other group, as in distance-vector routing. Topology
discovery proceeds in this manner until all network nodes are members of a
single group. By the end of topology discovery, each node learns every
other node's virtual address, public key, and certificate, since every group
member knows the identities of all other group members and the network
converges to a single group.

3. EXISTING SYSTEM
The existing systems focus on energy efficient routing, whose target is to
find an optimal path that minimizes energy consumption at local nodes or
in the whole WSN [17][18][19]. Energy aware routing maintains multiple
paths and properly chooses one for each packet delivery to improve network
survivability. It can be quite costly, since routing information needs to be
exchanged very frequently, which may impose an energy burden and traffic
overload on the nodes.

4. PROBLEM IDENTIFICATION
Energy is a critical resource for battery-powered wireless sensor networks
(WSNs), which makes energy-efficient protocol design a key challenge.
Three main reasons can cause an imbalance in energy distribution:
Topology: The topology of the initial deployment limits the number
of paths along which data packets can flow. For example, if there
is only a single path to the sink, the nodes along this path deplete
their energy rather quickly. In this extreme case, there is no way to
reach an overall energy balance.

Application: The applications themselves determine the location and
the rate at which the nodes generate data. The area generating more
data, and the path forwarding more packets, may suffer faster energy
depletion.

Routing: Most energy-efficient routing protocols always choose a
static optimal path to minimize energy consumption, which results in
energy imbalance since the energy of the nodes on the optimal path
is quickly depleted.

5. SYSTEM DESIGN DESCRIPTION


5.1 EBERP: Energy Balanced with Efficiency Routing Protocol
The goal of the Energy Balanced with Efficiency Routing Protocol is to
force packets to move towards the sink in such a way that the nodes with
relatively low residual energy are protected. The protocol is designed by
constructing a mixed virtual potential field. It forces packets to move
towards the sink through dense energy areas, protects the sensor nodes with
low residual energy, and successfully delivers the sensed packets to the
sink. Results show significant improvements in network lifetime, coverage
ratio and throughput.
This article focuses on routing that balances energy consumption with
efficiency. Its main contributions are:
The concept of potential from classical physics is borrowed to build a
virtual hybrid potential field that drives packets towards the sink
through high energy areas and steers them clear of the nodes with
low residual energy, so that energy is consumed as evenly as
possible in any given arbitrary network deployment.
The routing loops are classified and an enhanced mechanism is
devised to detect and eliminate them. The simulation results show
that the proposed EBERP makes significant improvements in
energy consumption balance, network lifetime and throughput when
compared to other commonly used energy efficient routing
algorithms.
An energy-efficient routing protocol tries to extend the network lifetime
by minimizing energy consumption, whereas an energy balanced with
efficiency routing protocol prolongs the network lifetime through uniform
and efficient energy consumption. The former readily results in premature
network partition that disables the network, even though much residual
energy may be left. The latter, on the other hand, may not be optimal with
respect to energy efficiency, as it burns energy evenly to keep the network
connected and functioning as long as possible. Let us use a simple example
to demonstrate what uneven energy depletion results in and how the
proposed Energy Balanced with Efficiency Routing Protocol (EBERP)
balances energy consumption efficiently.
In this system, an energy balanced data gathering routing algorithm is
developed using the concept of potential from classical physics. Our
scheme, called the energy balanced routing protocol, forwards data packets
toward the sink through dense energy areas so that the nodes with relatively
low residual energy are protected. The cornerstone of EBERP is the
construction of three independent virtual potential fields in terms of energy
density, depth and residual energy. The depth field establishes a basic
routing paradigm that moves packets towards the sink. The energy density
field ensures that packets are always forwarded along high energy areas.
Finally, the residual energy field protects the low energy nodes so that the
energy is balanced efficiently.
5.2 Depth of Potential Field
To provide the basic routing function, namely to direct packets to move
toward the sink, we define the depth potential field Vd(d) as an inverse
proportional function of depth, as shown in Eq. 5.1:

Vd(d) = 1/d (5.1)

where d = D(i) denotes the depth of node i. The depth potential difference
Ud(d1, d2) from depth d1 to depth d2 is then given by Eq. 5.2:

Ud(d1, d2) = Vd(d2) - Vd(d1) = 1/d2 - 1/d1 (5.2)
Since the potential function Vd(d) is monotonically decreasing, packets
moving along the direction of the gradient in this depth potential field
eventually reach the sink, and the basic routing function is achieved. For a
given network topology, Vd(d) is definite and time invariant. Moreover, the
closer the data packets move to the sink, the larger the centrality, where
centrality denotes the tendency of a node at depth d to forward packets to
its neighbors at depth d-1.

Figure 1. Depth potential field
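A toy sketch of next-hop selection under the depth field alone; the inverse-proportional form Vd(d) = 1/d is an assumption taken from the surrounding text, and the node depths are invented:

```python
def depth_potential(d):
    # Assumed inverse-proportional form V_d(d) = 1/d; smaller depth
    # (closer to the sink) gives a higher potential.
    return 1.0 / d

def next_hop(my_depth, neighbour_depths):
    # Forward along the steepest positive potential difference,
    # i.e. toward the neighbour closest to the sink.
    diffs = {n: depth_potential(nd) - depth_potential(my_depth)
             for n, nd in neighbour_depths.items()}
    return max(diffs, key=diffs.get)

# Hypothetical depths of three neighbours of a node at depth 3.
assert next_hop(3, {'a': 4, 'b': 3, 'c': 2}) == 'c'  # the depth d-1 neighbour wins
```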

5.3 Energy Density Potential Field
A node adds up the energy values of all its neighbors, which can be obtained
through messages exchanged among nodes, and calculates the area of its
radio coverage disk, so that the corresponding energy density can be readily
obtained from the aforementioned definition. EBERP defines the energy
density potential field as a linear function of the energy density, as shown
in Eq. 5.3:

Ved(i, t) = ED(i, t) (5.3)

where Ved(i, t) is the energy density potential of node i at time t, and
ED(i, t) is the energy density at the position of node i at time t. Thus, the
potential difference Ued(i, j, t) from node i to node j is given by Eq. 5.4:

Ued(i, j, t) = Ved(j, t) - Ved(i, t) (5.4)

Driven by this potential field, the data packets will always flow
toward the dense energy areas. However, with only this energy density field,
the routing algorithm is not practical since it would suffer from the serious
problem of routing loops. This fact will be clarified in the subsequent
simulation experiments.
5.4 Energy Potential Field
EBERP defines an energy potential field using the residual energy of the
nodes, in order to protect the nodes with low energy, as shown in Eq. 5.5:

Ve(i, t) = E(i, t) (5.5)

where Ve(i, t) is the energy potential of node i at time t, and E(i, t) is the
residual energy of node i at time t. The potential difference Ue(i, j, t) from
node i to node j is then derived as shown in Eq. 5.6:

Ue(i, j, t) = Ve(j, t) - Ve(i, t) (5.6)

The two latter potential fields are constructed using linear functions of the
energy density and the residual energy, respectively. Although the
properties of linear potential fields are straightforward, both fields are time
varying, which can result in routing loops.
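How the three fields might be combined can be sketched as a weighted sum; the weights and neighbour values below are purely illustrative, since the combination rule is not given at this point in the paper:

```python
def mixed_potential(depth, energy_density, residual_energy,
                    alpha=0.5, beta=0.3, gamma=0.2):
    # Weighted sum of the depth, energy density and residual energy fields.
    # The weights alpha/beta/gamma are illustrative, not from the paper.
    return alpha * (1.0 / depth) + beta * energy_density + gamma * residual_energy

# A packet is forwarded to the neighbour with the highest mixed potential.
neighbours = {
    'n1': dict(depth=2, energy_density=0.9, residual_energy=0.8),
    'n2': dict(depth=2, energy_density=0.4, residual_energy=0.1),  # low energy node
}
best = max(neighbours, key=lambda n: mixed_potential(**neighbours[n]))
assert best == 'n1'  # at equal depth, the denser, higher-energy area wins
```

This reproduces the intended behaviour: at equal depth, packets flow through high energy areas and steer clear of low energy nodes.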
6. PERFORMANCE EVALUATION
In this section the protocols are evaluated by simulation, which illustrates
the advantages of our protocol over the Mint Route protocol, which uses the
shortest path to transfer packets from source to sink.
6.1 Performance Metrics
To make a performance evaluation, several measurable metrics have to be
defined.
Network Lifetime
The network lifetime [16] of a sensor network is defined as the time
when the first energy exhausted node (First Dead Node, FDN) appears. The
network lifetime is closely related to the network partition and network

coverage ratio. When nodes begin to die, the probability of network
partition increases and the network coverage ratio may decrease.
Functional lifetime
The functional lifetime of a task is defined as the amount of time for
which the task is perfectly carried out. Different tasks have different
requirements: some may tolerate no node failure, while others need only a
portion of the nodes to be alive, so the functional lifetime may vary
considerably with the task requirements. In the simulation experiments, the
application requires all sampling nodes to stay alive, so the functional
lifetime is defined as the interval between the beginning of the task and the
appearance of the First Dead Sampling Node (FDSN).
Functional Throughput (FT)
Functional throughput is defined as the number of packets that the
sink receives during the functional lifetime. For a given application, FT is
mainly influenced by the length of the functional lifetime.
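Given a simulation trace, the three metrics above can be computed as in this sketch (function and variable names are illustrative):

```python
def lifetime_metrics(death_time, sampling_nodes, packets):
    """Compute the three metrics from a simulation trace.

    death_time: dict node -> time its energy was exhausted (alive nodes absent)
    sampling_nodes: set of nodes executing the sampling task
    packets: list of times at which the sink received a packet
    """
    # Network lifetime: time at which the First Dead Node (FDN) appears.
    network_lifetime = min(death_time.values()) if death_time else float("inf")
    # Functional lifetime: time of the First Dead Sampling Node (FDSN),
    # since the application requires all sampling nodes to stay alive.
    sampler_deaths = [t for n, t in death_time.items() if n in sampling_nodes]
    functional_lifetime = min(sampler_deaths) if sampler_deaths else float("inf")
    # Functional throughput: packets received by the sink before the FDSN.
    functional_throughput = sum(1 for t in packets if t <= functional_lifetime)
    return network_lifetime, functional_lifetime, functional_throughput
```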
6.2 Simulation Setup
Simulation experiments in wireless sensor networks are conducted to
evaluate the performance of our EBERP and compare it with the Mint Route
algorithm. In this topology, a node can communicate only with its direct
neighbors. A node can act as either a sampling node or a relaying node
depending on the requirements; the nodes in the event areas can execute
both sampling and relaying tasks. The same simulation is repeated while
deploying different numbers of nodes, up to a maximum of 1000, and the
average values of the performance metrics are calculated.

6.3 Performance Results


To evaluate the relative performance of the proposed protocol, it is
compared with the existing Mint Route protocol. The graph in Figure 2
compares how well the energy is balanced for routing in the proposed
scheme.

Figure 2. Comparison results for EBERP and Mint Route routing


Network Lifetime and Network Throughput

Mint Route always chooses the shortest path, thus it will burn out the energy
of nodes on that path quickly. However, EBERP will choose another path
through other areas with more energy once it finds out that the energy
density in this area is lower than that in other areas nearby. Therefore,
EBERP can improve the energy consumption balance across the network
and prolong the network lifetime as well as the functional lifetime. The
statistical results listed in Table 8.1 show the network throughput. EBERP
prolongs the time to the FDN. The functional throughput and the network
lifetime are also improved. The statistics listed in Table 8.2 show the
results for network lifetime. From these results, it can be concluded that
more gain can be obtained through EBERP's energy-consumption balance,
and the integrity of the data received in EBERP is much better than in
Mint Route since there is less packet loss in EBERP.

Figure 3. Network Throughput

Figure 4. Network Lifetime (x-axis: total number of packets sent)


6.4 Summary
This section discussed the simulation results obtained by considering the
performance metrics of functional lifetime, network lifetime and network
throughput. The comparison graph, together with the network throughput
and network lifetime graphs, gives a clear overview of the existing and
proposed protocols.

7. CONCLUSION AND FUTURE ENHANCEMENT


7.1 Conclusion
Energy is a critical resource in battery-powered wireless sensor
networks (WSNs), making the design of energy-efficient protocols a key
challenge. Most existing energy-efficient routing protocols forward
packets through the minimum-energy path to the

sink, which merely minimizes energy consumption and leads to an unbalanced
distribution of residual energy among sensor nodes. Saving energy alone is
not enough to effectively prolong the network lifetime. Uneven energy
depletion often results in network partition and a low coverage ratio, which
degrade performance. This article focuses on routing that balances energy
consumption efficiently. Its main contributions are, firstly, applying the
concept of potential from classical physics to build a virtual hybrid
potential field that drives packets toward the sink through high-energy
areas and steers clear of nodes with low residual energy, so that energy is
consumed as evenly as possible in any given network deployment; and
secondly, classifying the routing loops and devising an enhanced mechanism
to detect and eliminate them. The simulation results show that the proposed
EBERP solution makes significant improvements in energy-consumption
balance, network lifetime and throughput compared with another commonly
used energy-efficient routing algorithm.
7.2 Future Enhancement
In this project, the routing loops (one-hop loop, origin loop and queue
loop) are detected and eliminated by cutting the loop. A future enhancement
is to avoid such loops proactively while transmitting packets, rather than
eliminating them after detection, which would further improve overall
system performance.
8. ACKNOWLEDGMENTS
This research would not have been possible without the help of my research
guides, Dr. Mahadavan and Mr. Aziz Makandar, who gladly provided me with
the required information and equipment so that I could complete my
research. I would also like to thank our VC, Dr. Meena R. Chandawarkar,
who motivated me to take up this work and provided moral support.

REFERENCES
[1] Andrew S. Tanenbaum, Computer Networks, Prentice Hall of India Publications, 4th
Edition, 2006.
[2] Carlos Golmez, Joseph Padelles, Sensors Everywhere, Prentice Hall of India Publication,
4th Edition.
[3] J. Evans, D. Raychaudhuri, and S. Paul, Overview of Wireless, Mobile and Sensor
Networks in GENI, GENI Design Document 06- 14, Wireless Working Group, 2006.
[4] S. Olariu and I. Stojmenović, Design Guidelines for Maximizing Lifetime and Avoiding
Energy Holes in Sensor Networks with Uniform Distribution and Uniform Reporting,
Proc. IEEE INFOCOM, 2006.
[5] Ian F. Akyildiz, Weilian Su, Yogesh Sankarasubramaniam and Erdal Cayirci, A
Survey on Sensor Networks, IEEE Communications Magazine, August 2002, pp. 102-114.

[6] H. Zhang and H. Shen, Balancing Energy Consumption to Maximize Network
Lifetime in Data-Gathering Sensor Networks, IEEE Trans. Parallel and Distributed
Systems, vol. 20, no. 10, pp. 1526-1539, Oct. 2009.
[7] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, Energy- Efficient
Communication Protocols for Wireless Microsensor Networks, Proc. Hawaiian Intl Conf.
Systems Science, 2000.
[8] O. Younis and S. Fahmy, HEED: A Hybrid, Energy-Efficient Distributed Clustering
Approach for Ad Hoc Sensor Networks, IEEE Trans. Mobile Computing, vol. 3, no. 4, pp.
366-379, Oct.-Dec. 2004.
[9] M. Singh and V. Prasanna, Energy-Optimal and Energy-Balanced Sorting in a Single-
Hop Wireless Sensor Network, Proc. First IEEE Intl Conf. Pervasive Computing and
Comm., 2003.
[10] H. Lin, M. Lu, N. Milosavljevic, J. Gao, and L.J. Guibas, Composable Information
Gradients in Wireless Sensor Networks, Proc. Seventh Intl Conf. Information Processing
in Sensor Networks (IPSN), pp. 121-132, 2008.
[11] Y. Xu, J. Heidemann, and D. Estrin, Geography-Informed Energy Conservation for
Ad-Hoc Routing, Proc. ACM MobiCom, 2001.
[12] V. Rodoplu and T.H. Meng, Minimum Energy Mobile Wireless Networks, IEEE J.
Selected Areas in Comm., vol. 17, no. 8, pp. 1333- 1344, Aug. 1999.
[13] W. Heinzelman, J. Kulik, and H. Balakrishnan, Adaptive Protocols for Information
Dissemination in Wireless Sensor Networks, Proc. ACM MobiCom, 1999.
[14] D.H. Armitage and S.J. Gardiner, Classical Potential Theory. Springer, 2001.
[15] C. Schurgers and M. Srivastava, Energy Efficient Routing in Wireless Sensor
Networks, Proc. Military Comm. Conf. (MILCOM), 2001.
[16] K. Kalpakis, K. Dasgupta, and P. Namjoshi, Maximum Lifetime Data Gathering and
Aggregation in Wireless Sensor Networks, Proc. IEEE Intl Conf. Networking (ICN), pp.
685-696, 2002.
[17] Amol Bakshi, Viktor K. Prasanna, Energy-Efficient Communication in Multi-Channel
Single-Hop Sensor Networks, Proc. ICPADS '04, Tenth International Conference on
Parallel and Distributed Systems, page 403.
[18] Jing Wang , Dept. of ECE, North Carolina Univ., Charlotte, NC ,Power Efficient
Stochastic - Wireless Sensor Networks. Wireless Communications and Networking
Conference,2006.WCNC2006.IEEE, page 419-424.
[19] Li Hong , Shu-Ling Yang,An Overall Energy-Balanced Routing Protocol for Wireless
Sensor Network. Information and Automation for Sustainability, 2008.ICIAFS 2008. 4th
International Conference, page 314-318.
[20] Lin Wang, Ruihua Zhang, Shichao Geng,An Energy-Balanced Ant-Based Routing
Protocol for Wireless Sensor Networks. Wireless Communications, Networking and
Mobile Computing, 2009.WiCom '09. 5th International Conference on, page 1-4.
[21] Talooki, Marques. H, Rodriguez. J, Aqua, H. Blanco. N, Campos. L,An Energy
Efficient Flat Routing Protocol for Wireless Ad Hoc Networks. Computer
Communications and Networks (ICCCN), 2010 Proceedings of 19th International
Conference, page 1-6.
[22] http://en.wikipedia.org/wiki/Wireless_network.

[23] http://oretan2011.wordpress.com/2011/01/28/wireless-sensor-network-wsn/
[24] http://en.wikipedia.org/wiki/WSN

This paper may be cited as:


Kawale, S. R., 2014. Enhancing Energy Efficiency in WSN using Energy Potential and Energy
Balancing Concepts. International Journal of Computer Science and Business
Informatics, Vol. 13, No. 1, pp. 9-18.


DNS: Dynamic Network Selection


Scheme for Vertical Handover in
Heterogeneous Wireless Networks
M. Deva Priya
Department of CSE, Sri Krishna College of Technology,
Coimbatore, India.

D. Prithviraj
Department of CSE, Sri Krishna College of Technology,
Coimbatore, India.

Dr. M. L Valarmathi
Department of CSE, Government College of Technology,
Coimbatore, India.

ABSTRACT
Seamless service delivery in a heterogeneous wireless network environment demands
selection of an optimal access network. Selecting an unsuitable network results in
higher costs and poor service. In heterogeneous networks, network selection schemes are
indispensable to ensure Quality of Service (QoS). The factors that affect network
selection include throughput, delay, jitter, cost and signal strength. In this paper,
multi-criteria analysis is used to select the access network. The proposed approach
comprises two schemes. In the first, Dynamic Analytic Hierarchy Process (AHP) is
applied to dynamically decide the relative weights of the evaluative criteria set based
on user preferences and service applications. The second adopts Modified Grey Relational
Analysis (MGRA) to rank the network alternatives with faster and simpler implementation.
The proposed system yields better results in terms of Throughput, delay and Packet Loss
Ratio (PLR).
Keywords
Multi-Criteria Decision Making (MCDM) Scheme, Analytic Hierarchy Process (AHP),
Grey Relational Analysis (GRA), WiMAX, WiFi, QoS.
1. INTRODUCTION
Rapid development of multimedia applications in the wireless environment
has led to the development of many broadband wireless technologies. IEEE
802.16, a standard proposed by IEEE for Worldwide Interoperability for
Microwave Access (WiMAX) suggests modifications to the Medium
Access Control (MAC) and Physical (PHY) layers to efficiently handle high
bandwidth applications. IEEE 802.16 standards ensure Quality of Service

(QoS) for different types of applications supporting different types of
service classes [1].
1.1 IEEE 802.16 - WiMAX
IEEE 802.16, a solution to Broadband Wireless Access (BWA) is a wireless
broadband standard that promises high bandwidth over a long range of
coverage [2]. The IEEE 802.16-2001 standard specified a frequency range
from 10 to 66 GHz with a theoretical maximum bandwidth of 120 Mbps and
a maximum transmission range of 50 km. The initial standard supported
only Line-Of-Sight (LOS) transmission and did not favor deployment in
urban areas.
IEEE 802.16a-2003 supports Non-LOS (NLOS) transmission over a
frequency range of 2 to 11 GHz. The IEEE 802.16 standard underwent several
amendments and evolved into the 802.16-2004 standard (also known as
802.16d), which provided technical specifications for the PHY and MAC layers
for fixed wireless access and addressed the first/last-mile connection in
Wireless Metropolitan Area Networks (WMANs).
IEEE 802.16e added mobility support. This is generally referred to as
mobile WiMAX and adds significant enhancements as listed below.
It improves the NLOS coverage using advanced antenna diversity
schemes and Hybrid Automatic Repeat Request (HARQ).
It adopts dense Subchannelization, thus increasing system gain and
improving indoor penetration.
It uses Adaptive Antenna System (AAS) and Multiple Input Multiple
Output (MIMO) technologies to improve coverage.
It introduces a DL Subchannelization scheme enabling better
coverage and capacity trade-off. This brings potential benefits in
terms of coverage, power consumption, self-installation and
frequency reuse and bandwidth efficiency.
With the rising popularity of multimedia applications in the Internet, IEEE
802.16 provides the capability to offer new wireless services such as
multimedia streaming, real-time surveillance, Voice over IP (VoIP) and
multimedia conferencing. Due to its long range and high bandwidth
transmission, IEEE 802.16 is also considered in areas where it can serve as
the backbone network with long separation among the infrastructure nodes.
Cellular technology using VoIP over WiMAX is another promising area.
WiMAX supports different types of traffic: Unsolicited Grant Service
(UGS), real-time Polling Service (rtPS), extended real-time Polling
Service (ertPS), non-real-time Polling Service (nrtPS) and Best Effort (BE).

Unsolicited Grant Service (UGS): Specifically designed for
Constant Bit Rate (CBR) services such as T1/E1 emulation and
VoIP without silence suppression.
Extended Real-Time Polling Service (ertPS): Built on the
efficiency of both the UGS and rtPS. This is suitable for applications
such as VoIP with silence suppression.
Real-Time Polling Service (rtPS): Designed for real-time services
that generate variable size data packets on periodic basis such as
MPEG video.
Non-Real-Time Polling Service (nrtPS): Designed for delay
tolerant services that generate variable size data packets on a regular
basis.
Best Effort (BE) Service: Designed for applications without any
QoS requirements such as HTTP service.
One of the main challenges in QoS provisioning is the effective mapping of
the QoS requirements of potential applications across different wireless
platforms [3].
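As an illustration of such a mapping, the pairings below are inferred from the service-class descriptions above and are assumptions for illustration, not a normative table from the standard:

```python
# Assumed application-to-service-class pairings, based only on the class
# descriptions given above (T1/E1 and plain VoIP -> UGS, silence-suppressed
# VoIP -> ertPS, MPEG video -> rtPS, delay-tolerant transfers -> nrtPS).
SERVICE_CLASS = {
    "t1_emulation": "UGS",
    "voip_plain": "UGS",
    "voip_silence_suppressed": "ertPS",
    "mpeg_video": "rtPS",
    "ftp": "nrtPS",
}

def service_class(app):
    """Return the WiMAX scheduling service for an application, defaulting
    to Best Effort when the application has no stated QoS requirement."""
    return SERVICE_CLASS.get(app, "BE")
```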
1.1.1 Physical Layer
Orthogonal Frequency Division Multiplexing (OFDM) in the PHY layer
enables multiple accesses by assigning a subset of Subcarriers to users. This
resembles Code Division Multiple Access (CDMA) spread spectrum that
provides different QoS to each user. Multiple access is achieved by
multiplexing the users' data streams on both Uplink (UL) and Downlink (DL)
transmissions. The IEEE 802.16e standard specifies an OFDMA-based
PHY layer that has distinct features like flexible Subchannelization,
Adaptive Modulation and Coding (AMC), Space-time coding, Spatial
multiplexing, Dynamic Packet Switch based air interface and flexible
network deployment such as Fractional frequency reuse [7]. AMC
employed in the PHY layer dynamically adapts the modulation and coding
scheme to the channel conditions so as to achieve the highest spectral
efficiency at all times [8].
1.1.2 MAC Layer
The 802.16 MAC is designed to support a Point-to-Multipoint (PMP)
architecture with a central Base Station (BS) communicating simultaneously
with multiple Mobile Subscriber Stations (MSSs). The MAC includes the
following Sublayers namely:
Service Specific Convergence Sublayer (CS)- It maps the service
data units to the appropriate MAC connections, preserves or enables
QoS and bandwidth allocation.

Common Part Sublayer (CPS)- It provides a mechanism for
requesting bandwidth, associating QoS and traffic parameters,
transporting and routing data to the appropriate convergence
Sublayer.
Privacy Sublayer - It provides authentication of network access and
assists in connection establishment [9].
1.2 IEEE 802.11 - WiFi
WLAN (or WiFi) is an open-standard technology that enables wireless
connectivity between equipments and Local Area Networks (LANs). Public
access WLAN services are designed to deliver LAN services over short
distances. Coverage extends over a 50 to 150 meter radius of the Access
Point (AP). Connection speeds range from 1.6 Mbps to 11 Mbps, which is
comparable to fixed Digital Subscriber Line (DSL) transmission speeds
[4]. New standards promise to increase speeds up to 54 Mbps. Today's
WLANs run in the unlicensed 2.4 GHz and 5 GHz radio spectra [5]. The
2.4 GHz frequency is already jam-packed - it is used for several purposes
besides WLAN service. The 5 GHz spectrum is a much larger bandwidth
providing higher speeds, greater reliability, and better throughput [6].
1.3 HANDOVER
Handover is the process of transferring an ongoing call or data session from
one channel connected to the core network to another. The WiMAX
technology specifies a variety of handover schemes to transfer a call or data
from the control of one network to another. When a MSS moves from one
BS to another, the control information is transferred from the BS to which
the MSS is currently linked referred to as the home Base Station (hBS) to
the BS under the range of which the MSS is to be connected referred to as
target Base Station (tBS).
Handover is of two types based on the technology of the networks involved
namely, Horizontal Handover and Vertical Handover. Figure. 1 illustrates
the WiMAX - WiFi network architecture where the MSS is handed over to
the optimal nearby BS or AP. The handovers based on access networks
include:
Horizontal Handover-The mobile user switches between networks
with the same technology.
Vertical Handover (VHO) -The users switch among networks with
different technologies, for example, between an IEEE 802.11 AP and
a cellular network BS. In heterogeneous networks, VHO is mainly
used. Users can move between different access networks. They
benefit from different network characteristics (coverage, bandwidth,
frequency of operation, data rate, latency, power consumption, cost,
etc.) that cannot be compared directly [10].


Figure 1. WiMAX - WiFi Network Architecture


2. RELATED WORK
A link reward function and a signaling cost function are presented in [11] to
capture the tradeoff between the network resources utilized by the
connection and the signaling and processing load acquired on the network.
A stationary deterministic policy is obtained when the connection
termination time is geometrically distributed.
A novel optimization utility is presented in [12] to assimilate the QoS
dynamics of the available networks along with heterogeneous attributes of
each user. The joint network and user selection is modelled by an
evolutionary game theoretical approach and replicator dynamics is figured
out to pursue an optimal stable solution by combining both self-control of
users preferences and self-adjustment of networks parameters.
A survey on fundamental aspects of network selection process is discussed
in [13]. It deals with network selection to the always best connected and
served paradigm in heterogeneous wireless environment as a perspective
approach.
A mechanism [14] based on a unique decision process that uses
compensatory and non-compensatory multi-attribute decision making
algorithms is proposed, which jointly assists the terminal in selecting the top
candidate network.

A cross layer architectural framework for network and channel selection in a
Heterogeneous Cognitive Wireless Network (HCWN) is proposed in [15]. A
novel probabilistic model for channel classification based on its adjacent
channels occupancy within the spectrum of an operating network is also
introduced. Further, a modified Hungarian algorithm is implemented for
channel and network selection among secondary users.
In [16], a Satisfaction Degree Function (SDF) is proposed to evaluate the
available networks and find the one that can satisfy the mobile user. This
function not only considers the specific network conditions (e.g. bandwidth)
but also the user defined policies and dynamic requirements of active
applications.
In [17], a two-step vertical handoff decision algorithm based on dynamic
weight compensation is proposed. It adopts a filtering mechanism to reduce
the system cost. It improves the conventional algorithm by dynamic weight
compensation and consistency adjustment.
A speed-adaptive system discovery scheme suggested in [18] for execution
before vertical handoff decision improves the update rate of the candidate
network set. A vertical handoff decision algorithm based on fuzzy logic
with a pre-handoff decision method which reduces unnecessary handoffs,
balancing the whole network resources and decreasing the probability of call
blocking and dropping is also added.
In [19], the authors present a multi-criteria vertical handoff decision
algorithm for heterogeneous wireless networks based on fuzzy extension of
TOPSIS. It is used to prioritize all the available networks within the
coverage of the mobile user. It achieves seamless mobility while
maximizing end-users' satisfaction.
A network selection mechanism based on two Multi Attribute Decision
Making (MADM) methods namely Multiple - Analytic Hierarchy Process
(M-AHP) and Grey Relational Analysis (GRA) method is proposed in [20].
M-AHP is used to weigh each criterion and GRA is used to rank the
alternatives.
A context-aware service adaptation mechanism is presented for ubiquitous
network which relies on user-to-object, space-time interaction patterns
which helps to perform service adaptation [21]. Similar Users based Service
Adaptation algorithm (SUSA) is proposed which combines both Entropy
theory and Fuzzy AHP algorithm (FAHP).
Load balancing algorithm based on AHP proposed in [22] helps the
heterogeneous WLAN/UMTS network to provide better service to high
priority users without decreasing system revenue.

3. CROSS LAYER DESIGN
To ensure seamless QoS, a Cross-Layered Framework is designed for
network selection in heterogeneous environments. The PHY layer, the MAC
layer (L2) and the Network layer (L3) are involved. The layers are closely
coupled together (Figure 2).
TIER-1: It includes the PHY and the MAC layers. Resource
availability is determined from the MAC layer. The parameters RSSI
and SINR are taken from the PHY layer.
TIER-2: In the Network layer, network is selected for a MSS based
on the factors determined from TIER-1.

Figure 2. Cross Layer Design


4. MULTI-CRITERIA DECISION MAKING (MCDM) SCHEMES
The handover decision problem deals with selecting a network from the
candidate networks of various service providers, involving technologies
with different criteria. Network selection schemes can be categorized into
two types: Fuzzy Logic based schemes and Multiple Criteria Decision Making
(MCDM) based schemes.
Three different approaches for optimal access network selection are [23,
24]:
Network Centric - In the network centric approach, the choice of
access network is made at the network side with the goal of
improving the network operator's benefit. The majority of network
centric approaches use game theory for network selection.
User Centric - In this approach, the decision is taken at the user
terminal based only on the minimization of the user's cost, without
considering the network load or other users. The selection of the
access network is determined by using utility, cost or profit functions

or by applying MCDM methods. The selection of an access network
depends on several parameters with different relative importance
such as network and application characteristics, user preferences,
service and cost.
Collaborative Approaches - In the collaborative approach,
selection of access network takes into account the profits of both the
users and the network operator. It mainly deals with the problem of
selecting a network from a set of alternatives which are categorized
in terms of their attributes.
The two processes in MCDM techniques are weighting and ranking. Most
popular classical algorithms include Simple Additive Weighting (SAW),
Technique for Order Preference by Similarity to Ideal Solution (TOPSIS),
Analytical Hierarchy Process (AHP) and Grey Relational Analysis (GRA).
In Simple Additive Weighting (SAW), the overall score of a
candidate network is determined by the weighting sum of all the
attribute values.
In Technique for Order Preference by Similarity to Ideal Solution
(TOPSIS), the chosen candidate network is one which is closest to
the ideal solution and farthest from the worst case solution.
Analytical Hierarchy Process (AHP) decomposes the network
selection problem into several subproblems and assigns a weight for
each subproblem.
Grey Relational Analysis (GRA) ranks the candidate networks and
selects the one with the highest ranking.
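As a concrete illustration of the first of these methods, a minimal SAW ranking might look as follows; the function name, the min-max normalization choice and the example criteria are assumptions, not part of the cited algorithms:

```python
def saw_rank(networks, weights, benefit):
    """Rank candidate networks by Simple Additive Weighting (SAW).

    networks: dict name -> dict criterion -> raw value
    weights:  dict criterion -> weight (assumed to sum to 1)
    benefit:  dict criterion -> True if larger-is-better (e.g. throughput),
              False if smaller-is-better (e.g. delay, cost)
    """
    crits = list(weights)
    lo = {c: min(n[c] for n in networks.values()) for c in crits}
    hi = {c: max(n[c] for n in networks.values()) for c in crits}

    def norm(v, c):
        # Min-max normalize to [0, 1]; invert cost-type criteria.
        if hi[c] == lo[c]:
            return 1.0
        r = (v - lo[c]) / (hi[c] - lo[c])
        return r if benefit[c] else 1.0 - r

    # Overall score: weighted sum of the normalized attribute values.
    scores = {name: sum(weights[c] * norm(vals[c], c) for c in crits)
              for name, vals in networks.items()}
    return sorted(scores, key=scores.get, reverse=True)
```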
5. ANALYTIC HIERARCHY PROCESS (AHP)
AHP was introduced by Saaty [25] with the goal of making decisions about
complex problems by dividing them into a hierarchy of decision factors
which are simple and easy to analyze.
AHP generates a weight for each evaluation criterion according to
the decision maker's pairwise comparisons of the criteria. The higher
the weight, the more important the corresponding criterion.
Next, for a fixed criterion, it assigns a score to each option according
to the decision maker's pairwise comparisons of the options based
on that criterion. The higher the score, the better the performance of
the option with respect to the considered criterion.
Finally, the AHP combines the criteria weights and the options
scores thus determining a global score for each option and a
consequent ranking. The global score for a given option is the
weighted sum of the scores obtained with respect to all the criteria.
6. DYNAMIC ANALYTIC HIERARCHY PROCESS (DAHP)
In the proposed Dynamic AHP (DAHP), the weight of each criterion is
assigned dynamically based on the Received Signal Strength Indicator

(RSSI) and Signal to Noise Interference Ratio (SINR) values of a MSS with
respect to a BS or AP. A network with high RSSI and low SINR is given
priority. Likewise, the values of both RSSI and SINR are calculated at
regular intervals and the weights are assigned. Table 1 shows the possible
weights that are assigned to a network based on the parameter values.
Table 1: Weights assignment based on parameter values

RSSI      SINR      RESOURCE AVAILABILITY   SELECT/REJECT
High      High      Available               Select (worst case)
High      Medium    Available               Select
High      Low       Available               Select
Medium    High      Available               Reject
Medium    Medium    Available               Select
Medium    Low       Available               Select
Low       High      Available               Reject
Low       Medium    Available               Reject
Low       Low       Available               Reject
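The decision rules of Table 1 can be encoded directly; this is a minimal sketch assuming resource availability is AVAILABLE, with illustrative function and value names:

```python
def admit(rssi, sinr):
    """Apply Table 1's select/reject rules for a candidate network.

    rssi and sinr are qualitative levels: 'high', 'medium' or 'low'.
    Resource availability is assumed AVAILABLE, as in the table.
    """
    # The table rejects exactly these four (RSSI, SINR) combinations.
    reject = {("medium", "high"), ("low", "high"),
              ("low", "medium"), ("low", "low")}
    return "Reject" if (rssi, sinr) in reject else "Select"
```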

DAHP involves the following steps:


Step 1: Determination of the objective and the decision factors:
In this step, the final objective of the problem is analyzed based on a
number of decision factors. They are further analyzed until the
problem acquires a hierarchical structure. In the lowest level, the
alternative solutions of the problem are found (Figure 3).
Step 2: Determination of the relative importance of the decision
factors with respect to the objective: In each level, decision factors
are pairwise compared according to their levels of influence, using
the scale in Table 2. If there are n decision factors, then
the total number of comparisons will be n(n - 1)/2. For qualitative
data such as preference, ranking and subjective opinions, it is
suggested to use a scale from 1 to 7 as shown in Table 2.
Table 2: Scale of Importance
PREFERENCE LEVELS VALUES
Equally preferred 1
Equally to moderately preferred 2
Moderately preferred 3
Moderately to strongly preferred 4
Strongly preferred 5
Strongly to very strongly preferred 6
Very strongly preferred 7


Figure 3. Hierarchy of criteria and alternatives


Initially, a pairwise comparison n x n matrix A is formed, where n
is the number of evaluation criteria considered. Each entry aij of the
matrix represents the importance of the ith criterion relative to the jth
criterion.
If aij = 1, the element is compared with itself.
If aij > 1, then element i is considered more important than element j.
If aij < 1, then element j is considered more important than element i.
aij = 1/aji for the rest of the values of the table.

Each entry is multiplied by the respective parameter values, which
increases the accuracy of the criterion weights.
The entries ajk and akj satisfy the following constraint:
ajk * akj = 1 (1)
Also, ajj = 1 for all j.

Step3: Normalization and calculation of the relative weights:


Relative weight is a ratio scale that can be divided among decision
factors. The relative weights are calculated by following the steps
given below.
Each column of matrix A is summed.
Each element of the matrix is divided by the sum of its column.
The relative weights are normalized. After normalizing, the sum
of each column is one.

The normalized principal eigenvector is obtained by averaging
the rows after normalizing.
A priority vector is obtained which shows the relative weights
among the decision factors that are compared. The normalized
principal eigenvector gives the relative ranking of the criteria used.
For the consistency check, the largest eigenvalue (λmax) is obtained
from the summation product of each element of the eigenvector and
the sum of the columns of matrix A.
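The weighting steps above (column normalization, then row averaging) can be sketched as follows; the function name and the example matrix are illustrative only:

```python
def ahp_weights(A):
    """Approximate the normalized principal eigenvector of a pairwise
    comparison matrix A by column normalization and row averaging."""
    n = len(A)
    # Sum each column of A.
    col_sum = [sum(A[i][j] for i in range(n)) for j in range(n)]
    # Divide each entry by its column sum, then average each row:
    # the result is the priority (weight) vector.
    return [sum(A[i][j] / col_sum[j] for j in range(n)) / n for i in range(n)]
```

For a consistent 2x2 matrix such as [[1, 3], [1/3, 1]], this yields weights of 0.75 and 0.25.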

When many pairwise comparisons are performed, some inconsistencies


typically arise. AHP incorporates an effective technique for checking the
consistency of the evaluations made by the decision maker when building
each pairwise comparison matrix involved in the process and it mainly
depends on the computation of a suitable Consistency Index (CI). First, the
scalar λmax is computed as the average of the elements of the vector whose
jth element is the ratio of the jth element of the vector A*w to the
corresponding element of the vector w. The CI is then

CI = (λmax - n) / (n - 1) (2)

A perfectly consistent decision maker should always yield CI = 0. Small
values of inconsistency may be tolerated. RI is the Random Index, i.e. the
CI when the entries of A are completely random. The values of RI for
small problems (n <= 10) are shown in Table 3.
Table 3: Values for Random Index

1 2 3 4 5 6 7 8 9 10
0 0 0.58 0.9 1.12 1.24 1.32 1.41 1.45 1.49
CI
In particular, if RI 10%, the inconsistency is acceptable and a reliable
result may be expected. If the consistency ratio is greater than 10%, pairwise
comparison should be initiated from the beginning.
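The weight derivation and consistency check described above can be sketched in Python; the 3x3 pairwise comparison matrix below is a hypothetical example, not taken from the paper:

```python
import numpy as np

# Hypothetical 3x3 pairwise comparison matrix (aij = 1/aji, ajj = 1).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
n = A.shape[0]

# Normalize each column, then average the rows: the priority vector (weights).
w = (A / A.sum(axis=0)).mean(axis=1)

# lambda_max: average ratio of the j-th element of A*w to the j-th weight.
lam_max = float((A @ w / w).mean())

# Consistency Index, Eq. (2); Random Index for n = 3 is 0.58 (Table 3).
CI = (lam_max - n) / (n - 1)
CR = CI / 0.58  # consistency ratio; acceptable when <= 10%
```

Because the columns are normalized before row-averaging, the weights sum to one, and for a positive reciprocal matrix λmax is always at least n, so CI is non-negative.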
7. MODIFIED GREY RELATIONAL ANALYSIS (MGRA)
Grey system theory is one of the methods used to study uncertainty and is
considered superior in the mathematical analysis of systems with uncertain
information. A system with partial information is called a grey system. GRA
is a part of grey system theory which is suitable for solving problems with
complicated interrelationships between multiple factors and variables. GRA
method is widely used to solve the uncertainty problems with discrete data

and incomplete information. One of the sequences is defined as reference
sequence presenting the ideal solution. The grey relationship between the
reference sequence and other sequences can be determined by calculating
the Grey Relational Coefficient (GRC). MGRA involves the following
steps.
Step 1: Classifying the series of elements into three categories:
larger-the-better, smaller-the-better and nominal-the-best.
Step 2: Defining the lower, moderate or upper bounds of series
elements and normalizing the entities.
Step 3: Calculating the GRCs.
Step 4: Selecting the alternative with the largest GRC.
The upper bound (uj) is defined as
uj = max{S1(j), S2(j), ..., Sn(j)} (3)
and the lower bound (lj) is calculated as
lj = min{S1(j), S2(j), ..., Sn(j)} (4)
For the moderate bound (mj), the objective value between the lower and upper bound is considered.
The absolute difference between Si(j) and lj or uj, divided by the difference between lj and uj, achieves the normalization Si*(j) for larger-the-better or smaller-the-better, where i = 1, ..., n. The reference values are uj for larger-the-better, lj for smaller-the-better and mj for nominal-the-best; they are chosen to form a reference series S0 which presents the ideal situation.
The GRC is computed from

GRCi = 1 / ( Σ_{j=1..k} wj |Si*(j) - 1| + 1 )    (5)

where wj is the weight of each parameter.


The comparative series with the largest GRC is given the highest priority.
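The MGRA steps above can be sketched as follows; the candidate network scores, criterion types and weights are hypothetical values for illustration:

```python
import numpy as np

# Hypothetical scores for 3 candidate networks over 4 criteria.
S = np.array([[0.8, 0.2, 0.6, 0.9],
              [0.5, 0.1, 0.9, 0.4],
              [0.9, 0.4, 0.3, 0.7]])
larger_better = np.array([True, False, True, True])  # False = smaller-the-better
w = np.array([0.4, 0.3, 0.2, 0.1])                   # criterion weights (sum to 1)

u, l = S.max(axis=0), S.min(axis=0)                  # upper/lower bounds per criterion

# Normalize so that 1 is ideal: larger-the-better uses (S-l)/(u-l),
# smaller-the-better uses (u-S)/(u-l).
S_norm = np.where(larger_better, (S - l) / (u - l), (u - S) / (u - l))

# Grey Relational Coefficient per Eq. (5); the candidate with the largest GRC wins.
grc = 1.0 / ((w * np.abs(S_norm - 1)).sum(axis=1) + 1.0)
best = int(np.argmax(grc))
```

With these illustrative numbers the first candidate is selected, since its normalized values deviate least from the ideal reference of 1.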
8. RESULTS AND DISCUSSION
A heterogeneous network scenario is simulated using ns2. The simulation parameters are shown in Table 4. Three different types of SLAs, namely SLA1 (High), SLA2 (Medium) and SLA3 (Low), are considered.
- The most important selection criterion for SLA1 is the QoS satisfaction degree and not the cost of service.
- On the other hand, the Cost criterion is more important than the degree of perceived QoS for SLA2 and SLA3.

When a Service Provider does not have resources or the QoS is not good,
the users are moved to a WiFi network to improve the performance.
Table 4: Simulation Parameters
PARAMETER VALUE
MAC Mac/802.16e & 802.11
Packet Size 5000
Bandwidth 1 Mbps
Queue Length 50
Routing DSDV
Simulation time 50 Sec

The Throughput (Figure 4) of the proposed DAHP is better when compared to the existing scheme. The proposed scheme offers 1.15, 1.11 and 1.05 times more Throughput when compared to AHP for SLA1, SLA2 and SLA3 respectively.

Figure 4. Throughput
The proposed scheme offers 1.03, 1.2 and 1.1 times less cost when
compared to AHP for SLA1, SLA2 and SLA3 respectively (Figure 5).

Figure 5. Cost
The Average Delay (Figure 6) of the AHP scheme is 1.46, 1.38 and 1.2
times more than that of DAHP.


Figure 6. Delay
The proposed scheme offers 1.26, 1.19 and 1.24 times less Average Jitter
when compared to AHP for SLA1, SLA2 and SLA3 respectively (Figure 7).

Figure 7. Jitter
Similarly, the Packet Loss Ratio (PLR) of DAHP is less when compared to the former scheme, as network selection is done dynamically based on the QoS values (Figure 8). The PLR of the AHP scheme is 1.21, 1.12 and 1.13 times more than that of DAHP.

Figure 8. Packet Loss Ratio


9. CONCLUSION
An optimal network selection scheme is proposed for heterogeneous
networks. The physical layer parameters such as Signal Strength and Noise

Ratio are integrated. This scheme dynamically weighs every possible candidate network for MSSs using DAHP, and each is ranked by MGRA. The proposed network selection algorithm provides seamless connections for the users over the heterogeneous network and enables the MSSs to forward calls to the optimal network without dropping them. The simulation results reveal that the proposed network selection scheme efficiently decides the trade-off between user preference and network condition. It offers better Throughput with less Cost, Delay, Jitter and PLR. In the future, the proposed scheme can be enhanced to include more network alternatives and selection criteria.

This paper may be cited as:


Priya, M. D., Prithviraj, D. and Valarmathi, M. L., 2014. DNS: Dynamic
Network Selection Scheme for Vertical Handover in Heterogeneous
Wireless Networks. International Journal of Computer Science and
Business Informatics, Vol. 13, No. 1, pp. 19-34.


Implementation of Image based Flower Classification System
Tanvi Kulkarni
PG Student
Department of IT, SCOE, Pune

Nilesh. J. Uke
Associate Professor
Department of IT, SCOE, Pune

ABSTRACT
In today's world, automatic recognition of flowers using computer technology is of great social benefit. Classification of flowers has various applications such as floriculture, flower searching for patent analysis and much more. The floriculture industry consists of flower trade, nursery and potted plants, seed and bulb production, micro propagation and extraction of essential oil from flowers. For all of the above, automation of flower classification is a very essential step. However, classifying flowers is not an easy task due to difficulties such as deformations of petals, inter and intra class variability, illumination and many more. The flower classification system proposed in this paper uses a novel concept of developing a visual vocabulary for simplifying the complex task of classifying flower images. Separate vocabularies for color, shape and texture features are created and then combined into a final classifier. In this process, firstly, an image is segmented using the grabcut method. Secondly, features are extracted using appropriate algorithms such as SIFT descriptors for shape, the HSV model for color and the MR8 filter bank for texture extraction. Finally, the classification is done with a multiboost classifier. Results are presented on 17 categories of flower species and show efficient performance.
Keywords
MR8 filter bank, Multiboost classifier, SIFT descriptors, Visual Vocabulary, HSV color model.

1. INTRODUCTION
Object recognition has always been a difficult problem for computer scientists due to the numerous challenges involved in it. The image of an object taken from a different view may appear different to each individual. Considering a natural object such as a flower, various species of flowers exist in the world. Some of the categories are Daffodils, Buttercups, Daisies, Iris, Dandelions, Pansies, Sunflowers, Windflowers, Lily valleys, Tulips, Tiger lilies, Crocus, Bluebells, Cowslips etc. The categorization of flower images is challenging due to variances in geometry, illumination and occlusions. The problem of classification becomes more complex because of the large visual variation

between images of the same flower species, known as intra-class variability, and the similarity between images of different flower species, known as inter-class similarity. Figure 1 depicts three different kinds of flowers having similar shape and appearance, thus showing inter-class similarity.

Figure 1. Flower images showing inter-class similarity

Hence, there is a need to create a classification system that captures the important aspects of a flower and also addresses issues such as variation in illumination, occlusion, view angle, rotation and scale. This paper focuses on proposing a system that can classify flower images by developing a visual vocabulary that represents the different distinguishing aspects of a flower. Such a system can overcome the ambiguities that exist between flower categories.
The rest of the paper is organized as follows: Section 2 briefs the work done so far related to this area. The implementation of the flower classification system using a visual vocabulary is discussed in Section 4. Results of the various techniques implemented are discussed in Section 5. Section 6 concludes this paper.
2. RELATED WORKS
Many researchers have worked on various methods and algorithms for flower image classification. Nilsback and Zisserman have proposed a novel concept of visual vocabulary in order to address the issue of ambiguity [5]. Wenjing Qi, et al. have suggested the idea of flower classification based on local and spatial cues with the help of SIFT feature descriptors [8]. Yong Pei and Weiqun Cao have provided the application of neural networks for performing digital image processing for understanding the features of a flower [10]. A regional feature extraction method based on shape characteristics of a flower is proposed by Anxiang Hong, Zheru Chi, et al. [7]. Salahuddin et al. have proposed an efficient segmentation method which combines color clustering and domain knowledge for extracting flower regions from flower images [4]. D S Guru et al. have developed an algorithmic model for automatic flower classification using KNN as the classifier [3]. Nilsback and Zisserman have also computed four different features for the flowers, each describing different aspects such as the local shape/texture, the shape of the boundary, the overall spatial distribution of petals, and the color. Finally they combined the features using a multiple kernel framework with an SVM classifier [6].

4. METHODOLOGY
Recently, the bag of visual words model [1] has gained tremendous success in object classification. The Visual Vocabulary [5] concept is based on the same model. The most distinguishing characteristics of a flower image are its shape, color and texture; based on these features it becomes easy to classify flower images. Since the system is based on the concept of visual vocabularies, separate vocabularies are created for the color, shape and texture features and the results are combined into a final classifier. A detailed description of the flow of the system is depicted in Figure 2. The entire system works in two phases: the training phase and the testing phase.

Figure 2. Block diagram of flower classification system

In the training phase, all the images from all classes are selected and their color, shape and texture features are extracted with their respective extraction techniques, which are discussed later. The outcomes are descriptors, which are provided as input to the k-means clustering algorithm in order to form visual words. Using the visual words, object histograms are created. These histograms are given to the final multiboost classifier in order to train it. In the testing phase, when the user provides a query image, feature extraction is performed first, then an object histogram is created and given to the classifier, which, with the help of the trained parameters, classifies the image and assigns it the appropriate label.
The implementation of the visual vocabulary is explained below:

1. SEGMENTATION-
The flower images that are taken from the dataset should be segmented first in order to achieve a higher rate of accuracy. In this system, the grabcut method is used for segmentation and it yields good results. Grabcut is a segmentation technique that uses region and boundary information in order to perform segmentation. This information is gained through the significant difference between the colors of nearby pixels.

(a) Original image (b) Segmented image

Figure 3. Segmentation with grabcut method
Figure 3(a) depicts the input flower image randomly selected from the dataset. Figure 3(b) shows the result of segmenting the flower image with the grabcut method.
2. CREATING A VOCABULARY FOR FLOWER-
In order to create a flower vocabulary, we need to extract feature descriptors from the flower images using relevant methods and create vocabularies from them.
A. SHAPE VOCABULARY-
Shape is the most important characteristic of a flower. However, the natural deformations of flowers and the variations of viewpoint and occlusion change the original shape of the flower. To create rotation and scale invariant shape descriptors, SIFT (Scale Invariant Feature Transform) descriptors are extracted from the flower images, each forming a 128-dimensional vector. The SIFT descriptors found in all training images are clustered to create shape visual words.

Figure 4. SIFT keypoints extraction

To represent an image, a histogram is created based on the distance between the observed SIFT descriptors [18] in the image and the computed cluster centers. Figure 4 shows the keypoints calculated for the shape feature extraction of a segmented flower image.


B. COLOR VOCABULARY-
Color helps to simplify the task of categorization. Varying illumination has an adverse effect on the measured color, which may lead to confusion. The HSV (Hue, Saturation and Value) color model is hence an efficient way of describing color, as the HSV color space is less sensitive to illumination variations. Color visual words are created by clustering the HSV values of each pixel in the training images. The computed cluster centers represent the color visual words which comprise the color vocabulary.

C. TEXTURE VOCABULARY-
Flowers can have distinctive or subtle textures on their petals. The texture is described by convolving the images with filters from an MR8 (Maximum Response 8) filter bank, which is rotationally invariant. The MR filter bank contains 38 filters: an edge and a bar filter at six orientations and three scales, and two rotationally symmetric filters.

Figure 5. Convolving images with the MR8 filters

The 38 responses are summarized into eight maximum responses: for the edge and bar filters, the maximum over the six orientations is kept at each of the three scales (six responses), plus one response each for the Gaussian and the Laplacian of Gaussian. Figure 5 describes the results after convolving a segmented image with the MR8 filter bank.
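The pooling of 38 responses into 8 can be sketched as follows; the filter responses are simulated with random arrays rather than real convolutions:

```python
import numpy as np

# Simulated filter responses for one image: edge and bar filters at
# 3 scales x 6 orientations, plus Gaussian and LoG (isotropic) responses.
rng = np.random.default_rng(0)
H, W = 64, 64
edge = rng.standard_normal((3, 6, H, W))  # (scale, orientation, H, W)
bar = rng.standard_normal((3, 6, H, W))
gauss = rng.standard_normal((H, W))
log = rng.standard_normal((H, W))
# Total responses: 3*6 + 3*6 + 1 + 1 = 38

# MR8: keep only the maximum response across the 6 orientations, per scale,
# which makes the descriptor rotationally invariant; isotropic filters pass through.
mr8 = np.stack([*edge.max(axis=1), *bar.max(axis=1), gauss, log])
print(mr8.shape)  # (8, 64, 64): 8 response images per input image
```

Taking the maximum over orientations is what reduces 36 oriented responses to 6 and lets two texture patches that differ only by rotation receive the same descriptor.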

D. COMBINED VOCABULARY-
The discriminative power of color, shape and texture varies for different flower species. Some flowers can be more easily distinguished by their shape, others by their color or texture. However, flowers are best distinguished by a combination of these aspects, so all three are combined in the classification system. They are combined by assigning weights to their separate classifications rather than simply averaging them. The Multiboost classifier [2] is used as it reduces variance and is less sensitive to noise. Multiboost is an implementation of an extension of the multi-class Adaboost algorithm.
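A sketch of the weighted combination; the per-feature scores and weights here are hypothetical (in the system they come from the trained boosting classifiers):

```python
import numpy as np

# Per-feature classifier scores for 4 flower classes; each row stands in for
# the output of one separately trained classifier (shape, color, texture).
scores = np.array([[0.10, 0.60, 0.20, 0.10],   # shape classifier
                   [0.30, 0.30, 0.30, 0.10],   # color classifier
                   [0.20, 0.50, 0.20, 0.10]])  # texture classifier

# Weights reflect how discriminative each feature is (assumed values).
w = np.array([0.5, 0.2, 0.3])

combined = w @ scores            # weighted vote, not a plain average
predicted_class = int(combined.argmax())
```

Because the weights and each score row both sum to one, the combined scores also sum to one, so they remain interpretable as class confidences.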

5. RESULTS
Considering the overall flower classification system, some of the implementation results are discussed below. Firstly, when an input image is selected for the categorisation purpose, it is necessary that the image is segmented. The following figure depicts the result given by the grabcut segmentation method.

(a) (b) (c)

Figure 6. Segmentation with grabcut method
Figure 6(a) shows an input image randomly selected from the database. Figure 6(b) shows the segmented image produced by the grabcut technique, wherein the background is represented by black pixels and the foreground by white pixels. Finally, the white pixels are replaced by the original color pixels, which is shown in Figure 6(c). This is the final segmented image of the flower which is used for further processing.
After segmentation, the next step is feature extraction. First is the shape feature extraction, done through SIFT descriptors. The figure below describes how keypoints are calculated and stored.

Figure 7. SIFT keypoints detection


For the above flower image, the number of keypoints calculated is 65.
The HSV color model is used for color feature extraction. The figure given below is the HSV representation of the original segmented flower image.

Figure 8. HSV color map

Finally, the texture feature is extracted by the MR8 filter bank. The result of convolving a segmented flower image with the MR8 filters is shown in the figure below.

Figure 9. Result of convolving image with MR8 filter bank


After the feature extraction process, a bag of visual words is created by k-means clustering. Based on the visual words, histograms are created and provided to the multiboost algorithm for training; testing is then performed on a query image from the user.
Considering a single feature, classification does not prove to be as efficient as combining the three features together. 12 images are considered as training images and 3 images are taken for testing. The whole data set is divided into training and testing sets for better classification. The following shows the classification of flowers based on a single feature.
1. Classification based on Color feature-
It is sometimes not possible to classify a flower image just on the basis of color, as two flowers may have the same color. For instance, daffodils and dandelions have the same yellow color. In our classification system, when a LilyValley was given as the query image, the classified image was a Snowdrop, purely based on the white color.

Figure 10. LilyValley classified as Snowdrop based on white color


2. Classification based on Shape feature-
Shape helps to narrow down the flower species. Given a test image of
daffodils it was classified as daffodils only.


Figure 11. Daffodils classified as Daffodils based on shape


3. Classification based on Texture feature-
Texture features help to improve the classification efficiency of a flower image. When a LilyValley was given as the input, the result was a Snowdrop, based on the pattern.

Figure 12. LilyValley classified as Snowdrop based on texture


4. Classification based on combined feature-
Since it is not sufficient to classify flower images based on single feature
only, categorization based on combined features helps to improve the
performance of classification.

Figure 13. Daffodils classified as Daffodils based on combined (Color+Shape+Texture) features

If we consider classification based on individual features, the accuracy of each is described in the following graph. The highest accuracy for the shape feature, 77.27%, is achieved with 25 folds. The color feature achieves an accuracy of 85.50% with 20 folds, and the texture feature achieves its highest accuracy, 72.29%, with 25 folds.


Figure 14. Performance analysis of Shape, Color and Texture features

Considering the low efficiency of classification based only on the individual features, combining the features with the multiboost classifier provides the best results. A performance accuracy of 85.98% is achieved with the combined features.

Figure 15. Performance analysis of Combined (Shape, Color and Texture) features

6. CONCLUSION
Flower classification is becoming a popular area owing to its importance for botanists and in floriculture. The flower classification system discussed in this paper provides efficient classification accuracy owing to the idea of the visual vocabulary. Developing and combining vocabularies for several aspects (color, shape and texture) of a flower image boosts the performance significantly. Moreover, the final classifier adds to the superiority of the performance. Thus, the tedious task of classifying various flower images into appropriate categories is simplified in an effective manner. Performance analysis shows that combining features into a final classifier boosts the performance of flower classification compared to classifying based on individual features.

REFERENCES
[1] Csurka, Gabriella, et al. Visual categorization with bags of keypoints. Workshop on statistical learning in computer vision, ECCV, Vol. 1, 2004.
[2] Freund, Y., and Schapire, R. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT'95 Proceedings of the Second European Conference on Computational Learning Theory, pp. 23-37, 1995.
[3] Guru, D. S., Y. H. Sharath, and S. Manjunath. Texture features and KNN in classification of flower images. IJCA, Special Issue on RTIPPR (1), pp. 21-29, 2010.
[4] Hong, Anxiang, et al. Region-of-Interest based flower images retrieval. Acoustics, Speech, and Signal Processing, 2003 Proceedings (ICASSP'03), IEEE International Conference on, Vol. 3, 2003.
[5] Nilsback, M-E., and Andrew Zisserman. A Visual Vocabulary for Flower Classification. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, Vol. 2, 2006.
[6] Nilsback, M-E., and Andrew Zisserman. Automated flower classification over a large number of classes. Computer Vision, Graphics & Image Processing, 2008.
[7] Pei, Yong, and Weiqun Cao. A method for regional feature extraction of flower images. Intelligent Control and Information Processing (ICICIP), IEEE, 2010.
[8] Qi, Wenjing, Xue Liu, and Jing Zhao. Flower classification based on local and spatial visual cues. Computer Science and Automation Engineering (CSAE), Vol. 3, 2012.
[9] Rassem, Taha H., and Bee Ee Khoo. Object class recognition using combination of color SIFT descriptors. Imaging Systems and Techniques (IST), IEEE, 2011.
[10] Siraj, Fadzilah, Muhammad Ashraq Salahuddin, and Shahrul Azmi Mohd Yusof. Digital Image Classification for Malaysian Blooming Flower. Computational Intelligence, Modelling and Simulation (CIMSiM), IEEE, 2010.
[11] Saitoh, Takeshi, Kimiya Aoki, and Toyohisa Kaneko. Automatic recognition of blooming flowers. Pattern Recognition, Vol. 1, 2004.

This paper may be cited as:


Kulkarni, T. and Uke, N. J., 2014. Implementation of Image based Flower
Classification System. International Journal of Computer Science and
Business Informatics, Vol. 13, No. 1, pp. 35-44.


A Survey on Knowledge Analytics of Text from Social Media
Dr. J. Akilandeswari
Professor and Head,
Department of Information Technology
Sona College of Technology,
Salem, India.

K. Rajalakshm
PG Scholar, Department of Information Technology
Sona College of Technology,
Salem, India.

ABSTRACT
Actionable knowledge discovery is a closed optimization problem-solving process starting from problem definition. It is used to extract actionable data that are usable. Social media still contain many comments that cannot be directly acted upon. If we could automatically filter out such noise and present only actionable comments, the decision making process would be easier. Automatically extracting actionable knowledge from online social media has attracted growing interest from both academia and industry. This paper gives a study of the systems and methods available for analyzing text from social media such as Twitter or Facebook.

Keywords
knowledge discovery, social networking, classification.

1. INTRODUCTION
Social networking has become one of the most important parts of our daily life. It enables us to communicate with a lot of people. Social networking sites are created to assist in online networking; these sites are generally communities created to support a common idea. Data mining is the process of discovering actionable information from large sets of data. Actionable knowledge discovery from user-generated content is a commodity much sought after by industry and market research. The value of user-generated content varies significantly from excellence to abuse. As the availability of such content increases, identifying high-quality content in social sites based on user contributions is very difficult. Social media sites have become increasingly important. In general, social media demonstrate a rich variety of information sources. In addition to the content itself, there is a large array of non-content information obtainable in these sites, such as links between items and explicit quality ratings from members of the community. We argue that to achieve this goal we must gain a better understanding of what actionable knowledge is, where it can be found and what kind of

language structures it contains. The aim of this work is to do so by
analyzing actionable knowledge in on-line social media conversation.
2. RELATED WORKS
Maria Angela et al. [2] have proposed understanding actionable knowledge in social media using BBC Question Time and Twitter. The paper answers the following questions: what is actionable knowledge, can it be measured, and where can it be found in order to gain a better understanding of actionable knowledge in Twitter? There are three types of tweets: closed, re-tweet and open. Actionable tweets can be found in any of these categories. Three steps are involved: 1) manually annotate the three subsets with actionability scores; 2) test the hypotheses by performing statistical analysis on the annotated data; 3) use Wmatrix to automatically identify the language patterns in actionable data. The method used in this paper prepares two sets, one containing actionable data and one containing non-actionable data. The two sets of data are then loaded into Wmatrix.

Eugene Agichtein et al. [3] proposed automatically assessing the quality of questions and answers provided by the users of a system, taking Yahoo! Answers as the test case. They introduce a general classification framework for combining the evidence from different sources of information, which can be tuned automatically for a given social media type and quality definition. The sub-problem of quality evaluation is an essential module for performing more advanced information retrieval tasks on question/answering sites. The interactions of users are organized around questions: 1) asking a question; 2) answering a question; 3) selecting best answers; 4) voting on an answer.
Models:
Intrinsic content quality: the content quality of each item; this is mostly text-related and includes:
Punctuation and typos
Syntactic and semantic complexity
Grammaticality
Usage statistics: clicks on the item.

Modeling content quality in community question/answering:

Application-specific user relationships: the dataset, viewed as a graph, contains multiple types of nodes and multiple types of interactions.


Fig 2.1 Partial Entity-Relationship Diagram for Answers

The relationships between questions, the users asking and answering questions, and answers can be captured by the tripartite graph outlined in the figure, where an edge represents an explicit relationship between the different node types. Note that a user is not allowed to answer his/her own questions.

Fig 2.2 Interaction of users, questions and answers modeled as a tripartite graph.

The types of features on the question subtree:

Q represents features from the question being answered.
QU represents features from the asker of the question being answered.
QA represents features from the other answers to the same question.


Fig 2.3 Types of features available for inferring the quality of a question.

The types of features on the user subtree:

UA represents features from the answers of the user.
UQ represents features from the questions of the user.
UV represents features from the votes of the user.
UQA represents features from answers the user received to the user's questions.
U represents other user-based features.

Fig 2.4 Types of features available for inferring the quality of a question.


A represents features directly from the answer received.
AU represents features from the user who authored the answer.

Akshay Java et al. [4] studied why we use Twitter, giving an understanding of microblogging usage and communities. Twitter's growth has been driven by microblogging, a new form of communication in which users describe their current status in short posts distributed by instant messages; Twitter is one such microblogging platform. The tool provides a light-weight, easy form of communication that enables users to broadcast and share information about their activities. Compared to regular blogging, microblogging fulfills a need for an even faster mode of communication. Update frequency is one of the differences between regular blogging and microblogging: on average, a prolific blogger may update her blog once every few days, while a microblogger may post several updates in a single day. Twitter posts are limited to 140 characters. The paper proposes a two-level framework for user intention detection: 1) the HITS algorithm to find hubs and authorities in the network; 2) identification of communities within friendship-wise relationships by only considering the bidirectional links where two users regard each other as friends. The main user intentions in Twitter are daily chatter, conversation, sharing information and reporting news.
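The first step of that framework, HITS, can be sketched as follows. This is a minimal, illustrative implementation over a toy directed graph of Twitter users; the graph, user names and iteration count are invented for the example, not taken from the paper:

```python
def hits(graph, iterations=20):
    """Compute hub and authority scores for a directed graph.

    graph maps each node to the list of nodes it links to
    (e.g. the users it follows).
    """
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # A node is a good authority if good hubs point to it.
        auth = {n: sum(hub[u] for u in nodes if n in graph.get(u, ())) for n in nodes}
        norm = sum(auth.values()) or 1.0
        auth = {n: s / norm for n, s in auth.items()}
        # A node is a good hub if it points to good authorities.
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        norm = sum(hub.values()) or 1.0
        hub = {n: s / norm for n, s in hub.items()}
    return hub, auth

# "alice" and "bob" follow "celeb"; "bob" also follows "alice".
follows = {"alice": ["celeb"], "bob": ["celeb", "alice"], "celeb": []}
hub, auth = hits(follows)  # "celeb" ends up the top authority, "bob" the top hub
```

Scores are normalized each iteration so they sum to one; in the paper's setting, high-authority nodes correspond to widely followed users and high-hub nodes to users who follow many authorities.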

Swapna Gottipati and Jing Jiang [5] proposed the extraction of entity-actions from user commentary. The opinion mining process focuses on the extraction of sentiments on social, product, political and economic issues. In many cases, users not only express their sentiments but also share their requests, ideas and suggestions through comments. This paper defines a new problem: extracting entity-actionable knowledge from users' comments.

Example:
Government must lift diplomatic immunity of the ambassador.
Government must inform the Romanian government of what
happened immediately.
SG government wants to cooperate closely with Romania in
prosecuting this case.
Hope the government helps the victims by at least paying the legal
fees.
I believe that government will help the victims for legal expenses.
The above comments are in response to the news about a car accident.

First, all sentences consist of an action and the corresponding entity that should take the action.
Second, users tend to express actions in various sentence structures, and hence extracting entities and actions is both desirable and challenging.
Third, we observe that the entities in all the above sentences refer to the same entity but are expressed in various forms.
Finally, similar actions are expressed differently, which drives the need for normalizing the actions.
Entity-action extraction:
There are three main properties of actionable comments:
The entities of actionable pairs are mostly nouns or pronouns.
The entities display positional properties with respect to the keyword.
Entities are grammatically related to the action, e.g. the verb in the phrase is related to the subject, which is an entity of the actionable comment.

Table 3.1 Sample output of the actionable-comments extraction and normalization task

Entity  Action
Govt    Lift diplomatic immunity of the ambassador and get him to face.
Govt    Inform the Romanian government of what happened immediately.
Govt    Cooperate closely with Romania in prosecuting.
Govt    Help victims by at least paying the legal fees.
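The positional property above (entity before a cue word, action after it) can be illustrated with a crude regular-expression sketch. The pattern and cue words are our own simplification for illustration; the paper itself uses richer grammatical machinery:

```python
import re

# Crude pattern: an entity (one or two words) followed by a modal cue word,
# then the action. Illustrative only; real systems rely on parsing/POS tags.
PATTERN = re.compile(
    r"^(?P<entity>\w+(?:\s+\w+)?)\s+"
    r"(?:must|should|will|wants to)\s+"
    r"(?P<action>.+?)\.?$",
    re.IGNORECASE,
)

def extract_entity_action(sentence):
    """Return an (entity, action) pair, or None if no cue word is found."""
    m = PATTERN.match(sentence.strip())
    return (m.group("entity"), m.group("action")) if m else None

pair = extract_entity_action(
    "Government must lift diplomatic immunity of the ambassador.")
# pair == ("Government", "lift diplomatic immunity of the ambassador")
```

Note how the comments in Table 3.1 all fit this shape; normalizing the extracted entities ("Government", "SG government", "Govt") into one canonical form is the separate normalization task the paper addresses.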

Zengyou He et al. [6] presented a survey of data mining for actionable knowledge. Data mining consists of a series of steps:
Data selection
Data cleaning
Data transformation

Actionability means that the mined patterns suggest profitable actions to the decision maker: the user can do something with them to bring direct benefits. The benefits are:

Increase in profits
Reduction in cost
Improvement in efficiency.

There are two frameworks used for mining actionable knowledge.
Loosely coupled framework:

Fig 3.7 The general procedure to go from a data mining task to actionable knowledge in the loosely coupled framework.
Advantages of the loosely coupled framework: flexibility and independence from the application.
The tightly coupled framework is better than the loosely coupled framework at finding actionable knowledge that maximizes profits.

Fig 3.8. The general procedure to go from data mining task to actionable knowledge in
tightly coupled framework.

Disadvantages of the tightly coupled framework: it is strongly dependent on the application domain, and the newly formulated problem is usually very complex.

Kilian Thiel et al. [7] applied predictive analytics techniques and text mining to Slashdot data. Predictive analytics techniques used on social media enable the user to generate new, fact-based insights from social media data. Text mining has been used to perform sentiment analysis on social media data. Sentiment analysis takes written content and translates it into different contexts, such as positive and negative. It depends on a suitable subjectivity lexicon that captures the relative positive, neutral or negative perspective of a word or expression.
Example: "I find PRODUCTX to be very good and useful, but it is a bit too expensive."

The term (and therefore PRODUCTX) is rated as positive, since there are two positive expressions ("good" and "useful") and one negative word ("expensive"). In addition, one of the positive words is strengthened by the modifier "very", while the negative word is put into perspective by the qualifier "a bit". The more highly developed the lexicon, the more detailed the analysis and the findings can be. Sentiment analysis using text mining can be very powerful and is a well-established, stand-alone predictive analytics technique.
As a first step, negative and positive users are identified; the aim of the paper is to establish whether the known (not anonymous) users express predominantly positive or negative opinions, feelings, attitudes or sentiments in their comments. Categorization uses a sentiment lexicon containing polarity clues: the polarity of a clue specifies whether the word seems to evoke something positive or something negative. Possible polarity values are: positive, negative, both and neutral. KNIME is used to process the lexicon.
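A lexicon-based scorer of this kind can be sketched in a few lines. The lexicon entries and intensifier weights below are invented placeholders, not the subjectivity lexicon the paper uses:

```python
# Toy subjectivity lexicon: clue -> polarity (+1 / -1). Real lexica are far larger.
LEXICON = {"good": 1, "useful": 1, "bad": -1, "expensive": -1}
# Modifiers that strengthen the following clue's weight (illustrative values).
INTENSIFIERS = {"very": 1.5, "too": 1.5}

def polarity(text):
    """Classify a sentence as positive / negative / neutral by summing clue weights."""
    words = text.lower().replace(",", " ").split()
    score = 0.0
    for i, w in enumerate(words):
        if w in LEXICON:
            # Weight the clue by any intensifier immediately before it.
            weight = INTENSIFIERS.get(words[i - 1], 1.0) if i else 1.0
            score += weight * LEXICON[w]
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

polarity("I find PRODUCTX to be very good and useful, but it is too expensive")
# -> "positive" (score 1.5 + 1.0 - 1.5 = 1.0)
```

Even this toy version reproduces the behavior described above: intensified positives outweigh a single intensified negative, so the PRODUCTX example scores positive overall.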

Danah Boyd et al. [8] examine the practice of retweeting. The goal of this paper is to describe and map out the various conventions of retweeting, providing a framework for examining retweeting practices:
How people retweet: Twitter gives the option to retweet through a simple click, or you can type "RT" followed by a space and the "@username" of the original author. Retweeting is used to share valuable content and build relationships; retweeting information regularly enhances one's reputation and establishes one's authority.
Why people retweet:
To amplify or spread tweets to a new audience.
To entertain a particular audience.
To comment on someone's tweet by retweeting.
To make one's presence as a listener visible.
To publicly agree with somebody.
To validate others' thoughts.
To give visibility to less popular people or less visible content.
What people retweet: retweets for others and retweets for social action.
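The manual "RT @username" convention described above is easy to detect mechanically. A small sketch; the regular expression is our own simplification and ignores variants such as "via @user":

```python
import re

# Matches a manual retweet marker: "RT" at a word boundary, then "@username".
RT_PATTERN = re.compile(r"(?:^|\s)RT\s+@(\w+)", re.IGNORECASE)

def retweet_source(tweet):
    """Return the username credited in a manual 'RT @user' retweet, else None."""
    m = RT_PATTERN.search(tweet)
    return m.group(1) if m else None

retweet_source("RT @danah mapping out the conventions of retweeting")  # "danah"
retweet_source("just a plain status update")                           # None
```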

Pritam Gundecha and Huan Liu [9] gave a brief introduction to mining social media. Mining social media has the potential to extract actionable patterns that can be beneficial for users, businesses and consumers. Social media data are noisy, vast and unstructured, and thus novel challenges arise. Data mining of social media can expand researchers' capability of understanding new phenomena arising from the use of social media, and can improve business intelligence to provide better services and develop innovative opportunities. For example, data mining techniques can help identify influential people in the vast blogosphere, or detect hidden groups in a social networking site.
Issues in mining social media are:
Community analysis: a community is formed by individuals who interact with each other more frequently than with those outside the group. Communities are classified into two kinds: explicit groups, formed by user subscriptions, and implicit groups, formed by user interaction. Community analysis faces issues such as community detection, formation and evaluation. The main challenges in community detection are: 1) the definition of a community can be subjective; 2) the lack of ground truth makes community evaluation difficult.
Sentiment analysis and opinion mining: these are used to automatically extract opinions expressed in user-generated content. Opinion mining and sentiment analysis tools allow businesses to understand brand perception, product sentiment, new product perception and reputation management. Sentiment analysis is hard because the languages used to create content are ambiguous. The steps of sentiment analysis are:
Finding relevant documents,
Finding the relevant sections,
Finding overall sentiment,
Quantifying the sentiment,
Aggregating all sentiments to form an overview.
They also describe TweetTracker, a Twitter-based analytics and visualization tool. The focus of the tool is to help HADR (humanitarian assistance and disaster relief) organizations acquire situational awareness during disasters and emergencies to aid relief efforts. New social media platforms, such as Twitter microblogs, demonstrate their value and capability to provide information that is not attainable in traditional media. TweetTracker is designed to track, analyze and monitor tweets.

Yanchang Zhao et al. [10] have proposed a combined pattern mining method. Association mining produces large collections of association rules that are hard to understand and put into action.
Combined Association Rule: assume that there are k datasets D_i (i = 1..k). Let I_i be the set of all items in dataset D_i, with I_i ∩ I_j = ∅ for all i ≠ j. A combined association rule R is of the form A_1 ∧ A_2 ∧ ... ∧ A_k → T, where A_i ⊆ I_i (i = 1..k) is an itemset in dataset D_i, T is a target item or class, and there exist i, j with i ≠ j such that A_i ≠ ∅ and A_j ≠ ∅. For example, A_1 could be a demographic itemset, A_2 a transactional itemset on a marketing campaign, A_3 an itemset from a third-party dataset, and T the loyalty level of a customer. The combined association rules are then further organized into rule pairs by putting similar but contrasting rules together.
Combined Rule Pair: assume that R1 and R2 are two combined rules whose left-hand sides can be split into two parts, U and V, where U and V are respectively itemsets from I_U and I_V (I = ∪_i I_i, I_U ⊂ I, I_V ⊂ I, I_U ≠ ∅, I_V ≠ ∅ and I_U ∩ I_V = ∅). If R1 and R2 share the same U but have different V and different right-hand sides, then they build a combined rule pair P:

R1: U ∧ V1 → T1
R2: U ∧ V2 → T2

where U ≠ ∅, V1 ≠ ∅, V2 ≠ ∅, T1 ≠ ∅, T2 ≠ ∅, U ∩ V1 = ∅, U ∩ V2 = ∅, V1 ∩ V2 = ∅ and T1 ∩ T2 = ∅.

A combined rule pair is composed of two contrasting rules, which suggests that for customers with the same characteristics U, different policies/campaigns, V1 and V2, can result in different outcomes T1 and T2. Based on a combined rule pair, related combined rules can be organized into a cluster that supplements the rule pair with more information.
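The pairing step translates directly from the definition. In the sketch below, each rule is a (U, V, T) triple; the demographic/campaign items are invented examples in the spirit of the paper:

```python
from itertools import combinations

# Combined rules as (U, V, T): shared characteristics U, policy/campaign V,
# outcome T. Items are illustrative placeholders.
rules = [
    (frozenset({"age<30"}), frozenset({"campaign_A"}), "loyal"),
    (frozenset({"age<30"}), frozenset({"campaign_B"}), "churn"),
    (frozenset({"age>=30"}), frozenset({"campaign_A"}), "loyal"),
]

def combined_rule_pairs(rules):
    """Return pairs (R1, R2) that share U but differ in V and in outcome T."""
    pairs = []
    for r1, r2 in combinations(rules, 2):
        (u1, v1, t1), (u2, v2, t2) = r1, r2
        # Same U, disjoint and different V, contrasting targets.
        if u1 == u2 and v1 != v2 and not (v1 & v2) and t1 != t2:
            pairs.append((r1, r2))
    return pairs

pairs = combined_rule_pairs(rules)  # one contrasting pair for U = {age<30}
```

Here the single pair found says: for customers under 30, campaign A is associated with loyalty while campaign B is associated with churn, which is exactly the contrast a decision maker can act on.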

Mario Cataldi et al. [11] recognized the primary role of Twitter and proposed a novel topic detection technique that retrieves, in real time, the most emergent topics expressed by the community. First the emerging terms are selected, and from these the emerging topics. They propose two techniques, one supervised and one unsupervised, to select a limited set of relevant terms that emerge in the considered time interval. In the supervised selection they introduce a critical drop value that allows the user to decide when a term is emergent. In particular, the critical drop is defined as

drop_c = δ · (1/|K^t|) · Σ_{k ∈ K^t} energy_k

where δ ≥ 1. This permits setting the critical drop while also taking into account the average energy value. Therefore, we define the set of emerging keywords EK^t as

EK^t = { k ∈ K^t : energy_k > drop_c }.
The second approach is a completely automatic model that does not involve any user interaction: the unsupervised ranking model dynamically sets the critical drop. This cut-off is adaptively computed as follows:
1. Rank the keywords in descending order of their computed energy values.
2. Compute the maximum drop (between consecutive keywords) and identify the corresponding drop point.
3. Compute the average drop (between consecutive entries) for all keywords ranked before the identified maximum drop point. The first drop higher than this computed average drop is called the critical drop.
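These three steps translate almost directly into code. A sketch, assuming keyword energies have already been computed for the interval (the energy values below are invented):

```python
def emerging_keywords(energies):
    """Select emerging keywords via the adaptive critical-drop cut-off.

    energies: keyword -> energy value, already computed for the time interval.
    """
    # Step 1: rank keywords by descending energy.
    ranked = sorted(energies.items(), key=lambda kv: kv[1], reverse=True)
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    # Step 2: the maximum drop and its position in the ranking.
    max_idx = drops.index(max(drops))
    # Step 3: average drop before the maximum drop point; the first drop
    # exceeding that average is the critical drop.
    head = drops[:max_idx] or [drops[max_idx]]
    avg = sum(head) / len(head)
    crit = next((i for i, d in enumerate(drops) if d > avg), max_idx)
    return [k for k, _ in ranked[: crit + 1]]

emerging_keywords(
    {"obama": 100, "elections": 99, "usa": 98, "cat": 20, "dog": 19})
# -> ["obama", "elections", "usa"]
```

In the example, the large drop between "usa" and "cat" is the first one above the average of the preceding drops, so only the three high-energy keywords are kept as emerging.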
A topic can be defined as a minimal set of terms semantically related to an emerging keyword. Thus, in order to retrieve the emerging topics, we consider the entire set of tweets generated by the users within the time frame. Take, for example, the keyword "victory" in a given set of tweets: this term alone does not suffice to express the related topic. In fact, considering November 2008 as the time frame, the related topic can easily be defined by its association with other keywords (among the most used) such as "elections", "USA", "Obama" and "McCain". Using a correlation vector, we can identify the topics related to the emerging terms retrieved during the considered time interval. A topic graph together with a topic detection and ranking algorithm is used to find the emerging topics.

Fig 3.10 A topic graph with two strongly connected components representing two different emerging topics: labels in bold represent emerging keywords, while the thickness of an edge represents the semantic relationship between the considered keywords.

Pirooz Shamsinejad and Mohamad Saraee [12] proposed a Bayesian network approach for causal action rule mining. Action rules are a recent method in this research area: an action rule suggests actions to the user that yield a profit in his/her domain. Classification rules, decision trees and association rules have already been used for action rule mining, but none of them captures causal relationships. A Bayesian network represents the causal relationships between the variables of interest for extracting action rules, where a causal relationship means that one variable causes a change in another variable.
Action rule discovery using Bayesian networks: the proposed action discovery method has two phases. 1) Modeling phase: it takes data about the instances as input and creates a Bayesian network (BN) modeling the causality relationships between attributes of instances in the database. 2) Discovery phase: it takes one instance at a time and generates the most profitable actions for that case.

The modeling phase has three steps:

1. Specify the set of relevant attributes and their values; this defines the domain of the problem.
2. Construct the structure of the BN by connecting each pair of attributes between which there is a cause-and-effect relationship. The resulting structure is a directed acyclic graph (DAG) whose nodes are attributes, and each causal relationship is shown by an edge from the cause node to the effect node.
3. Learn the parameters, i.e. find the values of the conditional probabilities for each attribute. This is the quantitative part of the learning process.
Discovery phase:
Step 1: Find the candidate action rules for the instance.
Step 2: Estimate the power of each action rule, i.e. its ability to change the goal attribute.
Step 3: Rank the action rules based on their power and select the most profitable ones.


Fig 3.5 Action rule Discovery system based on Bayesian networks

Courtenay Honeycutt and Susan C. Herring [13] have detailed conversation and collaboration via Twitter. Microblogging on Twitter is in the process of being appropriated for conversational interaction and is starting to be used for collaboration. The goal of this paper is to collect tweets posted to the public timeline. Twitter Scraper, one of the tools available for public use, was used to collect the conversations on Twitter. Not all tweets could be collected, for two reasons: 1) Twitter Scraper was only able to collect up to 20 tweets per operation; 2) during periods of heavy activity, Twitter Scraper took a long time to gather data or returned error messages.
Longbing Cao and Chengqi Zhang [14] propose a practical perspective, referred to as domain-driven in-depth pattern discovery (DDID-PD). It corresponds to a domain-driven view of discovering knowledge that satisfies real business needs. It includes constraint mining, human-cooperated mining, in-depth mining and loop-closed mining.

Fig 3.9 DDID-PD process model

Constraint mining: several types of constraints are listed, which play important roles in a process that efficiently discovers knowledge actionable to business.
In-depth mining: attention should be paid to business requirements, domain knowledge, objectives and the qualitative intellect of domain experts for their impact on mining deep patterns. This might be done through selecting and adding business features, considering domain and background data in modeling, supporting interaction with domain experts to fine-tune parameters and the data set, optimizing models, adding factors into technical interestingness measures or building business measures, and improving result evaluation mechanisms through embedding human involvement and domain knowledge, etc.
Human-cooperated mining: in DDID-PD, the role of humans is embodied in the full cycle of data mining, from business and data understanding, problem definition, data integration and sampling, hypothesis proposal, feature selection, and business modeling and learning, to the evaluation and delivery of the resulting outcomes.
Loop-closed mining: it encloses iterative feedback to the various stages, such as sampling, hypothesis, feature selection, modeling, evaluation and interpretation, in a human-involved manner.

Janaina Gomide et al. [15] proposed dengue surveillance based on a computational model of the spatio-temporal locality of Twitter. It captures not only how much, where and when dengue incidence happened, but also an additional dimension enabled by social media: how the population faces the epidemic. They then introduce an active surveillance framework that analyzes how social media reacts to epidemics along four dimensions: volume, time, location and public perception.

Aron Culotta [16] proposed detecting influenza epidemics by analyzing Twitter messages. He analyzed messages posted on the microblogging site twitter.com, proposed several methods to identify influenza-related messages, and compared a number of regression models that correlate these messages with CDC statistics.
Modeling influenza rates:
Let P be the true proportion of the population exhibiting influenza-like illness (ILI) symptoms, and let W = {w1, ..., wk} be a set of k keywords. Let D be a document collection and D_W the set of documents in D that contain at least one keyword in W. He defined Q(W, D) = |D_W| / |D|, the fraction of documents in D that match W.


Regression Models:
Simple linear regression:
He considered a simple linear model between the log-odds of P and Q(W, D):

logit(P) = β1 · logit(Q(W, D)) + β2 + ε

with coefficients β1, β2, error term ε, and the logit function logit(X) = ln(X / (1 − X)).

Multiple linear regression:
When W contains more than one keyword, it is natural to expand the simple linear regression to include a separate parameter for each element of W. The resulting multiple regression model is:

logit(P) = β1 · logit(Q({w1}, D)) + ... + βk · logit(Q({wk}, D)) + β(k+1) + ε

where wi ∈ W.
Keyword selection:
Keywords were selected using the correlation coefficient and the residual sum of squares.
Keyword generation:
He proposed two methods to generate the keywords.
Hand-chosen keywords: a simple set of four keywords consisting of {flu, cough, sore throat, headache}.
Most frequent keywords: to expand this candidate set, he searched for all documents containing any of the hand-chosen keywords, then took the 50,000 most frequently occurring words in the resulting set.
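The simple linear model can be fitted with ordinary least squares after the logit transform. A sketch; the weekly P/Q values below are synthetic, not CDC data:

```python
import math

def logit(x):
    """Log-odds transform: ln(x / (1 - x))."""
    return math.log(x / (1.0 - x))

def fit_simple(P, Q):
    """Least-squares fit of logit(P) = b1 * logit(Q(W,D)) + b2 (sketch)."""
    xs = [logit(q) for q in Q]
    ys = [logit(p) for p in P]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Closed-form OLS slope and intercept for one predictor.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs)
    b2 = mean_y - b1 * mean_x
    return b1, b2

# Synthetic weekly data: ILI proportion P and matching-document fraction Q.
P = [0.01, 0.02, 0.03, 0.05]
Q = [0.001, 0.002, 0.003, 0.005]
b1, b2 = fit_simple(P, Q)  # slope near 1, intercept near ln(10)
```

Because the synthetic P is ten times Q and both are small, the fitted slope comes out close to 1 and the intercept close to ln(10), illustrating how the logit transform linearizes proportional relationships between small rates.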

4. CONCLUSION
The paper presented techniques which analyze actionable knowledge in on-line social media conversations. We clarified the notion of actionability by considering the actionability of those expressions (or tweets) that contain a request or a suggestion that can be acted upon. We have identified actionability in dengue-related tweets. For example, let us examine the following two tweets: A. "Protect your children from dengue." B. "Dengue is a disease." Both tweets contain an opinion about the same topic (e.g. society/social innovation). However, tweet A suggests a clear action. We argue that tweet A says something more than tweet B, in that A contains actionable knowledge and B does not. We believe that it is important to understand not only how people feel about a topic but also what actions they would like to take. To infer actions from the tweets, soft computing techniques are to be employed with the training keyword set.

5. REFERENCES
[1] Zhao, D. and Rosson, M.B, 2009. How and why people Twitter: the role that micro
blogging plays in informal communication at work, In Proceedings of the Int. Conference
on Supporting Group Work (GROUP '09).
[2] Maria Angela Ferrario, Will Simm Understanding Actionable Knowledge in Social
Media, Proceedings of the Sixth International AAAI Conference on Weblogs and Social
Media, 2012.
[3] Agichtein, E., Carlos, C., Donato, D., Gionis, A. and Mishne, G, Finding high quality
content in social media, In Proceedings of The International Conference on Web Search
and Web Data Mining (WSDM '08). ACM, New York, NY, USA, 2008.
[4] Java, A., Xiaodan, S., Finin, T. and Tseng, B, Why we Twitter: Understanding
Microblogging Usage and Communities, In Proceedings of the 9th WebKDD and 1st
SNA-KDD workshop on Web mining and social network analysis (WebKDD/SNA-KDD
'07). ACM, New York, NY, USA, pp. 56-65, 2007.
[5] Swapna Gottipati and Jing Jiang, School of Information Systems, Singapore
Management University, Singapore, Extracting and Normalizing Entity-Actions from
Users' Comments, Proceedings of COLING: Posters, pages 421-430, 2012.
[6] Zengyou He, Xiaofei Xu and Shengchun Deng, Data Mining for Actionable
Knowledge: A Survey, Research and Development Program of China and the IBM SUR
Research Fund, 2003.
[7] Kilian Thiel, Tobias Kötter, Michael Berthold, Rosaria Silipo and Phil Winters,
Creating Usable Customer Intelligence from Social Media Data: Network Analytics meets
Text Mining, KNIME.com AG, 2012.
[8] Boyd, Danah, Scott Golder, and Gilad Lotan, Tweet, Tweet, Retweet: Conversational
Aspects of Retweeting on Twitter, HICSS-43. IEEE: Kauai, HI, January 6, 2010.
[9] Pritam Gundecha and Huan Liu, Arizona State University, Tempe, Arizona 85287,
Mining Social Media: A Brief Introduction, Tutorials in Operations Research, INFORMS, 2012.
[10] Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang, and Hans
Bohlscheid, Combined Pattern Mining: From Learned Rules to Actionable Knowledge,
AI, LNAI 5360, pp. 393-403, 2008.
[11] Cataldi, M., Di Caro, L. and Schifanella, C, Emerging Topic Detection on Twitter
Based on Temporal and Social Terms Evaluation, In Proceedings of the Tenth
International Workshop on Multimedia Data Mining (MDMKDD '10). ACM, New York,
NY, USA, Art. 4, 2010.
[12] Pirooz Shamsinejad and Mohamad Saraee, A Bayesian Network Approach for
Causal Action Rule Mining, International Journal of Machine Learning and Computing,
Vol. 1, No. 5, December 2011.
[13] Honeycutt, C. and Herring, S.C, Beyond Microblogging: Conversation and
Collaboration via Twitter, In Proceedings of the 42nd Hawaii International Conference on
System Sciences, 2009, pp. 1-10.

[14] Cao, L. and Zhang, C, Domain Driven Actionable Knowledge Discovery in the Real
World, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, pp. 821-830,
2010.
[15] Janaina Gomide, Adriano Veloso, Wagner Meira Jr., Virgilio Almeida and Fabricio
Benevenuto, Dengue Surveillance based on a Computational Model of Spatio-temporal
Locality of Twitter.
[16] Aron Culotta, Towards Detecting Influenza Epidemics by Analyzing Twitter
Messages, 1st Workshop on Social Media Analytics, July 25, 2010.

This paper may be cited as:


Akilandeswari, J. and Rajalakshm, K., 2014. A Survey on Knowledge
Analytics of Text from Social media. International Journal of Computer
Science and Business Informatics, Vol. 13, No. 1, pp. 45-61.


Progression of String Matching Practices in Web Mining: A Survey
Kaladevi A. C.
Associate Professor,
Department of Computer Science and Engineering,
Sona College of Technology,
Salem, India

Nivetha S. M.
PG Scholar,
Department of Computer Science and Engineering,
Sona College of Technology,
Salem, India

ABSTRACT
String matching is the technique of finding strings that match a pattern approximately. The problem of approximate string matching can be classified into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that match the pattern approximately. The basic technique is dictionary-based entity extraction, which identifies predefined entities in a document; here, recall is low. The next trend, aimed at improving recall, is approximate entity extraction: for a given query it finds all substrings in a document that roughly match entities in a given dictionary. This causes redundancy and lowers performance. To overcome this drawback in the performance of string matching, a technique called Approximate Membership Localization (AML) is used, solved via the P-Prune algorithm. This paper surveys the performance and accuracy of the string matching process and exposes an idea on using P-Prune in a blog search framework.
Keywords
Blog, P-Prune, Approximate Membership Localization, Approximate Membership Extraction, RSS Feeds

1. INTRODUCTION
Data mining is the process of discovering interesting patterns in a large data set. It is the automatic or semi-automatic analysis of large amounts of data to extract previously unknown patterns such as groups of data records, unusual records and dependencies.

Raw Data → Patterns → Knowledge

The primary data mining techniques are classification, clustering and association rule mining. Data mining has uses in fields such as games, business, science and engineering, medical data mining, sensor data mining, pattern mining, spatial data mining, the knowledge grid, etc. Though data mining has wide applications, two critical factors exist in it, namely:

Database size: processing and maintaining large data requires a more powerful system.
Query complexity: processing highly complex and larger numbers of queries requires a highly capable system.

Web mining is the application of data mining techniques to extract


knowledge from Web data. It can also be defined as a collection of
interrelated files on web server(s).In this paper we analyze various
techniques used for string matching in Web mining. The string matching
problem is to find all occurrences of a given pattern in a text, where both
pattern and text are sequences of characters over a finite alphabet. If k, l
and m are strings, then k is a prefix of kl, a suffix of mk and a factor of
lkm. Various algorithms exist for solving the problem; the three main
search approaches are prefix searching, suffix searching and factor
searching. An extension is the multiple string matching problem, where a
whole set of patterns must be located; a simple solution is to repeat the
search once per pattern. The most recent algorithm for string matching is
the Potential Redundancy Prune (P-Prune) algorithm, applied within a
Web search based framework. Search in textual documents requires only a
static dictionary. Major applications of string matching include intrusion
detection, plagiarism detection, bioinformatics, digital forensics and text
mining research.
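The prefix/suffix/factor relations and the basic search problem can be illustrated with a few lines of Python (our own sketch, not drawn from any of the surveyed papers; the function name occurrences is ours):

```python
def occurrences(pattern, text):
    """Return the start index of every (possibly overlapping)
    occurrence of pattern in text."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text[i:i + len(pattern)] == pattern]

# The prefix / suffix / factor relations from the definition above.
k, l, m = "ab", "cd", "ef"
assert (k + l).startswith(k)   # k is a prefix of kl
assert (m + k).endswith(k)     # k is a suffix of mk
assert k in (l + k + m)        # k is a factor of lkm

print(occurrences("ana", "banana"))  # overlapping matches: [1, 3]
```

Repeating such a single-pattern search once per pattern is the naive solution to the multiple string matching problem mentioned above.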

Blogs have created a highly active part of the World Wide Web due to their
rapid growth [1]. With simple technical knowledge any person can create a
blog using popular blogging services such as Blogger, WordPress, etc.
Similar to web search the String matching algorithms can also be applied to
the Blog Search Framework. This can be made possible by collecting Really
Simple Syndication (RSS) Feeds from various Blogs. It needs a dynamic
dictionary since we ought to analyze the opinion [2] about the blogs which
are updated often. We have presented a paper [3] on this research. The
results revealed that the search over Blogs is much better when P-Prune
algorithm is applied to solve AML problem.

2. EVOLUTION OF STRING MATCHING TECHNIQUES


2.1 Bloom Filter
B. Bloom [4] proposed a filtering mechanism called the Bloom filter. It is
used to calculate the probability that an input is a member of a given
dictionary, with an allowable fraction of errors. Bloom described two new
hash-coding methods as variations of the conventional method; the two
major computational factors are reject time and space. In conventional
hash coding, errors are not permitted: the hash area is split into cells, a
pseudorandom address is generated from each message, the messages are
stored in the cells, and a test message is compared with the cell content,
where a match indicates membership.
In the first new method the hash area is organized as in the conventional
method, but instead of the entire message only a code generated from it is
stored. The smaller the allowable fraction of errors, the larger the cell size.
Codes are tested as in the conventional method; since the codes are not
unique, errors of commission (false positives) may arise. In the second
method the hash area is divided into individually addressable bits, and
each message is hash-coded to a number of distinct bit addresses.
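The second method can be sketched as follows. This is a minimal illustration of the idea rather than code from [4]; the parameters m and k are arbitrary choices, and SHA-256 stands in for the pseudorandom address generator:

```python
import hashlib

class BloomFilter:
    """Sketch of Bloom's second method: the hash area is m individually
    addressable bits, and each message sets k distinct bit addresses."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _addresses(self, message):
        # Derive k pseudorandom bit addresses from the message.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{message}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, message):
        for a in self._addresses(message):
            self.bits[a] = 1

    def __contains__(self, message):
        # All k bits set => reported as a member (may be a false
        # positive); any bit unset => definitely not a member.
        return all(self.bits[a] for a in self._addresses(message))

bf = BloomFilter()
bf.add("hello")
print("hello" in bf)   # an added element is always found: True
```

Errors of commission arise exactly when all k bit addresses of a non-member happen to have been set by other messages.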
2.2 Bloom Filter along with Password Security
U. Manber and S. Wu [5] extended the Bloom filter from exact matches to
approximate matches, with password security as the motivating
application. The primary focus of their paper is the prevention of password
guessing. With a Bloom filter, even a small change in the input completely
changes the hash value, so it is not directly suitable for approximate
queries. Instead, all possible variations of each dictionary string within
distance d are generated and inserted into the Bloom filter. This works
well only for small values of d, because generating the variations becomes
computationally expensive for large distance thresholds. The data
structure can be used in applications that require fast approximate queries
on large databases, and the authors discuss two such applications. The
first is spell checking in large bibliographic files: a regular spell checker is
used initially, and words not found in the dictionary are checked for their
distance. The second is filtering, in which the pattern matching problem is
reduced by splitting the patterns into smaller pieces.
The two extensions to Bloom filters that permit fast approximate set
membership queries are:
- reduction of an approximate query to several exact queries;
- effective utilization of secondary memory by directing all hashing of
  the same element to the same page.

2.3 Multipattern Matching technique


G. Navarro and M. Raffinot [6] presented multipattern matching
techniques in their book. In general, pattern matching searches for a
pattern string within a document, and multipattern matching finds all
occurrences of patterns from a given set inside a document. This extends
the Aho-Corasick [7]
technique, which builds a trie over all the patterns. This idea significantly
reduces the number of comparisons between substrings and patterns. The
drawback of the method is that it works only for exact matches and not for
approximate matches, because the trie cannot be constructed over
approximate matches. Approximate matching is instead modeled using a
distance function, and four types of existing algorithms can be adapted for
approximate string matching.
The first and oldest approach is based on dynamic programming. The
second approach, which builds an automaton from the pattern, suits short
patterns well. The third approach is bit-parallelism, which yields many of
the most successful results. The final approach is filtration, in which the
areas that cannot contain a match are discarded and the remaining text is
verified with another algorithm; its efficiency degrades when the error
threshold is high. Efficiency in approximate matching can be increased by
adopting additional measures.
2.4 Cumulative Gain
K. Järvelin and J. Kekäläinen [8] presented cumulated gain, the sum of
the graded relevance values of the results in a search result list.
Information retrieval evaluation extended this concept toward retrieving
highly relevant documents by taking graded relevance judgments into
account. Cumulated gain is analyzed through three measures:
- Direct cumulated gain (CG): accumulation of the relevance scores of
  retrieved documents along the ranked result list.
- Discounted cumulated gain (DCG): a discount factor is applied to the
  relevance scores in order to devalue late-retrieved documents.
- Normalized (D)CG (nDCG): the documents are sorted by relevance to
  produce the maximum possible DCG at each position, against which the
  actual values are normalized.
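The three measures can be sketched directly from their definitions. This is our own illustrative Python with the logarithm base b = 2 as the discount parameter, following [8]; the relevance list run is hypothetical:

```python
import math

def cg(rels):
    """Direct cumulated gain: sum of the graded relevance scores."""
    return sum(rels)

def dcg(rels, b=2):
    """Discounted cumulated gain: from rank b onward each score is
    divided by log_b(rank), devaluing late-retrieved documents."""
    return sum(r if i < b else r / math.log(i, b)
               for i, r in enumerate(rels, start=1))

def ndcg(rels, b=2):
    """Normalized DCG: actual DCG divided by the DCG of the ideal
    (descending-relevance) ordering of the same scores."""
    ideal = dcg(sorted(rels, reverse=True), b)
    return dcg(rels, b) / ideal if ideal else 0.0

run = [3, 2, 3, 0, 1, 2]   # graded relevance along a ranked result list
print(cg(run), dcg(run), ndcg(run))
```

A run already in ideal order yields nDCG of exactly 1.0, which is the sense in which performance differences become normalized.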
The strengths of these measures are as follows:
- The relevance of a document and its rank are combined.
- Cumulated gain is a single measure, irrespective of the number of
  documents retrieved.
- The measures are insensitive to outliers.
- The number of documents over which the nDCG values hold is stated
  explicitly.
With the normalized CG and DCG measures, the performance differences
are also normalized.

Though there are many advantages, there are certain limitations of these
measures, such as:
- redundancy of documents is not considered;
- although relevance is multidimensional, the measures consider only a
  single dimension;
- any measure based on static relevance judgments cannot handle
  dynamic changes.

2.5 Fast Similarity Search (FastSS)


B. Bocek et al. [9] pioneered Fast Similarity Search (FastSS), an
exhaustive similarity search in a dictionary. It is based on the edit
distance form of string similarity, the minimum number of operations
needed to transform one string into another, and builds on approximate
dictionary queries [10]. For a dictionary containing w words of average
length l and a maximum number of spelling errors k, FastSS uses a
deletion dictionary of size O(w·l^k). At search time each query is mutated
to generate a deletion neighborhood of size O(l^k). Because insertions and
replacements need not be enumerated, the search is fast. Both online and
offline algorithms are considered: online algorithms search without
pre-processing, whereas offline algorithms pre-process the dictionary and
store the data in memory to speed up the search. FastSS is an exhaustive
offline technique. Diverse algorithms were applied to a random dictionary,
and the results reveal that FastSS outperforms the other algorithms.
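The deletion-dictionary idea can be sketched as follows (our own illustration of the principle, not the authors' code; the three-word dictionary is hypothetical). Offline, every dictionary word is indexed under each of its deletion variants; online, the query's own deletion neighborhood is looked up, and FastSS would then verify the candidates against the real edit distance:

```python
from itertools import combinations

def deletion_neighborhood(word, k=1):
    """All strings obtainable from word by deleting up to k characters."""
    out = {word}
    for d in range(1, k + 1):
        for idx in combinations(range(len(word)), d):
            out.add("".join(c for i, c in enumerate(word) if i not in idx))
    return out

# Offline step: index every dictionary word under its deletion variants.
dictionary = ["hello", "hallo", "world"]   # hypothetical dictionary
index = {}
for word in dictionary:
    for variant in deletion_neighborhood(word, 1):
        index.setdefault(variant, set()).add(word)

def fastss_candidates(query, deletion_index, k=1):
    """Online step: dictionary words sharing a deletion variant with
    the query; these are candidates still requiring verification."""
    hits = set()
    for variant in deletion_neighborhood(query, k):
        hits |= deletion_index.get(variant, set())
    return hits

print(fastss_candidates("hullo", index))   # hello and hallo are candidates
```

The query "hullo" and the words "hello" and "hallo" all share the variant "hllo", which is how the lookup finds them without enumerating insertions or replacements.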
2.6 Filtration-Verification
K. Chakrabarti et al. [11] proposed the constraint that the first and the last
tokens of an extracted substring must be present in the corresponding
dictionary reference. Multiple-match redundancy deteriorates both the
accuracy of the matched pairs and the efficiency of the algorithm
generating them. A filtration-and-verification framework is applied here,
with a new filtering structure called the Inverted Signature-based Hash
(ISH) table. It is similar to an inverted index, but it stores a list of
signatures per token, i.e., sets of signatures are stored. ISH extends the
previously existing binary signature scheme to a weighted signature
scheme, because multiple signatures must be matched simultaneously. All
substring/reference pairs that pass the filter are then verified. A
performance study compares ISH, LSH (locality-sensitive hashing) and
segmented index merging. Experimental results on three computational
factors, namely execution time, memory requirement and filtering power,
in both filter-only and filter-verification settings, reveal that the ISH filter
works faster and generates far fewer
candidate members, while its memory requirement is comparable with
LSH. Finally the authors proved that this filter efficiently filters out a
large number of the non-member substrings.
2.7 Similarity Functions defined by Exploiting Web Search Engines
S. Chaudhuri et al. [12] proposed exploiting web search engines to define
new similarity functions. The entity matching task identifies matching
entity pairs, one from a reference entity table and the other from an
external entity list; the task is to check whether or not a candidate string
matches a member of the reference table. An example application is an
offer matching system, which consolidates offers (e.g., listed prices for
products) from multiple retailers. New document-based similarity
measures are proposed to quantify similarity in the context of multiple
documents. The challenge, however, is that it is quite hard to obtain a
large number of documents containing a given string unless a large
portion of the web is crawled and indexed, as search engines do. The
authors introduce a class of synonyms, called IDTokenSets, where each
synonym for an entity e is an identifying set of tokens which, when
mentioned contiguously (or within a small window), refers to e with high
probability. The approach is used to compute a string similarity score
between the candidate and reference strings.
They further developed efficient techniques to support approximate
matching in the context of certain similarity functions and, in an extensive
experimental evaluation, demonstrated the accuracy and efficiency of the
technique. The drawbacks are that it does not match document words and
that the quality of the ID token set is low.
2.8 Approximate Membership Extraction (AME)
Jiaheng Lu et al. [13] applied the weighted Jaccard similarity function
within a k-signature scheme:

    J(S1, S2) = wt(S1 ∩ S2) / wt(S1 ∪ S2)                      (1)

Eq. 1 gives the Jaccard similarity of any two token sets S1 and S2 whose
token weights are known. For a string s and a threshold θ, the tokens are
sorted in descending order of weight, and the prefix signature set sig(s) is
chosen as the smallest prefix satisfying

    r(s) = wt(sig(s)) − (1 − θ)·wt(s) ≥ 0                      (2)

For efficient Approximate Membership Extraction (AME) the researchers
used a filtration-verification framework.
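Equations 1 and 2 can be made concrete with a small sketch (our own illustration; the token weights are hypothetical). The signature is the smallest prefix of the descending-weight token order whose accumulated weight reaches (1 − θ)·wt(s):

```python
def wt(tokens, weights):
    """Total weight of a token set."""
    return sum(weights[t] for t in tokens)

def jaccard(s1, s2, weights):
    """Weighted Jaccard similarity of two token sets (Eq. 1)."""
    return wt(s1 & s2, weights) / wt(s1 | s2, weights)

def prefix_signature(s, weights, theta):
    """Smallest prefix of the descending-weight token order whose
    accumulated weight reaches (1 - theta) * wt(s) (Eq. 2)."""
    sig, acc, target = set(), 0.0, (1 - theta) * wt(s, weights)
    for t in sorted(s, key=lambda t: -weights[t]):
        sig.add(t)
        acc += weights[t]
        if acc >= target:
            break
    return sig

# Hypothetical token weights for illustration.
w = {"new": 1.0, "york": 2.0, "city": 3.0}
a, b = {"new", "york", "city"}, {"york", "city"}
print(jaccard(a, b, w))            # wt({york, city}) / wt(union) = 5/6
print(prefix_signature(a, w, 0.5))
```

Two strings whose Jaccard similarity reaches θ must share at least one signature token, which is what makes the signature usable as a filter before verification.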

Initially a Signature-based Inverted List (SIL) is built by generating
signatures for the reference strings under the k-signature scheme. The
authors proposed two algorithms:
- EvScan: scans the inverted list so as to avoid overlap between strings,
  but spends time on unnecessary list scans.
- EvIter: an optimized version of EvScan that reduces the unnecessary
  list scanning.
A modified SIL answers queries with a dynamic similarity threshold. The
experimental results show that these algorithms save cost in terms of
scanning the inverted list. The major drawback of AME is that it does not
analyze the redundancy caused by overlapping strings; this problem is
studied by Li et al. [14] in their paper.
2.9 Approximate Membership Localization (AML)
Li et al. [14] proposed an efficient technique, Approximate Membership
Localization (AML), using the Potential Redundancy Prune (P-Prune)
algorithm. It is also a dictionary-based problem, and it overcomes the
drawback of Approximate Membership Extraction (AME), whose heavy
redundancy lowers efficiency and degrades the performance of real-world
applications. The efficiency of AML is proven to be high, since
overlapped strings are pruned before they are generated. Experiments on
both AME and AML within a web-based framework reveal that the
precision and recall of this method are much higher than those of the
other similarity metrics.
The dictionary considered here is static, however, and the approach is not
proven for certain real-time applications such as blogs, which require a
dynamic dictionary.
2.10 Ranking of Opinions in Blogs
G. Mishne [15] analyzed the ranking of opinions in blogs. The three
components taken into account are:
- fact-oriented information retrieval;
- dictionary-based opinion expression detection;
- spam filtering.
Blog documents and RSS feeds are used as additional training data. The
opinion retrieval system uses the publishing dates of the documents as a
retrieval feature, and the contents of actual blog posts are compared with
the RSS documents. Query-dependent spam filters are used to further
remove spam documents.

2.11 Opinion Retrieval
W. Zhang et al. [16] also studied opinion retrieval, through three modules:
- information retrieval;
- opinion classification;
- similarity ranking.
First, information is retrieved from blogs. Second, these documents are
classified into opinionative and non-opinionative documents. Finally, the
system ensures that the opinions are related to the query and ranks the
documents in a certain order. The score of this system is about 13%
higher than that of the system developed by Mishne.
2.12 Statistical approach to retrieve opinionated blog posts
B. He et al. [17] presented a statistical approach to retrieving opinionated
blog posts. The system automatically generates a dictionary from the
blogosphere without requiring manual effort; the dictionary is derived by
removing rare terms, since they cannot be generalized. A weight is then
assigned to each term in the dictionary, and an opinion score is assigned
to each document in the collection using the top-weighted terms from the
dictionary as a query. Finally the opinion score is combined with the
initial relevance score produced by the retrieval baseline.
This paved the foundation for our idea of combining viewers' opinions
with the P-Prune algorithm.
2.13 P-Prune in Blogs
Based on these ideas we [3] have presented a paper on using the P-Prune
algorithm for blogs. The major difference between blog search and
normal web search is that in blogs the scoring of documents can be based
on the opinions retrieved from the viewers.
By collecting the RSS feeds of blogs [18] we can retrieve the viewers'
opinions. Once viewers subscribe to a website they need not check it
manually; instead, the browser persistently monitors the site and informs
the user of any updates, and the new data can be downloaded
automatically on the user's behalf.
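The collection step can be sketched with the standard library alone; the feed content below is hypothetical, and a real deployment would poll each blog's feed URL and rebuild the dynamic dictionary from the returned items:

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 fragment of the kind a blog feed exposes
# (hypothetical data, for illustration only).
feed = """<rss version="2.0"><channel>
  <title>Example Blog</title>
  <item><title>Post one</title><pubDate>Thu, 01 May 2014 10:00:00 GMT</pubDate></item>
  <item><title>Post two</title><pubDate>Fri, 02 May 2014 10:00:00 GMT</pubDate></item>
</channel></rss>"""

root = ET.fromstring(feed)
# Each (title, date) pair could feed the dynamic dictionary and the
# opinion-scoring step described above.
posts = [(item.findtext("title"), item.findtext("pubDate"))
         for item in root.iter("item")]
print(posts)
```

The pubDate field is what lets an opinion retrieval system use publishing dates as a retrieval feature, as in Mishne's work above.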

3. CONCLUSION AND DIRECTIONS FOR FUTURE WORK


In this paper, we have analyzed various string matching techniques. It is
established that the AML problem in a blog search framework can be
resolved with an efficient algorithm called P-Prune, in combination with
an opinion retrieval scheme. The P-Prune algorithm is proved to be much
faster than simply adapting former AME methods, and the precision and
recall of the blog-based join with the AML results largely outperform
AME. Future work is to apply the P-Prune algorithm for AML to further
scenarios, such as vlogs, whose entries often combine embedded video
with supporting text, images and other metadata; similarity measures
beyond textual similarity are then to be considered. The AML-based
solutions are more apposite than the AME-based solutions for real-world
applications, since the matches of the AML are much nearer to the true
matched pairs.

REFERENCES
[1] http://blogsearch.google.com
[2] B. Liu, M. Hu and J. Cheng, Opinion Observer: Analyzing and Comparing Opinions,
Proceedings of the 14th WWW Conference, 2005.
[3] A. C. Kaladevi and S. M. Nivetha, Efficient Approximate Membership Localization
using P-Prune Algorithm in Blogs, in International Conference on Computer
Communication and Informatics, pp. 14, 2014.
[4] B. Bloom, Space/Time Trade-Offs in Hash Coding with Allowable Errors, Comm.
ACM, vol. 13, no. 7, pp. 422-426, 1970.
[5] U. Manber and S. Wu, An Algorithm for Approximate Membership Checking with
Application to Password Security, Information Processing Letters, vol. 50, no. 4, pp.
191-197, 1994.
[6] G. Navarro and M. Raffinot, Flexible Pattern Matching in Strings: Practical On-line
Search Algorithms for Texts and Biological Sequences. Cambridge Univ. Press, 2002.
[7] A. Aho and M. Corasick, Efficient String Matching: an Aid to Bibliographic Search,
Comm. ACM, vol. 18, no. 6, pp. 333-340, 1975.
[8] K. Jarvelin and J. Kekalainen, Cumulated Gain-Based Evaluation of IR Techniques,
ACM Transactions on Information Systems, vol. 20, no. 4, pp. 422-446, 2002.
[9] B. Bocek, E. Hunt, and B. Stiller, Fast Similarity Search in Large Dictionaries,
Technical Report ifi-2007.02, Dept. of Informatics University of Zurich, 2007.
[10] G. Brodal and L. Gasieniec, Approximate Dictionary Queries, Proceedings of the 7th
Symp. Combinatorial Pattern Matching, vol. 1075, pp. 65-74, 1996.
[11] K. Chakrabarti, S. Chaudhuri, V. Ganti, and D. Xin, An Efficient Filter for
Approximate Membership Checking, Proceedings of ACM SIGMOD International
Conf. Management of Data, pp. 805-818, 2008.
[12] S. Chaudhuri, V. Ganti, and D. Xin, Exploiting Web Search to Generate Synonyms
for Entities, Proceedings of the 18th International Conf. World Wide Web (WWW),
pp. 151-160, 2009.
[13] J. Lu, J. Han, and X. Meng, Efficient Algorithms for Approximate Member Extraction
Using Signature-Based Inverted Lists, Proc. 18th CIKM ACM Conf. Information and
Knowledge Management, pp. 315-324, 2009.
[14] Z. Li, L. Sitbon, L. Wang, X. Du and X. Zhou, AML: Efficient Approximate
Membership Localization within a Web-Based Join Framework, IEEE Transactions
on Knowledge and Data Engineering, vol. 25, no. 2, Feb. 2013.
[15] G. Mishne, Multiple Ranking Strategies for Opinion Retrieval in Blogs, Proceedings
of TREC Blog Track, 2006.
[16] W. Zhang, C. Yu, W. Meng, Opinion Retrieval from Blogs, Proceedings of the
CIKM ACM International Conf. Information and Knowledge Management, pp. 831-
840, 2007.

[17] B. He, C. Macdonald, J. He, I. Ounis, An Effective Statistical Approach to Blog Post
Opinion Retrieval, Proceedings of the CIKM ACM International Conf. Information
and Knowledge Management, pp.1063-1072, 2008.
[18] J. Elsas, J. Arguello, J. Callan, J. Carbonell, Retrieval and Feedback Models for Blog
Feed Search, SIGIR ACM Conf. Special Interest Group on Information Retrieval, pp.
347-354, 2008.

This paper may be cited as:


Kaladevi, A. C. and Nivetha, S. M., 2014. Progression of String Matching
Practices in Web Mining - A Survey. International Journal of Computer
Science and Business Informatics, Vol. 13, No. 1, pp. 62-71.


Virtualizing the Inter Communication of Clouds


Subho Roy Chowdhury
School of Information technology and Engineering
VIT University, Vellore, India

Sambit Kumar Patel


School of Information technology and Engineering
VIT University, Vellore, India

Ankita Vinod Mandekar


School of Information technology and Engineering
VIT University, Vellore, India

G. Usha Devi
School of Information technology and Engineering
VIT University, Vellore, India

ABSTRACT
The cloud has been an attractive platform for enterprises to deploy and execute their
business. However, clouds do not possess infinite storage or computational resources in
their infrastructure; once a cloud is saturated, no new request can be processed. Hence,
Inter-Cloud networking is required to provide efficiency, flexibility and scalability
through resource sharing between clouds. Inter-Cloud communication is still a new
field: some basic knowledge about it exists, but practical working knowledge is far
from complete. In this paper, the aim is to implement Inter-Cloud communication
using another rising technology, the Software Defined Network (SDN). SDN is a new
paradigm in which applications exchange information with the network and query an
Application Programming Interface (API) to gather network information in order to
plan and optimize operations. Detailed results are presented showing that virtualizing
the Inter-Cloud communication improves the overall performance compared with the
currently existing Inter-Cloud communication techniques.

Keywords
Cloud computing, Inter-Cloud, Software Defined Network (SDN), Software Defined
Infrastructure, Application Programming Interface (API).

1. INTRODUCTION
Cloud computing is a field which has created a lot of interest in the Information
and Communications Technology (ICT) community and is driving great
development in the IT industry. Cloud computing can be defined as a collection of
different systems providing well-managed resources to an organization. The cloud
provider offers Platform as a Service, Software as a Service and Infrastructure as a
Service to the organization on a pay-as-you-go basis [1]. From the industry point of
view, it is seen as an economical model used to rent technical resources as per
business requirements. Renting resources on demand avoids the upfront investment
and licensing costs that would otherwise put the business under a heavy financial
burden, and in due course it makes the business dynamic and easily adaptable to
the client's requirements. The public, private and hybrid clouds of an organization
intercommunicate with the help of middleware using an exchange server. The
exchange server connects different kinds of workstations along with the different
clouds; as shown in Figure 1, a cloud coordinator is used to connect each cloud
with the exchange server [5].
There are numerous benefits of Inter-Cloud communication for cloud clients,
which can be summarized as follows.

Figure 1. An Architecture of Inter-Cloud communication

Diverse Geographical Locations
Cloud service providers have set up data centers worldwide. On the other hand, it
is doubtful that any single provider can build data centers in every nation and
regulatory area. Numerous applications have administrative requirements as to
where data is stored, so a data center within one region or country may not be
sufficient, and application designers will require fine-grained control over where
resources are positioned. Only by using numerous clouds can one gain access to
such widely distributed resources and furnish well-performing, regulation-
compliant services to customers.

Better application resilience


During the past several years there have been several cases of cloud service
outages, including ones at major vendors. The implications of one of Amazon's
data center failures were very serious for customers who relied on that location
alone; in a post-mortem analysis Amazon advised its clients to design their
applications to use multiple data centers for fault tolerance [2, 13].

Avoidance of vendor lock-in


By utilizing different clouds and having the ability to freely move workloads
between them, a cloud customer can effectively avoid vendor lock-in. If a provider
changes a policy or pricing in a way that negatively affects its customers, they can
easily move elsewhere.
Software Defined Networking offers both critical challenges and unprecedented
opportunities for the future of exchanging and sharing data around the globe.
Software-defined networking (SDN) is an approach to computer networking
which evolved from work done at UC Berkeley and Stanford University around
2008 [3]. In May 2011 Indiana University started an SDN Interoperability Lab in
conjunction with the Open Networking Foundation to test how well diverse
vendors' Software Defined Networking and OpenFlow products work together.
SDN empowers users to program the network layers by separating the data plane
from the control plane. Through this programmability, SDN can enable users to
streamline their network resources, increase network agility, foster service
innovation, accelerate service time-to-market, extract business intelligence and
ultimately enable dynamic, service-driven virtual networks. Exploding levels of
data, virtualized infrastructure and distributed computing leave users with
extensive demands on their networks and management resources. To date,
approaches to virtualizing the network layer and adding programmability have
left the industry divided because of the complexity involved in making the
adoption choice. In the meantime, enterprise users and service providers are
looking for answers to enable virtual and versatile networks that support virtual
units which are portable and decoupled from the physical network design.


Previously, Internet Protocol (IP) networks were designed around Autonomous
Systems (AS), which used forwarding, routing and bridging algorithms for
delivering data from source to destination. The principle of Autonomous Systems
allows the designated destination to move with changing identity for the packet
delivery service, which made it difficult to specify access control, quality of
service and packet sequencing. New standards introduced by the Internet
Engineering Task Force (IETF), such as the Virtual Private Network, increased
the complexity of network elements and infrastructure for cloud computing.
The Software Defined Network plays a great role in infrastructure orchestration.
It is deployed using a newly designed infrastructure called the Software Defined
Infrastructure (SDI), as in Figure 2, which has two parts: a hardware section and
a software section [6].

Figure 2. Logical structuring of Software Defined Network (SDN)

Hardware Requirements
Network protocols need optimizers, accelerators and adapters to increase their
efficiency, which results in expensive infrastructure. SDN, however, provides
interoperability, granularity, dynamism and visibility of data transfer over the
network. SDN combines data from the physical layer up to the application layer
and exposes its usage throughout the network. SDI balances all of the above
parameters to derive the customer's requirements for infrastructure and services,
and it reduces the cloud computing customer's expenses. For the needs of cloud
computing, it is a cost-effective and flexible way to optimize, secure and monitor
the cloud network.

Software Controlling Network
The separation of the data plane and the control plane gives rise to a new way of
managing resources and network elements. The control plane covers the breadth
and depth of the IT infrastructure. Information sent by the customer is captured
by the SDI controller and incorporated into the infrastructure orchestration for
further service. The control plane can receive information from multiple vendors,
with further provisioning done according to the workflow. SDI is simple,
scalable and reliable for designing and updating the network, and a service level
agreement dashboard dynamically reflects customer use.
Cloud service providers may also have significant incentives to take part in an
Inter-Cloud initiative. A fundamental idea of cloud computing is that a cloud
service should deliver constant availability, flexibility and scalability to meet the
agreed users' requirements, so a cloud provider should guarantee enough
resources at all times. The workload cannot be predicted with certainty, as
workload spikes can appear out of the blue, and hence cloud providers need to
overprovision resources to meet them. Another issue is the huge amount of data
center power consumption: keeping an abundance of resources in a ready-to-use
state at all times, to cope with unexpected load spikes, leads to increased power
consumption and cost of operation. The benefits for cloud providers can be
summarized as follows.

Expand on demand
By being able to offload to other clouds, a provider can scale in terms of
resources, just as cloud-hosted applications do within a cloud. A cloud should
maintain in a ready-to-use state enough resources to meet its expected load plus a
buffer for typical load deviations; if the workload increases beyond these limits,
resources from other clouds can be leased [4, 14].

Enhanced SLA to the customer
Knowing that even in the worst-case scenario of a data center outage or resource
shortage the incoming workload could be moved to another cloud, a cloud
provider can offer better Service Level Agreements (SLA) to its clients.


These benefits should help both the cloud providers and the clients without
violating the requirements of the applications: the application fitting, i.e.,
provisioning and scheduling, should satisfy the requirements in terms of
performance, legal considerations and responsiveness.
In this paper, a pair of clouds is built and an interconnection between them is set
up as a virtual network. The main focus is on changing the currently existing
inter-cloud network through the use of the Software Defined Network (SDN),
which virtualizes the intercommunication between the two clouds. Later in the
paper, results are shown of the improvement that helps the consumer as well as
the service provider by increasing the QoS, decreasing the financial cost and
making the network more flexible. Section 2 reviews current work in progress on
Inter-Cloud communication. Section 3 discusses the Eucalyptus tool used to
build the clouds for the test bed. Section 4 details the work performed to improve
the interconnection, along with its outcome. Section 5 presents the outcome in
graphical form, and the final Section 6 concludes the study and provides avenues
for future work.

2. RELATED WORKS
IEEE, the world's largest professional organization advancing technology for
humanity, introduced the founding members of the IEEE Inter-Cloud Testbed
project. The IEEE Inter-Cloud Testbed is developing cloud-to-cloud
interoperability and federation capabilities to enable cloud services to become
as ubiquitous and as mainstream as the Internet. Results from the project will
also assist in the development of the forthcoming IEEE P2302 Standard for
Inter-Cloud Interoperability and Federation, which is developing standard
methodologies for cloud-to-cloud interworking [5].
In 2002, Wierzbicki et al. proposed Rhubarb [8]. It has only one coordinator
per group, and a hierarchy can be built from these groups. The system uses a
proxy coordinator, and all nodes inside the network maintain a permanent TCP
connection with it.
In 2004, Xiang et al. proposed a Peer-to-Peer Based Multimedia Distribution
Service [9, 16]. They proposed an idea of a topology-aware overlay in which
nearby hosts self-organize to the application groups while end hosts within the

ISSN: 1694-2108 | Vol. 13, No. 1. MAY 2014 77


International Journal of Computer Science and Business Informatics

IJCSBI.ORG

same group collaborate with each other to achieve Quality of Service (QoS)
awareness.
According to Vinton Cerf, a co-designer of the Internet's TCP/IP protocol,
the important issue with cloud computing is interoperability between current
clouds [10]. So there is a need to develop protocols and standards for
interacting with multiple clouds.
David Bernstein et al. study protocols and formats for cloud computing
interoperability in [11]. A set of protocols, called Inter-Cloud protocols,
and a set of mechanisms are enumerated in that paper.
There is a Global Inter-Cloud Technology Forum whose aim is to promote
standardization of the network protocols and interfaces needed for
internetworking between clouds. A white paper with use cases and functional
requirements for Inter-Cloud computing was published in [12].
The Metro Ethernet Forum (MEF) has been working on standards for cloud
computing and Software-Defined Networking over carrier-grade Ethernet since
2013 [16].

3. EUCALYPTUS
Eucalyptus is an open-source software framework for cloud computing that
implements what is commonly referred to as Infrastructure as a Service
(IaaS): systems that give clients the ability to run and control entire
virtual machine instances deployed across a variety of physical resources.
The fundamental principles of the Eucalyptus design, the essential
operational aspects of the framework, and the architectural trade-offs that
have been made allow Eucalyptus to be portable, modular and easy to use on
infrastructure commonly found in academic settings. Finally, Eucalyptus
enables clients familiar with existing Grid and HPC systems to explore new
cloud-computing functionality while maintaining access to existing, familiar
application-development software and Grid middleware.
The architecture of the Eucalyptus framework, shown in Figure 3, is simple,
flexible and modular, with a hierarchical design reflecting common resource
environments found in many academic settings. In essence, the framework
permits clients to start, control, access and terminate entire virtual
machines using an emulation of Amazon EC2's SOAP and "Query" interfaces.
That is, clients of Eucalyptus interact with the framework using exactly the
same tools and interfaces that they use to interact with Amazon. There are
four high-level components, each with its own Web-service interface, that
comprise a Eucalyptus installation [7, 15]:

Figure 3. EUCALYPTUS employs a hierarchical design to reflect underlying
resource topologies
3.1 Node Controller
It tracks the execution, inspection and termination of the virtual machine
instances on the host where it runs.
3.2 Cluster Controller
It schedules virtual machine execution using the information it gathers from
the node controllers, and it also manages the virtual network instances.
3.3 Storage Controller (Walrus)
It is a put/get storage service that implements Amazon's S3 interface,
providing a mechanism for storing and accessing virtual machine images and
user data [15].
3.4 Cloud Controller
It is the entry point into the cloud for users and administrators. It queries
the node managers for information about resources, makes high-level
scheduling decisions, and implements them by sending requests to the cluster
controllers.
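The hierarchy of the four components can be sketched as a toy scheduling flow, in which the cloud controller picks a cluster and the cluster controller picks a node (purely illustrative; the class names, the free-slot policy and the image identifier are our own inventions, not Eucalyptus APIs):

```python
class NodeController:
    """Tracks execution of VM instances on one host."""
    def __init__(self, name, free_slots):
        self.name, self.free_slots = name, free_slots

    def run_instance(self, image):
        self.free_slots -= 1
        return f"{image} running on {self.name}"

class ClusterController:
    """Schedules VM execution using information gathered from its NCs."""
    def __init__(self, nodes):
        self.nodes = nodes

    def schedule(self, image):
        # Pick the node with the most free slots.
        node = max(self.nodes, key=lambda n: n.free_slots)
        return node.run_instance(image)

class CloudController:
    """Entry point: queries clusters and makes the high-level decision."""
    def __init__(self, clusters):
        self.clusters = clusters

    def launch(self, image):
        cluster = max(self.clusters,
                      key=lambda c: sum(n.free_slots for n in c.nodes))
        return cluster.schedule(image)

cloud = CloudController([
    ClusterController([NodeController("node-a", 2), NodeController("node-b", 5)]),
    ClusterController([NodeController("node-c", 1)]),
])
print(cloud.launch("image-1"))  # -> image-1 running on node-b
```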

4. PROPOSED WORK
In this paper our aim is to modify the existing Inter-Cloud communication,
which is currently set up through an exchange server that lets the two clouds
communicate with each other, by using Software Defined Networking (SDN). In
an Inter-Cloud system, hosts in one cloud may use resources, services or data
from other clouds. The principal objective of an Inter-Cloud system is to
create open interfaces that govern the exchange of information from one cloud
to the others. To realize these characteristics, an efficient protocol for
exchanging messages between clouds is needed. Connections should be created
between the devices at the boundary of each cloud. All cloud protocols should
be translatable into the protocols understood by the other clouds, and a
common protocol is desirable for exchanging data between clouds.
There are existing client platforms that can join several clouds
concurrently, since they are able to use more than one specific protocol;
they could be used as gateways between clouds. However, this alternative is a
poor way to join numerous clouds, for several reasons. On one hand, clients
that act as gateways between clouds must have many resources and much
computing capacity in order to join numerous clouds; they must also be able
to translate protocols between clouds, and when a new cloud is added, the
software of all the gateways has to be changed. On the other hand, regular
clients must keep an updated list of all the gateways in their cloud in order
to reach the services offered by the other clouds. To solve this problem, a
layered architecture is proposed in which connections between clouds take
place between the Dnodes of the clouds. The architecture of each cloud is
formed by three layers. The lowest is the Access Layer, formed by the regular
nodes of the cloud. The middle layer is the Distribution Layer, formed by
Dnodes, and the highest is the Organization Layer, formed by Onodes [15].
Logical connections are set up between the Dnodes of two clouds, and these
connections are in charge of the information exchange, while the Onodes are
used to set up the logical connections between the Dnodes of the different
clouds. Dnodes use the Onodes to learn which Dnodes of the other cloud are
the most suitable for setting up a connection.
What cloud providers need is an application programming interface (API) for
the network layer, so that they can control the flow of their own
applications over their infrastructures. The software-defined network has
arrived through OpenFlow. OpenFlow and the software-defined network help
cloud providers be more agile with network administration. So far,
administrators have had to use the network administration protocols that
vendors designed and standardized through standards bodies such as the
Internet Engineering Task Force (IETF). With OpenFlow, a network programmer
can write code to manage particular kinds of flows across network devices
based, for instance, on the cost of the equipment, the load on the network
and the priority of different kinds of traffic. At present, many experiments
are under way in the industry behind the paper curtain of nondisclosure. The
open questions are how much Software Defined Networking will help providers
reach their goals, and which will be the first outlets to support OpenFlow
across the majority of their cloud platforms.
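The kind of flow-level control OpenFlow gives a programmer can be sketched as a minimal flow table, where match fields select a flow and an action decides where it goes (a conceptual toy, not the OpenFlow wire protocol or any real controller API; the field names and actions are made up):

```python
# Minimal sketch of an OpenFlow-style flow table: each entry matches
# header fields and applies an action; the highest priority wins, and a
# table-miss entry sends unmatched packets to the controller.
flow_table = [
    {"match": {"dst": "cloud-B"}, "priority": 10, "action": "out:virtual-link-B"},
    {"match": {"dst": "cloud-A"}, "priority": 10, "action": "out:virtual-link-A"},
    {"match": {}, "priority": 0, "action": "controller"},  # table miss
]

def forward(packet):
    """Return the action of the highest-priority entry matching the packet."""
    matches = [e for e in flow_table
               if all(packet.get(k) == v for k, v in e["match"].items())]
    best = max(matches, key=lambda e: e["priority"])
    return best["action"]

print(forward({"src": "cloud-A", "dst": "cloud-B"}))   # -> out:virtual-link-B
print(forward({"src": "cloud-A", "dst": "elsewhere"})) # -> controller
```

A program controlling such a table can steer inter-cloud flows onto a direct virtual link instead of the physical exchange router, which is the idea the proposed work builds on.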
As seen in Figure 4, the existing Inter-Cloud communication occurs through a
packet-switch router, which sets up a physical interconnection between the
clouds. By using SDN technology to virtualize the Inter-Cloud communication,
the packet-switch router is skipped and a virtual direct path is set up
between the clouds, as seen in Figure 5.

Figure 4. Inter Cloud communications through Exchange Servers

Figure 5. Inter Cloud communications through the use of SDN

5. RESULTS AND OUTCOME
The use of SDN improves the overall performance of the system, and
accessibility also increases many-fold; the graphical results for latency and
throughput can be seen in Figures 6 and 7 respectively.

Figure 6. Latency: Exchange Server vs. SDN

Figure 7. Throughput: Exchange Server vs. SDN

6. CONCLUSION
It is clear that programmable networks and SDN represent important shifts
that enable more flexibility in future networks, yet these technological
achievements are still at an early, budding stage. Only the earliest adopters
have deployed SDN; they have begun at small scale, testing the idea in a
particular part of their infrastructure and demonstrating its relevance
before making bigger investments and deploying more extensively across their
network infrastructure. SDN application is case-specific; accordingly, to
establish how an SDN could be of benefit, it is important to evaluate the
network and work out precisely what is required. The predominant things to
keep in mind are the devices and applications that live on the network and
what will be required of the network later down the line. Thinking ahead and
recognizing what a network may need to do in the coming years can give
organizations an edge when assessing distinct approaches to optimizing a
network. Finally, work with a vendor-neutral partner who brings a thorough
understanding of network administration, network architectures and protocols.
This should include knowledge of how to build the best network, an
understanding of virtualized systems, the nature of distributed-computing
workloads, and how these elements affect the network. This approach will lead
to a logical and useful SDN decision that fits the overall network
architecture, guaranteeing that the network platform continues to uphold ICT
and business targets: build a network that permits you to grow, rather than
depending on one that holds you back.

7. REFERENCES
[1] Mell P and Grance T. The NIST Definition of Cloud Computing. Special Publication
800-145, National Institute of Standards and Technology (NIST) 2011.
[2] Amazon. Summary of the Amazon EC2 and Amazon RDS Service Disruption. Jun 1
2012. URL http://aws.amazon.com/message/65648/.
[3] Software Defined Network: http://en.wikipedia.org/wiki/Software-defined_networking
[4] Buyya R, Ranjan R, Calheiros RN. InterCloud: Utility-Oriented Federation of Cloud
Computing Environments for Scaling of Application Services. Proceedings of the 10th
International Conference on Algorithms and Architectures for Parallel Processing,
Springer-Verlag: Busan, Korea, 2010; 13–31.
[5] IEEE Intercloud Testbed Project Announces Founding Members
http://www.businesswire.com/news/home/20131008005556/en
[6] William Stallings. The Internet Protocol Journal, Volume 16, No.1 Software-Defined
Networks and OpenFlow.
http://www.cisco.com/web/about/ac123/ac147/archived_issues/ipj_16-1/161_sdn.html
[7] Daniel Nurmi, Rich Wolski, Chris Grzegorczyk Graziano Obertelli, Sunil Soman,
Lamia Youseff and Dmitrii Zagorodnov. The Eucalyptus Open-source Cloud-
computing System. Computer Science Department University of California, Santa
Barbara Santa Barbara, California 93106.
[8] A. Wierzbicki, R. Strzelecki, D. Swierczewski and M. Znojek. Rhubarb: a tool for
developing scalable and secure peer-to-peer applications. Second IEEE International
Conference on Peer-to-Peer Computing (P2P2002), Linköping, Sweden, 2002.
[9] Z. Xiang, Q. Zhang, W. Zhu, Z. Zhang and Y. Zhang, Peer-to-peer based multimedia
distribution service. IEEE Transactions on Multimedia 6 (2) (2004).
[10] P. Krill. Cerf Urges Standards for Cloud Computing. InfoWorld, January 08, 2010.
<http://www.infoworld.com/d/cloud-computing/cerf-urgesstandards-cloud-computing-817>
[11] D. Bernstein, E. Ludvigson, K. Sankar, S. Diamond and M. Morrow. Blueprint for the
Inter-Cloud: protocols and formats for cloud computing interoperability. 4th
International Conference on Internet and Web Applications and Services (ICIW '09),
May 24–28, 2009, Venice, Italy, pp. 328–336.
http://dx.doi.org/10.1109/ICIW.2009.55.
[12] Global Inter-Cloud Technology Forum. Use Cases and Functional Requirements for
Inter-Cloud Computing. GICTF White Paper. August 9, 2010.
<http://www.gictf.jp/doc/GICTF_Whitepaper_20100809.pdf> (accessed 10.12)
[13] Nikolay Grozev and Rajkumar Buyya. Inter-Cloud Architectures and Application
Brokering: Taxonomy and Survey. Software—Practice and Experience, 2012; 00:1–22.
[14] Sundarrajan, Kishorekumar Neelamegam and V. T. Prabagaran. Improve file sharing
and file locking in a cloud: Modify block storage to provide more efficient
Infrastructure-as-a-Service. 28 July 2010.
http://www.ibm.com/developerworks/cloud/library/cl-modblockstore/index.html

[15] Jaime Lloret, Miguel Garcia, Fernando Boronat and Jesus Tomas. Group-based Self-
Organization Grid Architecture. http://personales.upv.es/jlloret/pdf/lncs2007-1.pdf
[16] MEF Announces 2013 Worldwide Carrier Ethernet Awards Finalists: October 2013,
http://metroethernetforum.org/news-events/press-releases

This paper may be cited as:
Chowdhury, S. R., Patel, S. K., Mandekar, A. V. and Devi, U. G., 2014.
Virtualizing the Inter Communication of Clouds. International Journal of
Computer Science and Business Informatics, Vol. 13, No. 1, pp. 72-85.

Tracing the Adversaries using Packet Marking and Packet Logging
A. Santhosh
PG Scholar, Department of Information Technology
Sona College of Technology,
Salem, India.

Dr. J. Senthil Kumar
Professor, Department of Information Technology
Sona College of Technology,
Salem, India.

ABSTRACT
Internet security is of prime importance today. Adversaries often hide their
own IP addresses and then initiate attacks. Some schemes use only packet
logging to achieve IP tracking; others combine packet marking with packet
logging. We introduce a new IP traceback scheme with efficient packet marking
that aims at a fixed storage requirement for each router. Sophisticated
attackers may launch a Distributed Denial of Service (DDoS) attack;
Deterministic Packet Marking performs the traceback without revealing the
inner topology of the provider's network.
Keywords
Denial of Service, Distributed Denial of Service, Probabilistic Packet
Marking, ARP (Address Resolution Protocol), Reverse Address Resolution
Protocol

1. INTRODUCTION
Sophisticated attackers may launch a Distributed Denial of Service (DDoS)
attack to interrupt the service of a server. Based on the packets used to
deny the service of a server, we can classify DDoS attacks into
flooding-based attacks and software-exploit attacks. The main limitation of
this kind of solution is that it is aware only of flood-based DoS and DDoS
attacks. The major signature of flooding-based attacks is a large volume of
packets with forged source addresses intended to exhaust a victim's limited
resources. The source IP address can be spoofed when attackers want to hide
themselves from tracing, and IP spoofing makes the origin of a DDoS attack
difficult to find; locating the source of an attack requires tracing the
packets back to the source step by step. An IP traceback approach can find
the attacker, and ICMP messages can be used to reconstruct the attack path. A
new IP traceback packet marking scheme has two categories, Deterministic
Packet Marking (DPM) and Probabilistic Packet Marking (PPM); these are used
to mark a border router's IP address on the passing packets, and this
technique makes it possible to trace even a single packet as well as many
packets.
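The border-router marking idea behind DPM can be sketched as follows (a conceptual illustration; in real DPM the 32-bit address is split across the 16-bit IP Identification field of two packets, while here the whole address is stored for simplicity, and the dictionary packet representation is our own):

```python
def dpm_mark(packet, ingress_router_ip):
    """Deterministically stamp the ingress edge router's address on a packet.

    Real DPM carries the 32-bit address as two 16-bit halves in the IP
    Identification field, with a flag bit saying which half; this sketch
    stores the whole address in one field instead.
    """
    marked = dict(packet)          # leave the original packet untouched
    marked["mark"] = ingress_router_ip  # survives a spoofed source address
    return marked

# The victim reads the mark, not the (possibly spoofed) source address.
attack = {"src": "10.9.9.9", "dst": "victim", "payload": b"flood"}
marked = dpm_mark(attack, "198.51.100.1")
print(marked["mark"])  # -> 198.51.100.1
```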

2. RELATED WORKS
2.1 Denial of Service (DoS)
Hussain, A., et al. [1]: the Denial of Service (DoS) attack is an important
threat. The source IP address in a packet can be spoofed when the attacker
wants to conceal himself from tracing, and IP spoofing makes hosts hard to
secure against a DDoS attack, so that detection and response are painfully
slow. Classifying attacks as single- or multi-source can help focus a
response, but naive packet-header-based approaches are easily defeated by
spoofing. The authors propose a framework for classifying DoS attacks based
on header content. To evaluate the framework, a trace from a partial ISP
covering 80 attacks was analyzed: header analysis found the number of
attackers in 67 attacks, while the remaining 13 attacks were classified based
on their ramp-up behavior. This type of attack targets either network
bandwidth or operating-system data structures. Most response mechanisms try
to limit the damage caused by the attack by taking reactive measures, such as
reducing the intensity of the attack by blocking attack packets; such
approaches need extra packets to trace the origin of the attack packets.
Attacks are classified by header contents, ramp-up behavior and spectral
characteristics.

2.1.1 Header Contents:

Figure 1. Single-source attack (attacker → observation point → victim)

Figure 1 shows that headers can be forged by the attacker: most attacks spoof
the source address to hide the number of attackers. Header fields such as the
fragment identification field (ID) and the Time-To-Live (TTL) field can be
indirectly interpreted to provide hints about the number of attackers.
Packets are classified as belonging to the same sequence if their ID values
differ by less than idgap and the TTL value remains constant for all packets.
Some separation is allowed within idgap, since the ID value normally wraps
around within a second; using a small idgap therefore also limits collisions
during sequence identification.
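The sequence-identification heuristic above can be sketched as follows (the idgap value, the dictionary packet representation and the greedy grouping policy are assumptions made for illustration, not the exact algorithm of [1]):

```python
def same_sequence(p, q, idgap=64):
    """Two packets belong to the same sequence if their ID fields differ
    by less than idgap and their TTL (hence hop count) is identical."""
    return abs(p["id"] - q["id"]) < idgap and p["ttl"] == q["ttl"]

def count_sources(packets, idgap=64):
    """Greedily group packets into sequences; each sequence hints at one sender."""
    sequences = []
    for p in packets:
        for seq in sequences:
            if same_sequence(seq[-1], p, idgap):
                seq.append(p)
                break
        else:
            sequences.append([p])
    return len(sequences)

trace = [{"id": 100, "ttl": 54}, {"id": 130, "ttl": 54},    # one sender
         {"id": 9000, "ttl": 60}, {"id": 9010, "ttl": 60}]  # another
print(count_sources(trace))  # -> 2
```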

2.1.2 Ramp-up Behavior:
A slow ramp-up (several hundred milliseconds or more) suggests a multi-source
attack: a master normally activates a large number of attackers by
transmitting a trigger message that either activates them immediately or at
some later time.

Figure 2. Multi-source attack (several attackers → observation point → victim)

Single-source attacks do not exhibit a ramp-up behavior and typically begin
their attack at full strength. The presence of a ramp-up therefore provides a
hint as to whether the attack is single- or multi-source. This method cannot
reliably identify single-source attacks, since an intelligent attacker could
craft an attack with an artificial ramp-up.

2.1.3 Spectral characteristics:
Another way to classify attacks as single- or multi-source is to consider
their spectral characteristics: attack streams have clearly different
spectral content that varies depending on the number of attackers. Analyzing
the spectral characteristics of an attack stream requires treating the
packet trace as a time series.
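Treating the packet trace as a time series can be sketched like this: bin packet arrivals into a rate signal and inspect its discrete Fourier magnitudes (a toy with a hand-rolled DFT; the millisecond binning and the example stream are our own assumptions, not the analysis of [1]):

```python
import cmath

def rate_series(arrival_ms, n_bins):
    """Bin packet arrival times (whole milliseconds) into a
    packets-per-millisecond time series."""
    series = [0] * n_bins
    for t in arrival_ms:
        series[t] += 1
    return series

def dft_magnitudes(series):
    """Magnitude spectrum of the rate series: single- and multi-source
    streams tend to concentrate power at different frequencies."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(series)))
            for k in range(n // 2 + 1)]

# A perfectly regular stream (one packet per millisecond) has all its
# power in the DC bin; a burstier mix of senders spreads it out.
spectrum = dft_magnitudes(rate_series(list(range(8)), 8))
print(spectrum[0])  # -> 8.0
```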
Burch, H. and Cheswick, B. [2] create a map of the routes from the victim to
every network using an Internet mapping technique. Starting with the closest
router, they apply a load of UDP chargen requests to each link in turn: if
the attack stream is perturbed, the link is on the attack path; otherwise it
is not on the path, or the applied load was not sufficient.

2.1.4 SYN attacks:
Other kinds of networks are more difficult to protect. When the objective is
to load a company's link to its ISP, the attacker directs a huge stream of
data toward the company's network, frequently from a number of sites. The
company's link becomes overcrowded and packets are consequently lost; because
routers cannot distinguish attacking packets from legitimate client packets,
they drop them with equal probability. If the attacker can send packets fast
enough, the drop rate can become so high that not enough of a client's
packets get through, and the client cannot obtain reasonable service over the
loaded link. The best-known attack of this category is the smurf attack, and
many modern DDoS attacks have been of this type; such a process is damaging
for a Web-site company. Since attack packets can enter with different source
addresses at several locations, source-address filtering becomes difficult
and may fail; both inner and outer filtering have been proposed, but they
present considerable overhead for the routers in terms of processing every
packet.
Belenky, A. and Ansari, N. [3]: a security weakness of PPM schemes arises
from the fact that an attacker can change the content of a packet, so packets
can be marked with wrong information; this is called mark spoofing. An
Internet service provider may use public addresses only for interfaces to
customers and other networks, and private addressing plans within its own
network. Prevention of such behavior is accomplished by special coding
techniques, and is not 100% foolproof. If every packet that arrives at the
victim were ensured to be correctly marked, those complex and
processor-intensive encoding techniques would become unnecessary.
2.2 Address Resolution Protocol (ARP)
Apichan Kanjanavapastit [4]: Figure 3 shows the Address Resolution Protocol
(ARP) and the Reverse Address Resolution Protocol (RARP).

ISSN: 1694-2108 | Vol. 13, No. 1. MAY 2014 89


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
Figure 3. ARP, RARP and their position in the TCP/IP protocol suite

Figure 4. ARP operation

ISSN: 1694-2108 | Vol. 13, No. 1. MAY 2014 90


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
Figure 4 shows that an internet is a combination of physical networks
connected together by internetworking devices such as routers. A packet
starting from a source host may pass through several different physical
networks before finally reaching the destination host. Hosts and routers are
known at the network level by their logical addresses (the IP addresses in
TCP/IP), but packets have to pass through physical networks to reach these
hosts and routers, where hosts and routers are known by their physical
addresses. We need to be able to map a logical address to its equivalent
physical address; two protocols have been designed to perform this dynamic
mapping: the Address Resolution Protocol and the Reverse Address Resolution
Protocol.

Table 1. ARP packet fields

+---------------------+---------------------+----------------------+
| Hardware Type       | Protocol Type                              |
+----------+----------+--------------------------------------------+
| Hardware | Protocol | Operation (Request 1, Reply 2)             |
| Length   | Length   |                                            |
+----------+----------+--------------------------------------------+
| Sender hardware address (e.g. 6 bytes for Ethernet)              |
+------------------------------------------------------------------+
| Sender protocol address (e.g. 4 bytes for IP)                    |
+------------------------------------------------------------------+
| Target hardware address (e.g. 6 bytes for Ethernet;              |
| not filled in a request)                                         |
+------------------------------------------------------------------+
| Target protocol address (e.g. 4 bytes for IP)                    |
+------------------------------------------------------------------+

ISSN: 1694-2108 | Vol. 13, No. 1. MAY 2014 91


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
2.2.1 Detail of Packet marking fields:
Table 1 explains the packet marking fields. Hardware type is a 16-bit field
defining the network on which ARP is running; Ethernet is given type 1. The
next 16-bit field defines the protocol; for the IPv4 protocol the value is
0x0800. Hardware length is an 8-bit field defining the length of the physical
address in bytes; for Ethernet the value is 6. Protocol length is an 8-bit
field defining the length of the logical address in bytes; for the IPv4
protocol the value is 4. Operation is a 16-bit field defining the type of
packet; there are two packet types: ARP request (1) and ARP reply (2). Sender
hardware address is a variable-length field holding the sender's physical
address; for Ethernet this field is 6 bytes long. Sender protocol address is
a variable-length field defining the logical address of the sender; for the
IPv4 protocol this field is 4 bytes long. Target hardware address is a
variable-length field defining the physical address of the target; for
Ethernet this field is 6 bytes long, and in an ARP request message it is all
0s, since the sender does not know the physical address of the target. Target
protocol address is a variable-length field defining the logical address of
the target; for the IPv4 protocol this field is 4 bytes long.
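The layout of Table 1 can be packed directly with Python's struct module (a sketch; the MAC and IP values are made-up examples):

```python
import struct
import socket

def build_arp_request(sender_mac, sender_ip, target_ip):
    """Pack an ARP request following the field layout of Table 1."""
    htype = 1            # hardware type: Ethernet
    ptype = 0x0800       # protocol type: IPv4
    hlen, plen = 6, 4    # 6-byte MAC, 4-byte IPv4 address
    oper = 1             # operation: 1 = request, 2 = reply
    target_mac = b"\x00" * 6  # unknown in a request, so all zeros
    return struct.pack("!HHBBH6s4s6s4s",
                       htype, ptype, hlen, plen, oper,
                       sender_mac, socket.inet_aton(sender_ip),
                       target_mac, socket.inet_aton(target_ip))

pkt = build_arp_request(b"\xaa\xbb\xcc\xdd\xee\xff", "192.0.2.1", "192.0.2.2")
print(len(pkt))  # -> 28, the fixed size of an Ethernet/IPv4 ARP packet
```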
Savage, S., et al. [5]: this technique is motivated by the increased
frequency and sophistication of Denial-of-Service attacks and by the
difficulty of tracing packets with spoofed source addresses; it traces
anonymous packet-flooding attacks in the network back toward their source. It
is a general-purpose traceback mechanism based on probabilistic packet
marking in the network, which allows a victim to identify the network path
traversed by attack traffic without requiring interactive operational support
from ISPs. A strength of this mechanism is that it is incrementally
deployable, largely backwards compatible and can be efficiently implemented
using conventional technology. One way to address the problem of anonymous
attacks is to reduce the ability to forge source addresses; one such approach
is frequently called ingress filtering. In Denial-of-Service through packet
flooding, attackers overwhelm network and CPU resources with unwanted
requests. The Internet architecture permits anonymity, since use of the true
IP source address is voluntary, so an attacker can use forged source
addresses; the network is stateless, making it hard to react to an attack
without knowing what network path it traverses; and an attacker can generate
arbitrary packets, while many attackers may combine into a DDoS. Having every
router append its address to each packet would be expensive to execute, with
no guarantee of sufficient space in the packet; instead, routers
probabilistically mark packets, so that marked packets carry samples of the
path. Robustness comes from the random marking decisions, which cannot be
predicted or restricted by the attacker, although an attacker can still lie
and create fake edges.
Song, D. X., and Perrig, A. [6]: the Advanced and Authenticated Marking
Schemes allow tracing back to the approximate origin of spoofed IP packets.
These techniques feature low network and router overhead and support
incremental deployment, and they have significantly lower computation
overhead and a lower false positive rate

ISSN: 1694-2108 | Vol. 13, No. 1. MAY 2014 92


International Journal of Computer Science and Business Informatics

IJCSBI.ORG
for the victim to reconstruct the attack paths in large-scale distributed
denial-of-service attacks. The Authenticated Marking Scheme provides
efficient authentication of routers' markings, such that even a compromised
router cannot forge or tamper with markings from other uncompromised routers.
It is a hard problem to determine the source of these spoofed IP packets;
previous approaches to the IP traceback problem have a very high computation
overhead for the victim to reconstruct the attack paths and yield a large
number of false positives when the denial-of-service attack originates from
multiple attackers.
The directed acyclic graph rooted at V in Figure 5 represents the network as
seen from a victim V under a distributed denial-of-service attack from A2 and
A3. V could be either a single host under attack or a network border device,
such as a firewall, representing many such hosts. Nodes Ri represent the
routers, which we refer to as upstream routers from V, and we call the graph
the map of upstream routers from V. For every router Ri, we refer to the set
of routers immediately before Ri in the graph as the children of Ri. The
attack graph is the graph composed of the attack paths, and we refer to the
packets used in DDoS attacks as attack packets. We call a router a false
positive if it is in the reconstructed attack graph but not in the real
attack graph; similarly, we call a router a false negative if it is in the
true attack graph but not in the reconstructed attack graph. With spoofing by
an attacker, packets written by the attacker will have a distance field
greater than or equal to the length of the real attack path. The victim can
use the edges marked in the attack packets to reconstruct the attack graph.
Figure 5. Attack paths from the attackers A1, A2 and A3 through the upstream
routers R1–R7 to the victim V

ISSN: 1694-2108 | Vol. 13, No. 1. MAY 2014 93


International Journal of Computer Science and Business Informatics

IJCSBI.ORG

2.2.2 Advanced Marking Scheme I:
This basic approach uses a marking scheme related to FMS; instead of encoding
the IP address of a router as eight fragments, only its hash value is
encoded. The scheme divides the 16-bit IP Identification field into a 5-bit
distance field and an 11-bit edge field; 5 bits can represent 32 hops, which
is enough for almost all Internet paths.
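The field split described above can be sketched as bit packing into the 16-bit Identification field (a sketch of the encoding only; the scheme's actual hash computation is not reproduced here, and the example values are arbitrary):

```python
def pack_mark(distance, edge_hash):
    """Pack a 5-bit distance and an 11-bit edge hash into the 16-bit
    value that the marking scheme writes into the IP Identification field."""
    assert 0 <= distance < 32 and 0 <= edge_hash < 2048
    return (distance << 11) | edge_hash

def unpack_mark(ident):
    """Recover the distance and edge-hash fields from the 16-bit value."""
    return ident >> 11, ident & 0x7FF

ident = pack_mark(distance=7, edge_hash=0x5A5)
print(unpack_mark(ident))  # -> (7, 1445)
```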

2.2.3 Advanced Marking Scheme II:
Advanced Marking Scheme II is more efficient and accurate than FMS in the
case of DDoS. False positives appear once there are more than about 60
distributed attacker sites, because an 11-bit hash value is not sufficient to
avoid collisions when several routers lie at the same distance from the
victim in the attack graph. Instead of using just two hash functions, two
sets of independent hash functions are used. These techniques have very low
network and router overhead and support incremental deployment.
Yaar, A., et al. [7]: currently proposed IP traceback mechanisms are not
adequate to address the traceback problem, for the following reasons: they
require DDoS victims to accumulate thousands of packets to reconstruct a
single attack path, they do not scale to large-scale distributed DoS attacks,
and they do not support incremental deployment. Fast Internet Traceback (FIT)
introduces a new packet marking approach that improves IP traceback along
several dimensions: victims can identify attack paths with high probability
after receiving only a small number of packets, a decrease of orders of
magnitude compared to previous packet marking schemes; FIT performs well even
in the presence of legacy routers, allowing each FIT-enabled router on the
path to be identified; and FIT scales to large distributed attacks with
thousands of attackers, unlike earlier packet marking schemes. Fast Internet
Traceback thus represents a step forward in performance and deployability.
Many DDoS attacks, notably against the Domain Name System (DNS), demand
protection money to stop the attack or to guarantee that it will not be
repeated; with the recent rise of e-crime, law enforcement and attack victims
reiterate the need for an adequate IP traceback mechanism. Current proposals
for traceback mechanisms, however, suffer from various drawbacks, including
high processing and storage costs, poor scalability to large attacker
populations and reduced performance in the presence of legacy routers. PPM
schemes achieve some of these properties, but they require on the order of
thousands of packets from each attacker for traceback. FIT is a new approach
to improve packet marking traceback: it can perform traceback even with a
very small number of attack packets, with minimal processing overhead and
without contacting any external entity, while preserving the

advantages of packet-marking trace back approaches. FIT handles legacy routers better than any earlier mechanism; the victim can even detect the presence of legacy routers on the attack path. FIT can reconstruct a path from as little as a single attack packet. FIT achieves these properties through a new approach to upstream router map reconstruction, a one-bit field used to compute the distance (up to 32 hops) to the marking router, node-based marking instead of edge-based marking, and a fast method for identifying the marking router. These techniques give FIT a previously unachieved set of properties, making it one of the few practical approaches to IP trace back. FIT also provides the unique property that the victim can detect the presence of legacy routers on the attack path. Previous trace back mechanisms either fail completely when large numbers of legacy routers are present, or distort the reconstructed router locations along the path by including only trace back-enabled routers in their distance measurement. FIT correctly identifies the distance in router hops from the victim regardless of whether intervening routers are trace back-enabled, and it does not need to discover the router map through out-of-band tools such as traceroute.

Bellovin, S.M., et al. [8] note that ICMP is the next protocol of choice. The echo-reply attack was the most popular reflector attack: since most Internet hosts reply to an echo request packet, the attacker can choose from a large number of possible reflectors. The remaining ICMP attacks use echo request packets or an invalid ICMP code. A reflector is any host that replies to requests, for example a web server that responds to TCP SYN requests with a SYN-ACK reply, or any host that responds to ICMP echo requests with ICMP echo replies. Any such host can be used as a reflector by spoofing the victim's IP address in the source field of the request, tricking the reflector into directing its response to the victim. Reflectors can also be used to probe the addresses on the reflector's network by requesting a response from every host on the LAN. ICMP is increasingly filtered, however, and such attacks are complex to trace through discontinuities, partial deployment, and combined attack and signal traffic.
Gong, C. and Sarac, K. [9] improve the accuracy and effectiveness of IP trace back and give ISPs an incentive to deploy IP trace back in their networks. Their PPM approach uses a new IP header encoding method to accumulate the entire identification information of a router in a single packet. It eliminates the computation overhead and the false positives caused by router identification fragmentation, and it can control the distribution of marking information. In the probabilistic packet marking (PPM) approach, each router probabilistically marks packets with its identification information, and the victim then reconstructs the network path by combining a number of such marked packets. IP trace back is not limited merely to DoS and DDoS attacks. IP trace back techniques neither prevent nor stop an attack; they are

used only to identify the sources of the offending packets during and after the attack. Trace back is limited to identifying the point where the packets constituting the attack entered the Internet.
CAIDA's skitter project [10]: the two files available for download constitute CAIDA's release of the adjacency matrix of the Internet router-level graph computed from skitter and iffinder measurements. The process creates IP links from ITDK skitter traces, combines the edge aliases found with iffinder into routers, maps IP links to router links, removes all nodes with either in-degree 0 or out-degree 0 (note that the majority of nodes removed this way are end hosts), and anonymizes IP addresses.
Gong, C. and Sarac, K. [11] make it possible to trace even a single packet, which is considered more powerful. Routers, however, forward a large volume of traffic, and the high storage overhead and access-time requirements for recording packet digests introduce practicality problems.

2.3 IP trace back

Desirable properties of an IP trace back approach include the following. Tracing a single packet: the approach should trace both flooding and software-exploit DoS attacks. Robustness: attackers may be aware of, and try to compromise, the trace back approach, so resistance to such attacks is desirable. Backward compatibility: IP packets may undergo valid transformations while traversing the network, and the trace back approach should operate in the presence of such transformations. Financial motivation: Internet Service Providers (ISPs) prefer value-added services that can create new revenue streams, so the trace back approach should be suitable for deployment as a revenue-generating service. Low overhead: the overhead imposed on the deploying routers should be acceptable. A method that traces only a single packet's source, however, cannot find the sources of multiple packets.
John, W. and Olovsson, T. [12] monitor traffic anomalies in header data. The many attacks such as Denial of Service have driven the development of techniques for analyzing network traffic; with a well-organized analysis tool, the network can be protected from malicious traffic before it is attacked. The address data are transformed using a discrete wavelet transform to detect traffic anomalies, which can give an effective means of detecting anomalies close to the source. A multidimensional indicator uses the correlation of port numbers and the number of flows as a means of detecting anomalies. Prior work considers many ports and approaches, while this work focuses on a single link at a time. The goal is to prevent and reduce malicious network traffic; for example, rule-based approaches such as intrusion detection systems (IDS) match established rules against incoming traffic to detect and identify potential DoS attacks close to the victim network.

IDS tools, however, require updating with the latest rules, which motivates the problem of designing volume-based real-time detection mechanisms that treat traffic header fields such as addresses, and analyze other packet header data such as addresses and port numbers in real time. Technique: examine the traffic at a router; monitoring traffic at the source network allows early detection of attacks, makes it possible to take control, and limits the waste of resources. There are two types of filtering based on the traffic control point. Inbound filtering keeps the flow of traffic entering an internal network under administrative control; it is typically performed through firewall or IDS rules applied to inbound traffic originating from the public Internet. Outbound filtering manages the flow of traffic leaving the administered network. Filtering is carried out at the campus edge. The first component is a traffic parser, in which the relevant signal is generated from packet header traces or network flow records as input; the signal is built from packet header fields such as destination addresses, traffic volume, and port numbers. The second component transforms the signal through the discrete wavelet transform (DWT). Analyzing discrete domains such as address spaces and port numbers poses interesting problems for wavelet analysis. The technique then detects the attack; this can be done by setting a threshold, and attackers are detected using statistical analysis. The reported results use the correlation of port numbers, the distribution of the number of flows, and destination addresses as monitored traffic signals.
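The DWT-based detection step described above can be sketched as a one-level Haar wavelet transform of a per-interval traffic signal, with a simple threshold on the detail coefficients to flag sudden changes. The traffic values, threshold, and single-level decomposition are illustrative assumptions, not the authors' parameters.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail

# Packets per interval: steady traffic, then a sudden flood.
traffic = [100, 102, 99, 101, 98, 100, 5000, 5100]
_, detail = haar_dwt(traffic)

threshold = 50.0                       # illustrative detection threshold
alarms = [i for i, d in enumerate(detail) if abs(d) > threshold]
print(alarms)                          # intervals flagged as anomalous
```

Large detail coefficients correspond to abrupt local changes in the signal, which is why thresholding them is a natural anomaly indicator.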
Malliga, S. and Tamilarasi, A. [13] address tracing and defeating Denial of Service (DoS) attacks, one of the hardest security problems on IP networks; the spoofing of IP packets makes it difficult to defend against such attacks. One technique to mitigate DoS attacks is to trace the origin of the packets using packet marking combined with logging. In this trace back method, to find the true source of the attack, the router marks the packets with the inbound interface identifier of the router. Testing shows that the scheme requires little time to mark packets and reconstruct the attack graph. It is capable of tracing back a single packet, requires logging at very few routers, and thus incurs negligible storage overhead on the routers. TCP and IP allow some flexibility in implementation, including many optional features; to support research and further improvement of these protocols, it is important to know about current usage of protocol-specific features and associated anomalies. This work is intended to reflect the present character of Internet backbone traffic and to point out misbehavior and potential problems. Examination of the data gives a full summary of current protocol usage, including comparison with prior studies. Header misbehaviors and anomalies were found in nearly every feature analyzed and are discussed in detail, providing key information for designers of network protocols, network applications, and network attack detection systems.

Snoeren, A. C., et al. [14] present a trace back system able to track particular packets efficiently. It is a hash-based technique for IP trace back that generates audit trails for traffic within the network and can trace the source of a single IP packet delivered by the network in the recent past. They develop a Source Path Isolation Engine (SPIE) to enable IP trace back, capable of identifying the source of a particular IP packet given a copy of the packet to be traced, its destination, and the approximate time of receipt. It does not increase a network's vulnerability to eavesdropping. SPIE allows routers to efficiently determine whether they forwarded a particular packet within a specified time interval while maintaining the privacy of unrelated traffic. Building a trace back system that can trace a single packet has long been viewed as impractical due to the large storage requirements of saving packet data and the increased eavesdropping risk that packet logs pose. SPIE's key contribution is to demonstrate that single-packet tracing is feasible. It deals with the difficult problem of packet transformations and can be implemented in high-speed routers. The most pressing challenges for SPIE are increasing the window of time in which a packet may be successfully traced and reducing the amount of information that must be stored for transformation handling. One possible way to extend the length of time over which queries can be conducted, without linearly increasing the memory requirements, is to relax the set of packets that can be traced. In particular, SPIE can support trace back of large packet flows for longer periods of time in a fashion similar to probabilistic marking schemes: rather than removing packet digests as they expire, it removes them probabilistically as they age. For large packet flows, the odds are quite high that some constituent packet will remain traceable for a longer period. The results show that the system is effective, space efficient, and implementable in current or next-generation routing hardware.
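The SPIE idea of storing packet digests rather than packets can be sketched with a Bloom filter: each router records hashes of invariant packet content for the current time interval, so it can later answer "did I forward this packet?" without logging the packet itself. The table size, number of hash functions, and SHA-256 construction below are illustrative assumptions, not SPIE's actual parameters.

```python
import hashlib

class DigestTable:
    """Bloom-filter digest table for one time interval (illustrative)."""
    def __init__(self, m_bits: int = 1 << 16, k: int = 3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, packet: bytes):
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + packet).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def record(self, packet: bytes):
        for p in self._positions(packet):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_forwarded(self, packet: bytes) -> bool:
        # False means definitely not forwarded; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(packet))

table = DigestTable()
table.record(b"attack-packet-invariant-bytes")
print(table.maybe_forwarded(b"attack-packet-invariant-bytes"))  # True
print(table.maybe_forwarded(b"some-other-packet"))  # almost surely False
```

Because only bits are stored, the memory cost per packet is tiny and the packet contents remain private, which is the property the paragraph above emphasizes.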

2.4 HASH-BASED DDOS


The scheme described in this section utilizes a hash function, H(F). To simplify the performance analysis, the hash function is assumed to be ideal. An ideal hash function minimizes the chance of collision, an occurrence in which two different ingress addresses result in the same hash value. In other words, H(F) is assumed to produce a collision only after all possible hash values have been produced. It is also assumed that the hash function is known to everybody, including all DPM-enabled interfaces, all destinations which intend to utilize DPM marks for trace back, and the attackers. The constraint of 16 bits still remains, so a longer digest results in fewer bits of the actual address transmitted in each mark, and consequently a higher number of packets required for trace back.
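The trade-off above can be sketched as follows: the 32-bit ingress address is sent in fragments inside the 16-bit mark, together with a short digest H(F) of the full address so the victim can tell which fragments belong together. The field widths here (8-bit fragment, 8-bit digest) and the reassembly logic are illustrative assumptions, not the actual DPM encoding.

```python
import hashlib

def digest8(addr: bytes) -> int:
    """Illustrative 8-bit digest H(F) of the ingress address."""
    return hashlib.sha256(addr).digest()[0]

def marks_for(addr: bytes):
    """One 16-bit mark per address byte: fragment in the high 8 bits,
    digest in the low 8 bits."""
    d = digest8(addr)
    return [(fragment << 8) | d for fragment in addr]

def reassemble(marks, addr_len: int = 4):
    """Victim side: group fragments by digest and rebuild the address."""
    by_digest = {}
    for m in marks:
        by_digest.setdefault(m & 0xFF, []).append(m >> 8)
    for d, fragments in by_digest.items():
        if len(fragments) >= addr_len:
            candidate = bytes(fragments[:addr_len])
            if digest8(candidate) == d:        # verify against H(F)
                return candidate
    return None

ingress = bytes([203, 0, 113, 7])
print(reassemble(marks_for(ingress)))          # b'\xcb\x00q\x07'
```

Making the digest longer would leave fewer of the 16 bits for each address fragment, so more marked packets would be needed, which is exactly the constraint stated above.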

2.4.1 Entropy Variations

For a local router, suppose that the number of flows is N, and the probability distribution is P = {p1, p2, ..., pN}. We can write the entropy as follows:

H(F) = H(p1, p2, ..., pN) = -Σ pi log pi.

Based on the characteristics of the entropy function, we obtain the upper bound and lower bound of H(F) as follows:

0 ≤ H(F) ≤ log N.

We reach the lower bound when pi = 1 for some i, 1 ≤ i ≤ N, and pk = 0 for all k = 1, 2, ..., N with k ≠ i; we have the upper bound when p1 = p2 = ... = pN = 1/N. Based on our definition of the random variable of flows, we have the following special cases reaching the lower bound and the upper bound, respectively: when there is only one flow alive during the sampling time interval and there are no packets going through the local router for the other flows, H(F) = 0; when the number of packets for each flow is the same among all the flows at a local router, we have H(F) = log N.
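The entropy bounds above can be checked numerically. The flow packet counts below are illustrative values for one sampling interval at a local router, not measurements from the paper.

```python
import math

def flow_entropy(packet_counts):
    """H(F) = -sum(pi * log pi) over flows with pi > 0 (natural log)."""
    total = sum(packet_counts)
    probs = [c / total for c in packet_counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

N = 8
single_flow = [42] + [0] * (N - 1)     # only one flow alive -> lower bound 0
uniform = [10] * N                     # equal traffic per flow -> upper bound log N
skewed = [50, 10, 10, 10, 5, 5, 5, 5]  # e.g. one flow dominates during a flood

print(abs(flow_entropy(single_flow)))  # lower bound: 0.0
print(flow_entropy(uniform))           # equals log N
print(0 <= flow_entropy(skewed) <= math.log(N))
```

A sharp drop of H(F) toward the lower bound signals that traffic has concentrated on few flows, which is the entropy-variation cue for detecting an attack.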

3. CONCLUSIONS
We conclude that 16-bit packet marking can defend against Denial of Service attacks, and that the adversaries can be traced using a hash table. Since 32-bit packet marking suffers from the fragmentation problem, we introduce 16-bit packet marking, which solves the fragmentation problem and, at the same time, the memory overhead. In future work we would like to trace other variants of attacks with the packet marking and packet logging scheme.

REFERENCES

[1] Hussain, A., Heidemann, J. and Papadopoulos, C., Aug 2003. A framework for
classifying denial of service attacks, in Proc. ACM SIGCOMM 2003, Karlsruhe,
Germany, pp. 99-110.
[2] Burch, H. and Cheswick, B., Dec 2000. Tracing anonymous packets to their
approximate source, in Proc. USENIX LISA 2000, New Orleans, LA, pp. 319-327.
[3] Belenky, A. and Ansari, N., Aug 2003. Tracing multiple attackers with deterministic
packet marking (DPM), in Proc. IEEE PACRIM 2003, Victoria, BC, Canada,
pp. 49-52.
[4] Apichan Kanjanavapastit, Dr., Address Resolution Protocol (ARP) [online].
Available: http://dc345.4shared.com/doc/sg6Scmu6/preview.html
[5] Savage, S., Wetherall, D., Karlin, A. and Anderson, T., Aug 2000. Practical network
support for IP trace back, in Proc. ACM SIGCOMM 2000, Stockholm, Sweden, pp.
295-306.
[6] Song, D. X. and Perrig, A., Apr 2001. Advanced and authenticated marking schemes
for IP trace back, in Proc. IEEE INFOCOM 2001, Anchorage, AK, pp. 878-886.
[7] Yaar, A., Perrig, A. and Song, D., Mar 2005. FIT: Fast Internet trace back, in Proc.
IEEE INFOCOM 2005, Miami, FL, pp. 1395-1406.
[8] Bellovin, S.M., Leech, M.D. and Taylor, T., Feb 2003. ICMP trace back messages,
Internet Draft: draft-ietf-itrace-04.txt.
[9] Gong, C. and Sarac, K., Mar 2009. Toward a practical packet marking approach for IP
trace back, Int. J. Network Security, Vol. 8, No. 3, pp. 271-281.
[10] CAIDA's Skitter Project, CAIDA, 2010. [Online]. Available: http://www.caida.org/tools/skt/.
[11] Gong, C. and Sarac, K., Oct 2008. A more practical approach for single-packet IP
trace back using packet logging and marking, IEEE Trans. Parallel Distributed Syst.,
Vol. 19, No. 10, pp. 1310-1324.
[12] John, W. and Olovsson, T., 2008. Detection of malicious traffic on backbone links via
packet header analysis, Campus-Wide Inform. Syst., Vol. 25, No. 5, pp. 342-358.
[13] Malliga, S. and Tamilarasi, A., Apr 2008. A proposal for new marking scheme with
its performance evaluation for IP trace back, WSEAS Trans. Computer Res., Vol. 3,
No. 4, pp. 259-272.
[14] Snoeren, A., Partridge, C., Sanchez, L.A., Jones, C., Tchakountio, E., Schwartz, B., Kent,
S. T. and Strayer, W. T., Dec 2002. Single-packet IP trace back, IEEE/ACM Trans.
Networking, Vol. 10, No. 6, pp. 721-734.

This paper may be cited as:


Senthil K. J. and Santhosh, A., 2014. Tracing the Adversaries using Packet
Marking and Packet Logging. International Journal of Computer Science
and Business Informatics, Vol. 13, No. 1, pp. 86-100.


An Improved Energy Efficient Clustering Algorithm for Non Availability of Spectrum in Cognitive Radio Users
V. Shunmuga Sundaram
PG scholar, Department of Computer science and Engineering,
Sri Krishna College of Technology, Coimbatore, India

Dr S. J. K Jagadeesh Kumar
HOD, Department of Computer Science and Engineering,
Sri Krishna College of Technology, Coimbatore, India

ABSTRACT
The main function of cognitive radio technology is to enable spectrum utilization by detecting unused spectrum and sharing it without harmful interference to licensed users. Energy consumption is a primary concern in wireless sensor networks. The main function of a cognitive radio is to provide a channel to the user to access spectrum resources. The proposed solution is a Distributed Efficient Multi-Hop Clustering (DEMC) routing protocol, which considers not only static nodes but also mobile environments, and is used to reduce packet loss during cluster communication. Its main function is to select the cluster head according to energy level, connectivity, and stability, and to transfer information from the source to the destination. The nodes in a cluster advertise the cluster head to the other nodes. The protocol improves connectivity between cluster heads and provides active communication. The DEMC protocol adapts to changes in the network topology and to the information stored in the radio network. It mainly increases the chance of creating a communication link, which leads to finding a more reliable communication path for data transmission.

Keywords
Cognitive radio Technology, Distributed Multi Hop Clustering Protocol, Wireless Sensor
Networks, Multiple channel sequence generation algorithm, Handoff Information.

1. INTRODUCTION
A mobile ad hoc network (MANET) is a kind of wireless ad hoc network; the main advantages of ad hoc networks are their low dependence on infrastructure and their speed and ease of deployment. The participating hosts are connected by wireless links, and the mobility of the nodes causes route changes in the network, forming an arbitrary topology [1]. Such a network is self-organizing and adapts to any environment. The routers are free to move along with the mobile nodes and organize themselves arbitrarily; thus, the network's wireless topology may change rapidly and is updated according to link breakages. These kinds of networks may function in a standalone fashion. The traffic

characteristics differ among ad hoc networks according to bit rate, timeliness, and reliability requirements. The proposed protocol targets a source-driven sensor network application. A large number of users may be connected in these networks without any security, and the network may be connected to the wider Internet. The main function of a MANET in wireless sensor networks is to transfer the signal [2]. 802.11/Wi-Fi wireless networking became widespread in the mid-to-late 1990s [1]. Ad hoc networks allow many users to communicate but provide little security. There is only a temporary base station, and the nodes form a topology. Analogous to partitioning in traditional cellular networks, partitioning in ad hoc networks, known as clustering, is used to address the inefficient use of power and bandwidth when every node communicates directly. Each cluster elects one cluster head, an upper-layer node, to manage the cluster and coordinate with other clusters [3]. Link failures due to node mobility pose serious issues for routing in ad hoc networks [12].
Rapidly changing topology and frequent path failures make sensor networks more challenging. Path breakage results in large packet delay and packet loss, and hence more energy consumption. Mobile ad hoc routing protocols like Ad hoc On-Demand Distance Vector (AODV) routing [14] and On-demand Multipath Distance Vector (AOMDV) routing in ad hoc networks [16] work well in conventional networks but perform poorly in sensor networks because of constrained resources. Second, the recovery mechanisms driven by frequent path failures are energy consuming because a new path must be found each time. Some routing protocols assume that each sensor node can directly send data to the base station [7, 10, 15], which is not a realistic assumption because it is restricted by limited energy, regulatory authorities, and scalability issues. Therefore, the multi-hop communication paradigm is used. But the multi-hop strategy results in frequent path breakage in mobile environments; as a result, packet delay and packet loss are larger than in static networks. Hierarchical routing has been widely investigated for ad hoc networks [2, 7, 13] due to its energy efficiency and scalability.
The essential operation in hierarchical routing is to select a set of cluster heads from the nodes in the network and then group the remaining nodes with these cluster heads. The sensor field is divided into regions called clusters, and each cluster has a cluster head. The protocol constructs the clusters such that every node in the network is no more than one hop away from its cluster head. The cluster head is selected from among the n sensor nodes. The main aim of the proposed protocol is to form non-overlapping clusters. The boundary nodes act as gateways between the cluster heads [4]. The DEMC protocol organizes the cluster head and the sensor nodes that belong to it. Through the clustering approach, bandwidth is reused and security in the network is enhanced.


Fig 1. Establishment of communication range between cluster nodes and base station
2. RELATED WORK
A wireless sensor network consists of a large number of sensor nodes and a Base Station (BS) and is used to monitor certain physical phenomena. The BS typically acts as a gateway to other networks and is comparatively resourceful [3], while the small sensor nodes are limited in power, processing, and memory [4]. Mobile ad hoc routing protocols like Ad hoc On-Demand Distance Vector (AODV) routing [8], Location Aided Routing (LAR) [6], and On-demand Multipath Distance Vector (AOMDV) routing do not work well in wireless sensor networks because of limited resources [6].

Many types of algorithms are used for the clustering process. In wireless ad hoc networks, clustering can be divided into two types: deterministic and randomized. In the deterministic process, the nodes are selected by their weights, and the result determines the cluster head; the weight can be calculated from the number of neighbouring sensor nodes, the mobility rate, and the node ID [5]. In the randomized clustering approach, the nodes elect themselves as cluster heads randomly, with the probability of a node defined by a secondary parameter. The probability function can be used to control the number of cluster heads and the cluster sizes.
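The two clustering styles above can be sketched as follows. The weight formula and the election probability are illustrative assumptions, not the values used by the surveyed algorithms.

```python
import random

def deterministic_heads(nodes, num_heads):
    """Deterministic: rank nodes by a weight computed from neighbour
    count and mobility rate (assumed weighting); top nodes become heads."""
    def weight(n):
        return n["neighbours"] - 2 * n["mobility"]
    return sorted(nodes, key=weight, reverse=True)[:num_heads]

def randomized_heads(nodes, p=0.1, rng=None):
    """Randomized: each node elects itself head with probability p."""
    rng = rng or random.Random(7)          # seeded for repeatability
    return [n for n in nodes if rng.random() < p]

nodes = [{"id": i, "neighbours": (i * 3) % 7, "mobility": i % 3}
         for i in range(20)]

heads = deterministic_heads(nodes, 3)
print([n["id"] for n in heads])            # the three highest-weight nodes
print(len(randomized_heads(nodes, p=0.2)))
```

In the deterministic case the head set is reproducible from the weights alone; in the randomized case the parameter p directly tunes the expected number of cluster heads.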

3. PROBLEM STATEMENT

The Distributed Efficient Multi-Hop Clustering protocol proposes a multiple channel sequence generation algorithm which is used to increase the network lifetime. The function of the algorithm is to detect the available channels of each device and regularly check for interference within the group of nodes; if any interference is detected, the node switches to an alternate channel. The algorithm also avoids frequent re-election of the cluster head to improve the performance of the sensor network. A cluster head
backup is implemented in the proposed protocol: when the nodes have finished communicating within their own cluster, the cluster head transmits the data to the specific base station [9], so packet or data loss is avoided. The protocol follows a time-driven scenario, in which the sensor nodes periodically send data to the base station. Each node maintains a Residual Energy Index (REI) and a Node Degree Index (NDI). The REI is used to determine whether the energy is sufficient to communicate with many users; it actively monitors the energy levels of the mobile nodes. It is also used to avoid the collisions that would degrade the system. The cluster nodes are indicated by labels, and the cluster head is identified by means of the labels given to the sensor nodes; the labels require O(log n) bits. Each node also maintains routing addresses. The routing tables generated by the nodes enable point-to-point routing. The DEMC protocol is responsible for maintaining the labels in the routing table. The routing entries are generated by the cluster nodes and refreshed periodically as packets are transferred to the destination.

Information about the available channels is maintained for each time slot, with n users involved in the message passing. Suppose two users k1 and k2 meet in the same time slot t and exchange their information, which is not yet known to a third user k3 [1]. When k2 and k3 then meet in time slot t+1, the information about the first two users is exchanged, merging the information of all three users.
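The time-slot exchange above can be sketched as a gossip-style merge: when two users meet in a slot they merge what they know about everyone's available channels, so knowledge spreads transitively from k1 through k2 to k3. The channel sets are illustrative assumptions.

```python
def meet(a_known, b_known):
    """Two users meet and merge their channel knowledge; both leave
    with the same merged view."""
    merged = {**a_known, **b_known}
    return merged, dict(merged)

# Each user initially knows only its own available channels.
k1 = {"k1": {1, 3}}
k2 = {"k2": {2, 3}}
k3 = {"k3": {3, 5}}

k1, k2 = meet(k1, k2)        # slot t: k1 and k2 meet
k2, k3 = meet(k2, k3)        # slot t+1: k2 and k3 meet
print(sorted(k3))            # k3 now knows about all three users
```

After the second slot, k3 holds the channel information of all three users even though it never met k1 directly, which is the transitive merging described above.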

[Flow chart: maintain the channel of the mobile nodes; detect interference in the node; if interference is detected, change the channel of the node]

Fig 2. Flow chart of the multiple channel sequence generation algorithm
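The loop in the flow chart can be sketched as follows. Modelling interference as two neighbouring nodes sharing a channel, and the fixed channel list, are illustrative assumptions standing in for the actual sensing step.

```python
AVAILABLE = [1, 2, 3, 4, 5]          # assumed channel pool

def has_interference(node, neighbours):
    """Interference modelled as a neighbour using the same channel."""
    return any(n["channel"] == node["channel"] for n in neighbours)

def switch_channel(node, neighbours):
    """Move the node to the first channel unused by its neighbours."""
    used = {n["channel"] for n in neighbours}
    for ch in AVAILABLE:
        if ch not in used:
            node["channel"] = ch
            return

nodes = [{"id": 0, "channel": 1},
         {"id": 1, "channel": 1},    # interferes with node 0
         {"id": 2, "channel": 2}]

for node in nodes:                   # maintain -> detect -> change
    neighbours = [n for n in nodes if n is not node]
    if has_interference(node, neighbours):
        switch_channel(node, neighbours)

print([n["channel"] for n in nodes])  # no two nodes share a channel
```

Running the detect-and-switch pass resolves the initial conflict between nodes 0 and 1, mirroring the "interference detected, change the channel" branch of the flow chart.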


4. MODULES

4.1 Channel Formation

In a network, a node is mainly designed to maximize the efficiency and throughput of communication. A node can act as a connection point, a redistribution point, or an end point for data transmission. It maintains information such as signal strength, direction, neighbour node information, direction ID, resources, and location. In a network, the nodes group together and form a cluster. The cluster of nodes elects the cluster head, and the cluster head maintains information about the nodes in the topology.

4.2 Cluster Head Election

The route is maintained, and proper communication between the neighbour nodes and the cluster head is maintained periodically. Each node has a probability of being selected as the cluster head, with the nodes uniformly distributed over the network. The cluster head is itself a sensor node that organizes the other sensor nodes. A boundary node belongs to two clusters and acts as a gateway between these sensor nodes; it helps to provide the route and the means to communicate between clustering nodes, a process known as inter-cluster communication. These boundary sensor nodes assist the cluster head when it fails or lacks long-range capabilities. Overlapping clusters are used to boost network robustness and aid the recovery process. Each cluster head possesses an approximately equal number of cluster nodes, which helps to provide balanced data processing and aggregation. The storage load is reduced according to the size of the clusters. A cluster head advertises itself as the cluster head to the other sensor nodes, and a sensor node that receives the advertisement joins the cluster. If it has already received an advertisement message from another cluster head, it considers both advertisements and decides according to the communication range.

4.3 Inter and Intra Cluster Communication

Inter-cluster communication provides end-to-end reliability, message fragmentation, and multi-point connection. The sender's gateway receives the message from the sender, and the sender is acknowledged; the sender then assumes that the packet has reached the destination. Sequence numbers indicate the temporal ordering of messages and acknowledgements. The proposed protocol can split a large message into smaller ones using the message transfer unit: large messages are fragmented into smaller ones, and the fragment information is carried in the packet header.


FLOW DIAGRAM

[Flow chart: the sink or target node initializes sub-sinks RN (1...n) under its direct coverage; channel selection is performed by the CGN during the communication time; the cluster head node count differs according to the group size; members are formed for every cluster as per the same features; normal nodes (1...n) then carry out the data transfer]

Fig 3. Overall process of sending packets to the destination

The receiver's protocol stack is responsible for reassembling these fragments before passing the information to the next cluster. If any packet loss occurs, the loss is immediately signalled and the packet is retransmitted from the sender. The DEMC protocol maintains the routing table periodically. The protocol stores the possible routes for source-destination pairs and maintains the point-to-point routing information for medium-sized clusters. It also addresses every node at least once to save the sensor nodes' energy, and it can communicate over the critical path and under complicated conditions. A node that needs to communicate with more than one remote node uses a different gateway for each data flow.

4.4 Destination Throughput

On receiving a message, the behaviour of a node depends on the other cluster nodes. In the clustering process, when the current cluster head loses energy and is forced to resign, the node possessing more energy and the highest weight
will become the cluster head, and a new link is then established. Nodes with similar energy levels are considered neighbours. The node calculates the number of level messages and the transmission range. The destination throughput is generated according to the traffic patterns.

5. RESULT
Network lifetime is evaluated with the mobile nodes. The proposed protocol is run over different topologies representing varying network sizes, with the sensor nodes placed at random. The topologies follow a uniform distribution, and the protocol avoids unnecessary travelling time by minimizing the transmission range between nodes. The mobile nodes in the mobile collector find multiple routes and travel along the path with the fewest hop transmissions. The average node degree is calculated, and the cluster-head probability (P) is varied from 0.01 to 0.10. Communication overhead increases as the range is varied, while the sensor nodes maintain constant energy across all topology changes. The DEMC protocol mainly avoids overlapping clusters, provides clusters of roughly equal size and constant energy to the mobile nodes, and keeps the node death rate very low. Overall throughput therefore increases, and transmission of redundant information is avoided. Performance and bandwidth are both high under the multiple-channel sequence algorithm.
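Since each node independently becomes a cluster head with probability P, the expected head count is simply n·P. The sweep below mirrors the 0.01 to 0.10 range used in the evaluation; the node count of 100 is an assumption, as the paper does not state one here:

```python
def expected_cluster_heads(n_nodes, p):
    # Each node independently elects itself head with probability p,
    # so the expected number of heads is n * p.
    return n_nodes * p

n = 100  # assumed network size for illustration
for p in (0.01, 0.05, 0.10):
    print(f"P={p:.2f}: ~{expected_cluster_heads(n, p):.0f} cluster heads")
```

Varying P therefore trades cluster count against cluster size: larger P yields more, smaller clusters.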

Fig 4 Establishment of connection between the neighbouring nodes

From Fig 4, the locations of the mobile nodes change periodically. Neighbour nodes are discovered by detecting mobile nodes within communication range, and each node's information is updated regularly to track channel availability. Multiple nodes locate the destination for message transfer [5], which helps to avoid packet loss. The location information also changes according to the behaviour of the mobile nodes.
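The range-based neighbour discovery above can be sketched as a pairwise distance check. The coordinates and the communication range below are illustrative assumptions:

```python
import math

def discover_neighbours(nodes, comm_range):
    """Sketch of neighbour discovery: two mobile nodes are neighbours
    when their Euclidean distance is within the shared communication
    range. `nodes` maps a node id to an (x, y) position."""
    neighbours = {nid: set() for nid in nodes}
    ids = list(nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(nodes[a], nodes[b]) <= comm_range:
                neighbours[a].add(b)
                neighbours[b].add(a)
    return neighbours

positions = {"n1": (0, 0), "n2": (3, 4), "n3": (50, 50)}
print(discover_neighbours(positions, comm_range=10))
# n1 and n2 are neighbours (distance 5); n3 is out of range of both
```

As the mobile nodes move, rerunning this check with updated positions keeps the neighbour sets current, which is what the periodic location update achieves.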

Fig 5 Process of Cluster Head Election

A node is elected cluster head with probability P, and a node with limited energy still provides communication to the other nodes. The proposed protocol selects the cluster head according to the area and size of the network, measured by a node-degree index value, which results in lower delay and higher bandwidth in the network. The result in Fig 5 shows that the model improves the energy state while reducing both energy consumption and time delay.
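One plausible form of the node-degree index is a node's connectivity relative to the network's average degree; the exact metric is not specified in the paper, so the ratio below is an assumption used purely for illustration:

```python
def node_degree_index(neighbours, node_id):
    """Sketch of a node-degree index: a node's degree divided by the
    average degree of the network. Higher values indicate nodes better
    placed to head large clusters. The metric itself is an assumption."""
    degrees = {n: len(v) for n, v in neighbours.items()}
    avg = sum(degrees.values()) / len(degrees)
    return degrees[node_id] / avg if avg else 0.0

# Star topology: "a" is connected to everyone else.
topology = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
print(node_degree_index(topology, "a"))  # 2.0 (degree 3 vs average 1.5)
```

Under such an index the hub node "a" would be the natural cluster-head candidate.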

Fig 6 Efficient Routing formed by DEMC protocol

The NAM window output in Fig 6 shows the operation of the proposed model (data transmission and mobile-collector movement). Efficient routing is formed by selecting multiple channels in different time slots, which are then used by different users. Many message-passing protocols do not guarantee message sequencing across users; the proposed protocol uses the multiple-channel sequence generation algorithm to deliver periodic messages over the available channels.
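The idea of per-slot channel sequences can be sketched with a simple deterministic rotation. The rotation rule below is an illustrative assumption, not the paper's exact algorithm; what it shows is how neighbouring users' sequences periodically land on the same channel:

```python
def channel_sequence(channels, node_id, slots):
    """Sketch of a multiple-channel sequence: each node derives a
    deterministic hop order over the available channels per time slot.
    The offset-by-node-id rotation is an assumption."""
    n = len(channels)
    return [channels[(node_id + t) % n] for t in range(slots)]

available = [1, 2, 3]  # assumed set of currently free channels
print(channel_sequence(available, node_id=0, slots=6))  # [1, 2, 3, 1, 2, 3]
print(channel_sequence(available, node_id=1, slots=6))  # [2, 3, 1, 2, 3, 1]
# Node 0 in slot 1 and node 1 in slot 0 both sit on channel 2,
# so the two users can rendezvous on the same channel periodically.
```

Because every sequence cycles through all available channels, any two users are guaranteed to overlap on some channel within one period.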

Fig 7 Lifetime of the message passing process

The sensor network therefore maintains the routing table and information about the intermediate nodes. As Fig 7 shows, when multiple RREQ messages are generated for the same channel, the duplicates are discarded. The RREP message is sent back to the source node, and the proposed protocol then follows the same path.
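Duplicate-RREQ suppression of this kind is commonly keyed on the request's identifiers; a minimal sketch follows, with the (source, broadcast id, channel) key and field names as assumptions:

```python
def process_rreq(seen, src, broadcast_id, channel):
    """Sketch of duplicate-RREQ suppression: an RREQ is identified by
    its (source, broadcast id, channel) triple; repeated copies for the
    same channel are discarded rather than re-flooded."""
    key = (src, broadcast_id, channel)
    if key in seen:
        return False          # duplicate: discard silently
    seen.add(key)
    return True               # first copy: forward, or answer with RREP

seen = set()
print(process_rreq(seen, "S", 7, channel=2))  # True  (first copy)
print(process_rreq(seen, "S", 7, channel=2))  # False (duplicate discarded)
```

Only the first copy of each request propagates, which bounds flooding overhead; the RREP then retraces the path the surviving RREQ took.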

Fig 8 Bandwidth by the Proposed Protocol

The X graph (Fig 8) shows the bandwidth of the proposed system: the Y coordinate represents the energy levels measured in joules, and the X coordinate represents the distance the packet travels to reach the destination. The increase in packet lifetime is shown in Fig 7.

6. CONCLUSION

In this paper we studied mobile information exchange in which different users use the same channel and merge their information in the same time slots. The proposed algorithm makes it possible for multiple neighbouring users to switch to the same channel [1]. The multiple-channel sequence generation algorithm gives markedly better results than multiple-switching algorithms: packet lifetime and bandwidth increase, and active communication is sustained for a longer time. Auto-reconfiguration and the implementation of a central coordinator are left for future analysis. The central coordinator would coordinate the intermediate nodes: if any intermediate node fails, it would check and correct the node configuration and provide a path so that the packet reaches the destination without loss, reducing the average end-to-end delay.

REFERENCES
[1] Batalin, M. A., Rahimi, M., Yu, Y., Liu, D., Kansal, A., Sukhatme, G. S., ... & Estrin, D.
(2004, November). Call and response: experiments in sampling the environment. In
Proceedings of the 2nd international conference on Embedded networked sensor systems (pp.
25-38). ACM.
[2] Di Felice, M., Chowdhury, K. R., Kim, W., Kassler, A., & Bononi, L. (2011). End-to-end
protocols for cognitive radio ad hoc networks: An evaluation study. Performance Evaluation,
68(9), 859-875.
[3] El-Moukaddem, F., Torng, E., Xing, G., & Xing, G. (2013). Mobile relay configuration in
data-intensive wireless sensor networks. Mobile Computing, IEEE Transactions on, 12(2),
261-273.
[4] Huang, X. L., Wang, G., Hu, F., & Kumar, S. (2011). Stability-capacity-adaptive routing for
high-mobility multihop cognitive radio networks. Vehicular Technology, IEEE Transactions
on, 60(6), 2714-2729.
[5] Jia, J., & Zhang, Q. (2013). Rendezvous Protocols Based on Message Passing in Cognitive Radio Networks.
[6] Kang, M. S., Chong, J. W., Hyun, H., Kim, S. M., Jung, B. H., & Sung, D. K. (2007, February). Adaptive interference-aware multi-channel clustering algorithm in a ZigBee network in the presence of WLAN interference. In Wireless Pervasive Computing, 2007. ISWPC'07. 2nd International Symposium on. IEEE.
[7] Kumar, D., Aseri, T. C., & Patel, R. B. (2011). A novel multihop energy efficient
heterogeneous clustered scheme for wireless sensor networks. Tamkang Journal of Science
and Engineering, 14(4), 359-368.
[8] Lin, Z., Liu, H., Chu, X., Leung, Y., & Stojmenovic, I. (2013). Constructing Connected-Dominating-Set with Maximum Lifetime in Cognitive Radio Network.
[9] Liu, H., Lin, Z., Chu, X., & Leung, Y. W. (2012). Jump-stay rendezvous algorithm for cognitive radio networks. Parallel and Distributed Systems, IEEE Transactions on, 23(10), 1867-1881.
[10] Liu, Q., Pang, D., Hu, G., Wang, X., & Zhou, X. (2012, October). A neighbor cooperation
framework for time-efficient asynchronous channel hopping rendezvous in cognitive radio
networks. In Dynamic Spectrum Access Networks (DYSPAN), 2012 IEEE International
Symposium on (pp. 529-539). IEEE.
[11] Lo, B. F. (2011). A survey of common control channel design in cognitive radio networks.
Physical Communication, 4(1), 26-39.
[12] Park, S., Lee, E., Jin, M. S., & Kim, S. H. (2010). Novel strategy for data dissemination to mobile sink groups in wireless sensor networks. Communications Letters, IEEE, 14(3), 202-204.
[13] Peng, C., Zheng, H., & Zhao, B. Y. (2006). Utilization and fairness in spectrum assignment for opportunistic spectrum access. Mobile Networks and Applications, 11(4), 555-576.
[14] Tian, H., Shen, H., & Roughan, M. (2008, December). Maximizing networking lifetime in
wireless sensor networks with regular topologies. In Parallel and Distributed Computing,
Applications and Technologies, 2008. PDCAT 2008. Ninth International Conference on (pp.
211-217). IEEE.
[15] Wu, X., Brown, K. N., & Sreenan, C. J. (2012, June). Data pre-forwarding for opportunistic data collection in wireless sensor networks. In Networked Sensing Systems (INSS), 2012 Ninth International Conference on (pp. 1-8). IEEE.
[16] Youssef, A. M., Younis, M. F., Youssef, M., & Agrawala, A. K. (2006, November).
Distributed Formation of Overlapping Multi-hop Clusters in Wireless Sensor Networks. In
GLOBECOM.
[17] Zhao, J., Zheng, H., & Yang, G. H. (2005, November). Distributed coordination in dynamic spectrum allocation networks. In New Frontiers in Dynamic Spectrum Access Networks, 2005. DySPAN 2005. 2005 First IEEE International Symposium on (pp. 259-268). IEEE.

This paper may be cited as:

Sundaram, V. S. and Kumar, S. J. K. J., 2014. An Improved Energy Efficient Clustering Algorithm for Non Availability of Spectrum in Cognitive Radio Users. International Journal of Computer Science and Business Informatics, Vol. 13, No. 1, pp. 101-111.

