
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),

ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 43-48 IAEME











DNA SEQUENCE ANALYSIS USING DISTRIBUTED COMPUTING WITH
SMITH-WATERMAN ALGORITHM


Shanu Verma¹, Balwant Ram², Bikramjit Kaur³

¹,³ Student of M.Tech Computer Science, Department of CSE, Lovely Professional University,
Phagwara, Punjab, India
² Assistant Professor, Department of CSE, Lovely Professional University, Phagwara, Punjab, India.




ABSTRACT

Most activities in today's world are carried out over the internet, and a number of jobs have
to be performed there. Some of them execute easily, but others are high computing jobs that
need high-power computation, such as jobs in image processing, bioinformatics, finance, etc.
Supercomputers are one way of running high computing jobs, but they are very expensive and out
of reach for most users. Executing high computing jobs is therefore a difficult problem. The
method I propose is to split a job into small fragments and execute all of these fragments
concurrently on different nodes of a network. Using web services with distributed computing,
I am going to examine the execution of high computing jobs and the maximum job size needed for
accurate results. Web services use SOAP to interact over the network. I am going to use sequence
alignment as an example of a high computing job.

Keywords: Distributed Computing, Web Services, SOAP, WSDL, DNA Analysis, Sequence
Alignment.

1. INTRODUCTION

For high computing jobs, parallel computing is needed to reduce the computational time.
If we divide a high computing job into small chunks and compute all of those chunks separately
and simultaneously, the computational time is reduced. Distributed computing is a special
type of parallel computing.


1.1 Distributed computing
Distributed computing is a form of computing in which a large job is split into small
fragments, and all of these fragments are executed concurrently on different nodes of a network
to achieve a single goal [1].

1.2 Architectural model
This model has three layers: services, middleware, and platform. In the services layer, one
or more server processes provide a distributed service by interacting with one another. In the
middleware layer, products and standards such as CORBA, Java RMI, Web services, and DCOM are
widely used. The platform is the combination of the operating system and the computer network layer.

1.3 System architecture
The system architecture determines the division of responsibilities. There are two types of
architecture: client-server and peer-to-peer. In client-server, a server may itself be a client
of another server. In peer-to-peer, all nodes play the same role and can act as client or server.

1.4 Challenges faced by distributed computing in early decades
In its early decades, distributed computing faced the problem of how to exchange programming
functionality encapsulated within objects. Over the past few decades, many techniques have been
used to exchange programming functionality, including RPC architectures such as CORBA and COM.
XML provides a platform-neutral technique. The ability to access computing functionality through
XML and web technologies is known as web services.

1.5 Limitations of CORBA and DCOM
DCOM is a Microsoft-only architecture. In practice, CORBA is too complex and semantically
ambiguous to provide cross-platform interoperability without a large amount of manual integration
work. RPC raises the specter of marshalling (serializing) executable code and shipping it
over the internet, at which point unexpected security and compatibility problems start arising.
Both CORBA and DCOM use binary wire protocols, which are not human-readable and have
difficulty passing through firewalls.

1.6 Web Services
Web services solve the above problems by using HTTP with SOAP [2]. Web services are a set of
standards that allow two computers to interact with each other and exchange data over the web.
HTTP is supported by all browsers and servers.

1.7 Sequence alignment
Sequence alignment is the technique of modifying two compared sequences by inserting gaps so
that their patterns match each other as closely as possible [4].

2. RELATED WORK

Guan Qing and Guan Jianhe (2012) focused on parallel computing on a LAN. They analyzed the
problems of an electromagnetic field exploration application with object-oriented techniques,
using a parallel computer architecture and the Message Passing Interface (MPI). Message passing
was implemented with Visual C++ .NET and MPICH2, which together form a parallel programming platform.
Zhang Feng et al. (2009) explained an agent-based service-oriented architecture for
manufacturing enterprise collaborations. Services are platform-independent computational elements
that can be described, published, and discovered using XML for the purpose of developing
distributed interoperable applications.
CORBA allows interoperability between various components by describing their interfaces in a
meta-language [3]. Given the challenges above, however, web services are a better fit than CORBA.
Traditional distributed applications are valuable in their own right, but the problem is to connect
these services together so that they can work on high computing jobs. Web services can resolve this
problem [5], [6]. Service-oriented architecture gives a framework for implementing distributed
computing, and web services are becoming its base. If an application calls a service
asynchronously, it does not need to wait for the result and can carry on with another operation.
When it needs the result to continue execution, it can pause and fetch that result. This mechanism
is important for many applications in a distributed execution environment. By solving the issue of
correlating each call with its result, the module is able to execute asynchronous calls of
web services. Working as a proxy for the calling application, the module is also able to execute
the call dynamically [7], [8]. In this work, sequence alignment is going to be analyzed using
distributed computing with web services. There are two algorithms for sequence alignment: the
Needleman-Wunsch algorithm and the Smith-Waterman algorithm. A parallel strategy has been designed
to explore the computational characteristics of the Needleman-Wunsch algorithm, which is used for
biological sequence comparisons. The Needleman-Wunsch algorithm is used for global sequence
alignment, whereas the Smith-Waterman algorithm is used for local sequence alignment [9].
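
The asynchronous invocation pattern described above can be illustrated with a minimal Python sketch. It is not the module from [7], [8]. The function remote_service is a hypothetical stand-in for a web service call, and a local thread pool is assumed in place of a real network client. The caller submits the call, continues with other work, and blocks only when it needs the result.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_service(x: int) -> int:
    """Hypothetical stand-in for a slow web service call (assumption)."""
    time.sleep(0.1)  # simulate network and processing delay
    return x * x


def call_asynchronously() -> tuple:
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns a Future immediately; the caller is not blocked.
        future = pool.submit(remote_service, 7)
        # The caller performs another operation while the call is in flight.
        local_result = sum(range(10))
        # The caller waits only at the point where the result is needed.
        remote_result = future.result()
    return local_result, remote_result


if __name__ == "__main__":
    print(call_asynchronously())
```

The Future object is what links each call to its result, which is the correlation issue mentioned in the text.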

3. OBJECTIVES

Processing huge amounts of data and executing high computing jobs are active issues today.
This work therefore has the following objectives: to analyze the factors affecting the performance
of distributed computing using web services, and to find the optimal job size that the master
node should give to each slave in a distributed computing environment.

4. RESEARCH METHODOLOGY

The research methodology of the proposed work is presented here. The problem of executing a
high computing job can be resolved by dividing the job into small parts and then processing
them in parallel. I am going to use distributed computing to analyze the performance of a
high computing job.


Fig 4.1: Allocation of parts of job to the slaves by master in distributed computing

Web services are included as middleware.
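
The master-slave allocation in Fig 4.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the proposed system: worker threads stand in for slave nodes, the names split_job, slave_task, and master are hypothetical, and the per-fragment work (counting G and C bases) is a placeholder for the real computation that a slave would perform behind a web service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_job(sequence: str, fragment_size: int) -> List[str]:
    """Master side: cut the job into fragments of at most fragment_size."""
    return [sequence[i:i + fragment_size]
            for i in range(0, len(sequence), fragment_size)]


def slave_task(fragment: str) -> int:
    """Slave side: placeholder work, counts G and C bases in one fragment."""
    return sum(1 for base in fragment if base in "GC")


def master(sequence: str, fragment_size: int, slaves: int = 4) -> int:
    fragments = split_job(sequence, fragment_size)
    # Each worker thread plays the role of one slave node (assumption).
    with ThreadPoolExecutor(max_workers=slaves) as pool:
        partial_results = list(pool.map(slave_task, fragments))
    # The master combines the partial results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(split_job("ACGTACGTAC", 4))
    print(master("ACGTACGTAC", 4))
```

The fragment_size parameter corresponds to the job size given by the master to each slave, which is the quantity the objectives set out to tune.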

4.1 Web services
Web services provide a standard means of interoperability among software applications
running on different platforms [10]. Web services use XML (eXtensible Markup Language).
XML is case-sensitive, and every XML document must have a root element.

4.2 SOAP
SOAP stands for Simple Object Access Protocol. It is an XML-based protocol used for accessing
web services, and it defines a format for sending messages over the network. It is
language independent. Using SOAP, applications can communicate over HTTP.

Format of SOAP: A SOAP message is a simple XML document containing the following
components:

An Envelope element that identifies the XML document as a SOAP message.
A Header element that contains header details.
A Body element that contains details about the invocation and response.
A Fault element that contains details about errors.

A SOAP message must use XML.

Syntax of SOAP message: The general structure of a SOAP message is given below:

<?xml version="1.0"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
    soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
  <soap:Header>
    ...
  </soap:Header>
  <soap:Body>
    ...
    <soap:Fault>
      ...
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
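
As an illustration of how such a message might be produced and read, the following Python sketch builds an envelope with the standard library and parses it back. The envelope namespace is the one used in the listing above. The application namespace http://example.org/alignment and the element names AlignRequest, SequenceA, and SequenceB are hypothetical and are not part of the proposed work.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

SOAP_NS = "http://www.w3.org/2001/12/soap-envelope"  # from the listing above
APP_NS = "http://example.org/alignment"              # hypothetical namespace


def build_align_request(seq_a: str, seq_b: str) -> bytes:
    """Wrap two sequences in a SOAP envelope (element names are assumptions)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}AlignRequest")
    ET.SubElement(request, f"{{{APP_NS}}}SequenceA").text = seq_a
    ET.SubElement(request, f"{{{APP_NS}}}SequenceB").text = seq_b
    return ET.tostring(envelope, encoding="utf-8")


def parse_align_request(message: bytes) -> Tuple[str, str]:
    """Extract the two sequences from the Body of the envelope."""
    root = ET.fromstring(message)
    request = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}AlignRequest")
    return (request.findtext(f"{{{APP_NS}}}SequenceA"),
            request.findtext(f"{{{APP_NS}}}SequenceB"))


if __name__ == "__main__":
    message = build_align_request("ACGT", "AGT")
    print(message.decode("utf-8"))
    print(parse_align_request(message))
```

In a deployed service the message would be sent as the body of an HTTP POST; the transport is omitted here to keep the sketch self-contained.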

4.3 DNA analysis
DNA analysis is the process of analyzing a DNA sequence. Here, sequence alignment is
going to be performed. Alignment can be done in two ways: local alignment and global alignment
[11], [12]. The proposed work uses local alignment with the Smith-Waterman algorithm.

4.4 Smith-Waterman Algorithm
This algorithm performs local alignment on two sequences. It is applied when the two
sequences are dissimilar overall but are expected to contain some regions of similarity [13].
The algorithm is an example of dynamic programming, and its aim is to find the best-scoring
local alignment among all possible alignments.



The algorithm has the following three steps:

Initialization
Scoring
Trace back

The scoring scheme uses match score = +1, mismatch score = -1, and gap penalty = -1, which
gives the following substitution matrix:

Table 4.4.1: Substitution matrix of the Smith-Waterman algorithm
     A   C   G   T
A    1  -1  -1  -1
C   -1   1  -1  -1
G   -1  -1   1  -1
T   -1  -1  -1   1
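
The substitution matrix can be generated from the match and mismatch scores rather than typed by hand. The short Python sketch below shows one possible representation, a dictionary keyed by letter pairs; the function name build_substitution_matrix is hypothetical.

```python
from typing import Dict, Tuple


def build_substitution_matrix(alphabet: str = "ACGT",
                              match: int = 1,
                              mismatch: int = -1) -> Dict[Tuple[str, str], int]:
    """Return S[(x, y)]: match score on the diagonal, mismatch elsewhere."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}


if __name__ == "__main__":
    S = build_substitution_matrix()
    for x in "ACGT":
        # Print one table row per letter, in the same order as Table 4.4.1.
        print(x, [S[(x, y)] for y in "ACGT"])
```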

Initialization: This step consists of the following instructions:

Create a matrix with X + 1 rows and Y + 1 columns, where X and Y are the lengths of the two sequences.
Fill the first row and the first column of the score matrix with zeros.

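Scoring: The score of any cell C(i, j) is the maximum of 0 and the following three values:

scorediag = C(i-1, j-1) + S(i, j)
scoreup = C(i-1, j) + g
scoreleft = C(i, j-1) + g

where S(i, j) is the substitution score for the letters at positions i and j of the two
sequences, and g is the gap penalty [13]. Taking 0 as the lower bound is what keeps the
alignment local: a cell whose three candidate scores are all negative is reset to 0.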
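
The initialization and scoring steps can be sketched in Python as follows. This is a minimal illustration that uses the scores from the scoring scheme (+1, -1, -1) as defaults and includes the lower bound of 0 from the standard Smith-Waterman recurrence. The function name score_matrix is hypothetical.

```python
from typing import List


def score_matrix(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> List[List[int]]:
    """Fill the Smith-Waterman score matrix for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # Initialization: (X + 1) x (Y + 1) matrix, first row and column are zero.
    C = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score_diag = C[i - 1][j - 1] + s
            score_up = C[i - 1][j] + gap
            score_left = C[i][j - 1] + gap
            # Local alignment: a cell never drops below zero.
            C[i][j] = max(0, score_diag, score_up, score_left)
    return C


if __name__ == "__main__":
    for row in score_matrix("TACG", "ACG"):
        print(row)
```

For the sequences TACG and ACG, the highest score is 3 in the bottom-right cell, which corresponds to the shared substring ACG.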

Trace back: The alignment is constructed in this step.

Trace back starts from the cell having the maximum value, which is not necessarily the last cell (X, Y).
It gives the alignment in reverse order.
There can be three moves: diagonally, up, or left.
At each cell, the move goes to the predecessor from which the cell's score was obtained: the
diagonal neighbor for a match/mismatch, or the upper or left neighbor for a gap. If more than
one possible predecessor exists, any can be chosen [13].
Trace back stops when a cell with score 0 is reached. This gives the local alignment of
both sequences.
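
The trace back step can be sketched in Python as follows. The sketch assumes a score matrix that was filled with the scoring scheme above (+1, -1, -1) and a lower bound of 0; here the matrix for the sequences TACG and ACG is written out by hand so that the block runs on its own. The function name trace_back is hypothetical, and when several predecessors are possible the diagonal one is preferred, which is one arbitrary choice among the permitted ones.

```python
from typing import List, Tuple


def trace_back(C: List[List[int]], a: str, b: str, match: int = 1,
               mismatch: int = -1, gap: int = -1) -> Tuple[str, str]:
    """Recover one best local alignment from a filled score matrix C."""
    # Start from the cell holding the maximum score.
    i, j = max(((r, c) for r in range(len(C)) for c in range(len(C[0]))),
               key=lambda rc: C[rc[0]][rc[1]])
    top, bottom = [], []
    # Stop as soon as a cell with score 0 is reached.
    while i > 0 and j > 0 and C[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if C[i][j] == C[i - 1][j - 1] + s:      # diagonal: match or mismatch
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif C[i][j] == C[i - 1][j] + gap:      # up: gap in sequence b
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:                                   # left: gap in sequence a
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    # The alignment was collected in reverse order.
    return "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Score matrix for a = "TACG" (rows) and b = "ACG" (columns).
    C = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 2, 1],
         [0, 0, 1, 3]]
    print(trace_back(C, "TACG", "ACG"))
```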

5. CONCLUSION AND FUTURE WORK

Distributed computing can reduce both time consumption and memory consumption.
The proposed work is therefore to examine the execution of high computing jobs using web services
and the maximum job size, passed over the network, that is needed for accurate results. With web
services, interaction among all the nodes in the network becomes easy because of HTTP and SOAP.
Hardware requirements and expenditure also decrease. Distributed computing with web services can
be used in the many fields where high computing jobs have to be executed. In this work,
sequence alignment is used as the high computing job, which leaves scope for future work.

6. REFERENCES

[1] Guan Qing and Guan Jianhe (2012) The Realization of Parallel Computing in LAN, IEEE, 2012
2nd International Conference on Computer Communication and Network Technology,
Changchun, China, pp. 950-953.
[2] Zhang Feng, Chen Xin and Wei Yongshan (2009) A Distributed Data Integration Framework
Based on Web Services and LDAP, IEEE, 2009 International Forum on Computer Science-
Technology and Applications, Qingdao 266510, China, pp. 257-259.
[3] Keahey Katarzyna and Gannon Dennis, PARDIS: A Parallel Approach to CORBA,
Department of Computer Science, Indiana University, 215 Lindley Hall, Bloomington, pp. 31-39.
[4] Gunturu Sudha, Li Xiaolin, and Tianruo Yang Laurence (2009) Load Scheduling Strategies for
Parallel DNA Sequencing Applications, IEEE, 11th IEEE International Conference on High
Performance Computing and Communications, USA and Canada, pp. 124-131.
[5] Wang Lijuan, Shen Jun, Di Changyan, Li Yan, Zhou Qingguo (2013) Towards minimizing cost
for composite data-intensive services, IEEE, 2013 17th International Conference on Computer
Supported Cooperative Work in Design, Australia and P. R. China, pp. 293-298.
[6] Tretola Giancarlo and Zimeo Eugenio (2007) Client-Side Implementation of Dynamic
Asynchronous Invocations for Web Services, IEEE Italian Ministry of Research and Education
(MIUR).
[7] Weiming Shen, Hamada Ghenniwa and Yinsheng Li (2006) Agent-Based Service-Oriented
Computing and Applications, 2006 1st International Symposium on Pervasive Computing and
Applications, Shanghai, China, pp. 8-9.
[8] Otieno Jim and Vijaya Selvi Rajan Amala Leveraging Traditional Distributed Applications to
Web Service for E-Learning Applications, IEEE, Proceedings of the 15th International
Workshop on Database and Expert Systems Applications (DEXA04).
[9] Al Junid, S. A. M.; Haron, M.A.; Abd Majid, Z.; Halim, A.K.; Osman, F.N.; Hashim, H. (2009)
Development of Novel Data Compression Technique for Accelerate DNA Sequence Alignment
Based on Smith-Waterman Algorithm, IEEE, 2009 Third UKSim European Symposium on
Computer Modeling and Simulation, University Technology MARA (UiTM), pp. 181-186.
[10] Schmelzer R. et al. (2008) XML and Web Services Unleashed, India, pp. 23-687.
[11] Duds L. (2006) Improved Pattern Matching to Find DNA Patterns, IEEE, Department of
Information Engineering University of Miskolc, Hungary.
[12] Teresa K., David J. and Samiron P. (2007) Introduction to Bioinformatics, India, p. 125-128.
[13] http://nabg.iasri.res.in/images/Presentation/SEQUENCE%20ANALYSIS-%20I.ppt
[14] Vinod Kumar Yadav, Indrajeet Gupta, Brijesh Pandey and Sandeep Kumar Yadav, Overlapped
Clustering Approach for Maximizing the Service Reliability of Heterogeneous Distributed
Computing Systems, International Journal of Computer Engineering & Technology (IJCET),
Volume 4, Issue 4, 2013, pp. 31-44, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[15] Houda El Bouhissi, Mimoun Malki and Djamila Berramdane, Applying Semantic Web Services,
International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013,
pp. 108-113, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[16] A. Suganthy, G.S.Sumithra, J.Hindusha, A.Gayathri and S.Girija, Semantic Web Services And
Its Challenges, International Journal of Computer Engineering & Technology (IJCET), Volume
1, Issue 2, 2010, pp. 26-37, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[17] Shaymaa Mohammed Jawad Kadhim and Dr. Shashank Joshi, Agent Based Web Service
Communicating Different Iss And Platforms, International Journal of Computer Engineering &
Technology (IJCET), Volume 4, Issue 5, 2013, pp. 9-14, ISSN Print: 0976-6367, ISSN Online:
0976-6375.
