paper topics

Parallelization of star alignment
Parallel H4MSA for multiple sequence alignment
A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids
Parallelized genomic sequencing model: a big data approach for bioinformatics application
Massively parallel algorithm for multiple biological sequences alignment
MRSMRS: Mining Repetitive Sequences in a MapReduce Setting
Alignment-Free Sequence Comparison over Hadoop for Computational Biology
A parallel algorithm for the best K mismatches alignment problem
HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy
Progressive alignment method using genetic algorithm for multiple sequence alignment
Implementation of parallel protein structure alignment service on cloud
An Enhanced Framework of Genomics Using Big Data Computing
A private cloud system for web based high performance multiple sequence alignment services
Genomic analysis with mapreduce
Blast parallel: the parallelizing implementation of sequence alignment algorithms based on hadoop platform
Pairwise sequence alignment method for distributed shared memory systems
A steady state genetic algorithm for multiple sequence alignment
Mapreduce based parallel suffix tree construction for human genome
Phylogenetic analysis using mapreduce programming model
A Novel Structure of the Smith-Waterman Algorithm for Efficient Sequence Alignment
Bwasw-cloud: Efficient sequence alignment algorithm for big data with mapreduce
Parallel A-Star Multiple Sequence Alignment with Locality-Sensitive Hash Functions
Optimizing Sequence Alignment in Cloud using Hadoop and MPP Database
Parallelization of BLAST with MapReduce for Long Sequence Alignment
An algorithm of multiple sequence alignment based on consensus sequence searched by simulated annealing and star alignment
Big data: cloud computing in genomics applications

thesis topics

Parallelization of star alignment algorithm for multiple sequence alignment using Hadoop data grids
Parallelization of star alignment algorithm for multiple sequence alignment using MapReduce model
A MapReduce model of star alignment algorithm for multiple sequence alignment

Introduction
Computer science is applied across many domains to solve computational problems, including the biological sciences. One such problem is pair-wise sequence alignment. The pair-wise sequence alignment technique aims to identify regions of similarity between two DNA sequences in order to analyze the functional, structural, or evolutionary relationships between them. One method for pair-wise sequence alignment is Needleman-Wunsch, which finds the best global alignment of two sequences using dynamic programming. In simple terms, the problem is divided into smaller parts, and the solutions to the smaller parts are used to build a larger solution. This method has a complexity of O(n^2) for two sequences of length n. Similarly, multiple sequence alignment that processes the sequences one by one, called Star Alignment, takes up to O(k^2 n^2) time for k sequences.

Sequence alignment can be used to determine the function of genes and proteins by comparing the similarities of all the sequences under study. The number of sequences to compare is extraordinarily large. The problems when comparing such huge sequence data are accuracy and efficiency. These two parameters conflict: reaching a faster speed decreases accuracy, and vice versa. Currently, there are many tools and techniques that provide analysis of sequence alignment and alignment products to understand molecular biology, such as predicting protein secondary structure via multiple sequence alignment. However, they run into timing problems while processing the data. Older methods such as Needleman-Wunsch have a high computational complexity of O(n^2). Therefore, it is very important to find a better way to improve performance. One way to increase speed is parallel computation, in which multiple computers work together as a system. In this research, we propose using the Star Alignment method to perform multiple sequence alignment. Our proposed method is implemented using Message Passing Interface (MPI). The results show that the parallelization of Star Alignment achieved a speedup of 4-6 times compared with using a single CPU. In this method, the computation result still has high accuracy.

Problem definition
Pair-wise sequence alignment is a technique for comparing the similarity of sequences from two organisms. It is the basic technique in DNA sequence alignment. In the Star Alignment method, we compare sequences S1-Sk, S2-Sk, S3-Sk, ..., S(k-1)-Sk one by one, each as a pair-wise alignment. Consequently, the complexity of the Star Alignment algorithm is quite high, O(k^2 n^2). This research focuses on finding a faster method to process multiple sequence alignment using the Star method on a parallel computer. To reduce the execution time significantly, we need to modify the Star Alignment algorithm by implementing parallel programming using the MapReduce model of Hadoop, that is, the parallelization methodology of the Map and Reduce framework.
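The two cost figures above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the method of any paper listed here: the scoring values (MATCH, MISMATCH, GAP) and all function names are hypothetical, and plain Python functions stand in for Hadoop mappers and reducers. `nw_score` fills the Needleman-Wunsch table for one pair, which is the O(n^2) step. `pick_center` uses the standard center-star rule of scoring every pair among the k sequences and taking the sequence with the highest summed score as the center, which is where the O(k^2 n^2) total comes from.

```python
from collections import defaultdict
from itertools import combinations

# Assumed scoring values for illustration only; they are not taken from the source.
MATCH, MISMATCH, GAP = 1, -1, -2


def nw_score(a: str, b: str) -> int:
    """Needleman-Wunsch global alignment score, O(len(a) * len(b)) time."""
    prev = [j * GAP for j in range(len(b) + 1)]  # row 0: prefix of b against gaps
    for i in range(1, len(a) + 1):
        curr = [i * GAP]  # column 0: prefix of a against gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def map_pair(pair):
    """Map step (hypothetical): score one pair, emit the score under both indices."""
    (i, a), (j, b) = pair
    score = nw_score(a, b)
    return [(i, score), (j, score)]


def reduce_scores(emitted):
    """Reduce step (hypothetical): sum the emitted scores per sequence index."""
    totals = defaultdict(int)
    for key, score in emitted:
        totals[key] += score
    return dict(totals)


def pick_center(seqs):
    """Index of the sequence with the highest summed pairwise score.

    Assumes at least two sequences; ties go to the lowest index.
    """
    pairs = combinations(enumerate(seqs), 2)  # k * (k - 1) / 2 independent pairs
    emitted = [kv for pair in pairs for kv in map_pair(pair)]
    totals = reduce_scores(emitted)
    return max(totals, key=lambda k: (totals[k], -k))
```

Each call to `map_pair` is independent of the others, so the pairs could be distributed across workers, with only the per-index sums gathered afterwards. This single-process version mirrors that data flow but does not use Hadoop itself.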

Scope of work
The main applications of sequence alignments have included phylogenetic tree reconstruction, protein family prediction, pattern identification, etc. Because multiple sequence alignments play a major role in bioinformatics, you can use them almost anywhere, but like everything on this earth, nothing is perfect or 100% accurate, so you have to choose your sequences very carefully to prevent meaningless results.

1. Protein Family: A multiple sequence alignment can help you decide whether or not your protein is a member of a known protein family.
2. DNA Regulatory Elements: You can use multiple sequence alignments to locate DNA regulatory elements such as binding sites.
3. Domain Identification: By looking at the file provided by a multiple sequence alignment, you can extract profiles to use them against databases.
4. Structure Prediction: A multiple sequence alignment can give you a very accurate prediction of protein or RNA secondary structure; sometimes it helps even with the 3D structure.
5. Pattern Identification: By looking at conserved regions or sites, you can identify which region is responsible for a functional site.
6. Phylogenetic Analysis: By carefully picking related sequences, you can reconstruct a tree from the sequences that you used in the multiple sequence alignment. A phylogenetic tree or evolutionary tree is a branching diagram or "tree" showing the inferred evolutionary relationships among various biological species or other entities (their phylogeny), based upon similarities and differences in their physical or genetic characteristics. The taxa joined together in the tree are implied to have descended from a common ancestor.

The scope of this thesis work is to develop a MapReduce model of the star alignment algorithm for multiple sequence alignment.