
Journal of Medicinal Plants Research Vol. 5(32), pp. 6931-6933, 30 December, 2011


Available online at http://www.academicjournals.org/JMPR
ISSN 1996-0875 ©2011 Academic Journals
DOI: 10.5897/JMPR11.363

Review

Fast alignment (FASTA): A review article


M. Akram1*, M. Saim Jamil1, Zahid Mehmood2, Muhammad Akram5, Muhammad Khurram Waqas4, Zafar Iqbal3, Aubid Allah Khan3 and H. M. Asif3

1 Shifa ul Mulk Memorial Hospital, Hamdard University, Karachi, Pakistan.
2 Department of Chemistry and Biochemistry, University of Agriculture Faisalabad-38040, Pakistan.
3 Department of Pharmacy, The Islamia University of Bahawalpur, Pakistan.
4 Department of Pharmacy, The University of Faisalabad, Pakistan.
5 Department of Pharmacy, University of Sargodha, Pakistan.

Accepted 4 October, 2011

Fast alignment (FASTA) is used to compare a protein sequence with another protein sequence. The FASTA program provides similarity searching against nucleotide and protein databases. FASTA is usually used as part of a package of programs, and it constructs local and global sequence alignments. In this article, simple applications of FASTA are discussed.

Key words: Fast alignment (FASTA), program, sequence similarity.

INTRODUCTION

Fast alignment (FASTA) is a program used to analyze the similarity between protein sequences as well as nucleic acid sequences. It also searches protein and DNA databases and finds similarities between them. The program is used to find regions of similarity, local as well as global. FASTA is both fast and selective because it initially considers only amino acid identities. The FASTA program may help in understanding evolutionary relationships between sequences and in identifying members of gene families (Harison, 2005). The FASTA program can search the NBRF (National Biomedical Research Foundation) protein sequence library (2.5 million residues) in less than 20 min on an IBM-PC (International Business Machines Corporation Personal Computer) microcomputer and unambiguously detect proteins that shared a common ancestor billions of years in the past (Mackey et al., 2002; Pearson, 1990). Computers are commonly used for analyses of DNA and protein sequence data. A common application of computers in molecular biology is to characterize newly determined sequences by searching DNA and protein sequence databases. FASTA is widely used for such searches because it is fast, sensitive, and readily available (Table 1). FASTA is used for local as well as global alignment purposes (Henikoff et al., 1992; Brenner et al., 1998). The present study focuses on the steps required to run the programs, rather than on the interpretation of the results of a FASTA search.

*Corresponding author. E-mail: makram_0451@hotmail.com. Tel: 92 021 6440083. Fax: 92 021 6440079.

Table 1. File extensions.

S/N   Extension   Meaning
1     fasta       Generic FASTA
2     fna         FASTA nucleic acid
3     ffn         FASTA nucleotide coding regions
4     faa         FASTA amino acid
5     frn         FASTA non-coding RNA

Sequence similarity search using the FASTA program

The databases that can be searched with FASTA include protein, nucleotide, proteomes, genomes, whole genome shotgun, ASD (Alternative Splicing Database) protein, ASD nucleotide, LGIC (Ligand-Gated Ion Channel) protein and LGIC nucleotide.

FASTA program

FASTA-protein similarity search

This tool provides sequence similarity searching against protein databases using the FASTA program (Pearson et al., 1998). The steps to follow are:

1. Protein sequence is obtained from the UniProt Knowledgebase,
2. Sequence is uploaded in the FASTA software,
3. Parameters for analysis are set,

4. Submit option is clicked,
5. Results are obtained after a few minutes.

Protein and DNA/RNA databases

Numerous protein databases have been developed. Swiss-Prot is an example.

FASTA-based compilation of higher plant mitochondrial tRNA genes

A new version of the compilation of higher plant mitochondrial tRNA genes has been obtained by means of the FASTA program for similarity searching in nucleotide sequence databases. The previous collection has been improved upon. The current compilation contains 158 sequences, an increase of 43 units (Sagliano et al., 1998).

BIOFFORC (Biological file format conversion) tool development for biological file format

Different sequence formats are used in bioinformatics. Specific sets of bioinformatics tools are used for processing them. Sometimes, sequence format conversion is needed. Many sequence conversion tools are available in the public domain. For this purpose, a file format converter with a graphical user interface has been developed in PERL (Practical Extraction and Report Language). This file format converter is called BIOFFORC (Chinnaiah et al., 2008).

Using the FASTA program to search protein and DNA sequence databases

Computers are commonly used for analysis of DNA and protein sequence data. Newly determined sequences are characterized by searching DNA and protein sequence databases. The FASTA program is commonly used for such searches because of its speed and sensitivity. Steps to run the FASTA programs have been developed.

DISCUSSION

FASTA is used for rapid alignment of pairs of protein and DNA sequences. The program searches for matching sequence patterns or words, called k-tuples. These patterns comprise k consecutive matches of letters in both sequences (Juke, 1998). The program attempts to build a local alignment based on these word matches (Altschul and Gish, 1996). Because the algorithm finds matching sequences in a sequence database at high speed, FASTA is useful for routine database searches of this type. BLAST (Basic Local Alignment Search Tool) is faster than FASTA, is of comparable sensitivity for protein queries, and also performs DNA searches. Programs that use the Smith-Waterman dynamic programming algorithm for protein and DNA searches are slower but more sensitive when full-length protein sequences are used as queries (Lipan et al., 1984).

CONCLUSION

FASTA is commonly used for comparing sequences of protein and DNA. It gives fast and reliable results. FASTA provides an estimate of the statistical significance of each alignment found.

REFERENCES

Altschul SF, Gish W (1996). Local alignment statistics. Methods Enzymol., 266: 460-480.
Brenner SE, Chothia C, Hubbard TJ (1998). Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl. Acad. Sci. U.S.A., 95: 6073-6078.
Chinnaiah S, Maruthamuthu R, Ekambaram R (2008). BIOFFORC: Tool development for biological file format. Bioinform., 3(2): 98-99.
Harison S (2005). FASTA, Fundamentals of Bioinformatics, I.K. International Publishing House Pvt. Ltd., New Delhi, India, p. 76.
Henikoff S, Henikoff JG (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A., 89: 10915-10919.
Mackey AJ, Haystead TA, Pearson WR (2002). Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences. J. Molecul. Cellul. Proteom., 1(2): 139-147.
Pearson WR (1990). Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol., 183: 63-98.
Pearson WR, Lipman DJ (1998). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A., 85(8): 2444-2448.

Sagliano A, Volpicella M, Gallerani R, Ceci LR (1998). FastA based compilation of higher plant mitochondrial tRNA genes. Nucl. Acids Res., 26(1): 154-155.
William R (1994). Using the FASTA Program to Search Protein and DNA Sequence Databases. Meth. Mol. Biol., 24(9): 307-331.
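
The k-tuple word matching described in the Discussion can be illustrated with a minimal Python sketch. It is not the FASTA implementation. It only indexes the k-letter words of one sequence, looks up the words of the other, and counts the matches that fall on each diagonal, which is the kind of evidence a word-based method starts from before building a local alignment. The function names, the toy sequences and the choice of k = 2 are assumptions made for illustration only.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_ktuples(query, subject, k=2):
    """Return (query_pos, subject_pos, word) for each k-tuple present in both sequences."""
    index = ktuple_index(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        word = subject[j:j + k]
        for i in index.get(word, []):
            hits.append((i, j, word))
    return hits


def best_diagonal(hits):
    """Count hits per diagonal (subject_pos - query_pos) and return the densest one."""
    counts = defaultdict(int)
    for i, j, _ in hits:
        counts[j - i] += 1
    if not counts:
        return None
    # Ties are broken by the smallest diagonal offset so the result is deterministic.
    return max(sorted(counts), key=lambda d: counts[d])


if __name__ == "__main__":
    query = "HEAGAWGHEE"   # toy sequences, hypothetical and for illustration only
    subject = "PAWHEAE"
    hits = shared_ktuples(query, subject, k=2)
    print(hits)
    print(best_diagonal(hits))
```

Several word matches on the same diagonal suggest a region where the two sequences could be aligned without gaps. A smaller k finds more word matches at the cost of more work per comparison.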
