(IJCSIS) International Journal of Computer Science and Information Security,Vol. 9, No. 2, 2011
given by gap = n×go, where go < 0 is the opening penaltyof a gap and n is the number of consecutive gaps.
Affine gap model – in this model both the new gap andextension gap are not given the same penalty. Theinsertion of a new gap has a greater penalty than theextension of an existing gap and is given by gap = go + (n
1) × ge, where go < 0 is the gap opening penalty and ge< 0 is the gap extension penalty and are such that |ge| <|go|.
The MSA objective function is defined for assessing thealignment quality either explicitly or implicitly. An efficientalgorithm is used to find the optimal or a near optimalalignment according to the objective function. Matches,mismatches, substitutions, insertions, and deletions need to bescored in the scoring function. The scoring function can bedivided into two parts: substitution matrices and gap penalties.The former provides a numerical score for matches andmismatches while the latter allows for numerical quantificationof insertions and deletions. All possible transitions between the20 amino acids, or the 4 nucleic acids are represented in asubstitution matrix which is an array of two dimensions of 20 x20 for amino acid and 4 x 4 for nucleic acids.Usually a simple matrix used for DNA or RNA sequencesinvolves assigning a positive value for a match and a negativevalue for a mismatch. Meanwhile, the scores for proteinaligned residues are given as log-odds substitution matricessuch as PAM, GONNET, or BLOSUM.There are several models for assessing the score of a givenMSA. Many MSA tools have adopted the score method. Abrief review of the score method that has been used to calculatethe alignment score is as follows:
Sum-of-Pairs (SP): It was introduced by Carrillo andLipman. More details about the sum-of-Pairs will bepresented later.
Weighted sum-of-pairs score,: The weighted sum-of-pairs (WSP) score is an extension of the SP score sothat each pair-wise alignment score contributes differentlyto the whole score.
Maximal expected accuracy (MEA): The basic idea of MEA is to maximize the expected number of “correctly”aligned residue pairs. It has been used in PRIME,and ProbCons algorithms.
Consistency-based Scoring: This consistency concept wasoriginally introduced by Gotoh  and later refined byVingron and Argos. Consistency-based scoring is usedin T-Coffee, MAFFT, and Align-malgorithms.
Probabilistic consistency Scoring function: This scoringfunction is introduced in ProbCons. It is a novelmodification of the traditional sum-of-pairs scoringsystem. This promising idea is implemented and extendedin the PECAN, MUMMALS, PROMALS,ProbAlign , ProDA, and PicXAA programs.
Segment-to-segment objective function: It is used byDIALIGN to construct an alignment throughcomparison of the whole segments of the sequences rather than the residue-to-residue comparison.
NorMD objective function: It is a conservation-basedscore which measures the mean distance between thesimilarities of the residue pairs at each alignment column.NorMD is used in RASCAL and AQUA.
Muscle profile scoring function: MUSCLE uses ascoring function which is defined for a pair of profilepositions. In addition to PSP, MUSCLE uses a new profilefunction which is called the log-expectation (LE) score.
RNA Database and Benchmarks
Typically, a benchmark of reference alignments is used tovalidate the MSA program. The accurate score is given bycomparing the aligned sequence (test sequences) produced bythe program with the corresponding reference alignment. Mostalignment programs have been extensively investigated for protein. To date, few attempts have been made to benchmark nucleic acid sequences.RNA reference alignments exist in several databases. Itmust be noted that although these databases provide asubstantial amount of information to the specialist, they dodiffer in the file formats used and the data obtained. Herein, abrief review of the benchmarks and database that have beenused for multiple RNA sequence alignment is explained inTable 1.
TABLE I. D
RNA Database Description Website
 It is a compilation of alignment and covariance models including manyregular non-coding RNA familieshttp://rfam.sanger.ac.uk/http://rfam.janelia.org/index.html.BRAliBase
 It is a compilation of RNA reference alignments especially designed for thebenchmark of RNA alignment methods.http://www.biophys.uni-duesseldorf.de/bralibase/http://projects.binf.ku.dk/pgardner/bralibase/Comparative RNA Website(CRW)It has alignments for rRNA (5S / 16S / 23S), Group I Intron, Group IIintron, and tRNA for various organismshttp://www.rna.ccbb.utexas.edu/European Ribosomal RNADatabase
It is a collection of all complete or nearly complete SSU (small subunit) andLSU (large subunit) ribosomal RNA sequences available from publicsequence databases.http://bioinformatics.psb.ugent.be/webtools/rRNA/The Ribonuclease PDatabaseIt contains a collection of sequence alignments, RNase P sequences, threedimensional models, secondary structures, and accessory information.http://www.mbio.ncsu.edu/RnaseP/5S Ribosomal RNA It is a collection of the large subunit of most organellar ribosomes and all http://biobases.ibch.poznan.pl/5SData/
72 http://sites.google.com/site/ijcsis/ISSN 1947-5500