Lab Report - Bioinformatics
EXPERIMENT – 06
ABSTRACT
initial homology, it is important to favor methods that reduce non-homology. Furthermore,
of all possible alignments, the one that best satisfies the phylogenetic optimality criterion
should be called the best alignment. Consistency of optimality criteria is desirable, because
all homology statements then become subject to the same analysis and scrutiny. This
consistency rests on treating alignment gaps as character information throughout the
analysis, from alignment to phylogeny reconstruction, and on the consistent use of a cost
function (e.g., for insertions-deletions, transversions, and transitions). Multiple sequence
alignment (MSA) is an extremely valuable tool in molecular and evolutionary biology, and
many programs and algorithms are available for it. While the alignment accuracy of different
MSA programs has been compared in previous research, their computing time and memory
use have not been consistently evaluated. Given the unprecedented quantities of data
generated by next-generation deep sequencing systems and the increasing need for
large-scale data analysis, efficient software is imperative. Choosing the most effective MSA
software therefore means finding a compromise between alignment accuracy and
computational cost. We compared the accuracy and cost of nine common MSA programs,
namely CLUSTALW, CLUSTAL OMEGA, DIALIGN-TX, MAFFT, MUSCLE, POA, Probalign,
Probcons and T-Coffee, against the BAliBASE benchmark alignment dataset, and discussed
the relevance of some implementation details embedded in the algorithm of each program.
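The cost function mentioned above can be illustrated with a minimal Python sketch. It classifies each column of an already aligned pair of nucleotide sequences as a match, transition, transversion or insertion-deletion and sums a cost per column. The cost values and function names are hypothetical, chosen only for illustration; they are not taken from any of the programs listed above.

```python
# Purines and pyrimidines: a substitution within one class is a transition,
# a substitution between the two classes is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical costs, chosen only for illustration.
COSTS = {"match": 0, "transition": 1, "transversion": 2, "indel": 3}


def substitution_type(a, b):
    """Classify one column of a pairwise nucleotide alignment."""
    if a == "-" or b == "-":
        return "indel"
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def alignment_cost(seq1, seq2, costs=COSTS):
    """Sum the cost of every column of two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(costs[substitution_type(a, b)] for a, b in zip(seq1, seq2))


# Columns: A/G transition, C/C match, G/T transversion, T/T match,
# -/A indel, A/A match.
print(alignment_cost("ACGT-A", "GCTTAA"))  # 1 + 0 + 2 + 0 + 3 + 0 = 6
```

Under such a scheme, the alignment with the lowest total cost would be preferred.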
INTRODUCTION
Multiple sequence alignment (MSA) is generally the alignment of three or more biological
sequences (protein or nucleic acid) of similar length. Homology and the evolutionary
relationships between the analyzed sequences can be inferred from the output. Pairwise
sequence alignment methods, on the other hand, are used to identify regions of similarity
that may indicate functional, structural and/or evolutionary relationships between two
biological sequences.
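To make the idea of pairwise alignment concrete, the following is a minimal sketch of the Needleman-Wunsch dynamic programming recurrence for the optimal global alignment score of two sequences. The scoring values (match +1, mismatch -1, gap -2) are assumptions for illustration; real protein aligners use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch_score(s, t, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences s and t."""
    # prev[j] holds the best score for aligning the processed prefix of s
    # with the first j characters of t.
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # aligning s[:i] against an empty prefix of t
        for j in range(1, len(t) + 1):
            diagonal = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in t
            left = curr[j - 1] + gap  # gap inserted in s
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # three matches and one gap: 1
```

Multiple sequence alignment programs extend this pairwise idea to three or more sequences, usually with heuristics, because an exact solution becomes too expensive as the number of sequences grows.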
Multiple sequence alignments can also be used to build a phylogenetic tree. A phylogenetic
tree, or evolutionary tree, is a branching diagram showing the inferred evolutionary
relationships (the phylogeny) among biological species or other entities, based on
similarities and differences in their physical or genetic traits. The taxa joined together in
the tree are implied to have descended from a common ancestor. Phylogenetic trees are
central to the field of phylogenetics. In a rooted phylogenetic tree, each node with
descendants represents the inferred most recent common ancestor of those descendants,
and in some trees the edge lengths can be interpreted as time estimates. Each node is called
a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units, since
they cannot be directly observed. Trees are useful in fields of biology such as
bioinformatics, systematics, and phylogenetic comparative methods (Clustal W2
Introduction, 2020).
ClustalW is perhaps the most widely used multiple sequence alignment algorithm and is
integrated, as a so-called black box, into a variety of commercially available bioinformatics
packages such as DNASTAR, whereas the more recently developed Clustal Omega algorithm
is described as the most accurate and scalable MSA algorithm currently available. Clustal
Omega is the current MSA algorithm from the Clustal family. According to the cited source,
this algorithm is used to align protein sequences only (though nucleotide sequences are
likely to be supported in time). The accuracy of Clustal Omega is similar to that of other
high-quality aligners on small numbers of sequences; on large sequence sets, however,
Clustal Omega outperforms other MSA algorithms in terms of completion time and overall
alignment quality (Bioinformatics Tools for Multiple Sequence Alignment < EMBL-EBI,
2020).
Figure 1_Clustal Omega Algorithm
OBJECTIVES
To identify conserved sequence regions using multiple sequence alignment with
CLUSTAL OMEGA and to determine the evolutionary relationships between sequences
from phylogenetic trees.
MATERIALS
1. Computer
2. Internet connection
3. CLUSTAL OMEGA program
METHODS
1. The file DHFR.txt was downloaded from the following link:
http://theory.bio.uu.nl/BPA/Data/DHFR.txt
This file contains, in FASTA format, the protein sequences of dihydrofolate reductase
from chicken, human, Pneumocystis (a fungus) and Pseudomonas (a bacterium).
2. Then using the following link, the CLUSTAL OMEGA server (at EBI) homepage
was accessed.
https://www.ebi.ac.uk/Tools/msa/clustalo/
Figure 3_The CLUSTAL OMEGA server (at EBI) homepage
3. After that, the contents of the DHFR.fasta file were pasted into the query box.
Figure 4_The CLUSTAL OMEGA server homepage
4. The database was set at its default and the submit button was clicked.
Figure 5_The CLUSTAL OMEGA server homepage
The results page contained the sequence alignment and links to numerous other files.
Figure 6_ the sequence alignment
Figure 8_ Meaning of the colors on protein alignments
7. The phylogenetic tree results were displayed as shown below. CLUSTAL OMEGA
will show a Neighbor-joining tree without distance corrections.
Figure 10_The phylogenetic tree - Real
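Step 7 mentions a Neighbor-joining tree built without distance corrections. As a hedged illustration of what an uncorrected distance can look like, the sketch below computes the proportion of differing residues (often called the p-distance) for every pair of aligned sequences. The toy sequences and names are hypothetical, and skipping columns that contain a gap is an assumption of this sketch, not a statement about how CLUSTAL OMEGA treats gaps.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing residues over columns without a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Hypothetical toy alignment, not the DHFR data.
toy = {"seqA": "MKV-LA", "seqB": "MKVTLA", "seqC": "MRV-IA"}
for pair, distance in distance_matrix(toy).items():
    print(pair, distance)
```

A Neighbor-joining procedure would take such a matrix as input and repeatedly join the pair of taxa that minimizes the total branch length of the tree.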
DISCUSSION
Multiple sequence alignment (MSA) is an important and commonly used computational
technique for biological sequence analysis in molecular biology, computational biology, and
bioinformatics. MSA is performed by comparing homologous sequences in order to carry
out phylogenetic reconstruction, protein secondary and tertiary structure analysis, and
protein function prediction. In this experiment, we identified conserved sequence regions
using multiple sequence alignment with CLUSTAL OMEGA and determined the evolutionary
relationships between the sequences from phylogenetic trees (Clustal W2 Introduction,
2020).
Figure 6
This capture shows that the chicken and human sequences align closely to each other,
with gaps introduced mainly to match the longer Pneumocystis sequence. Beyond the
alignment of the vertebrate sequences, there is a distinct 'island' where all sequences
align, indicating the existence of conserved motifs and showing that the alignment is
biologically meaningful.
Figure 7 and 8
The colors of the alignment are presented in Figure 7. The residues are colored by
CLUSTAL OMEGA according to their physicochemical properties. The meaning of the
colors on protein alignments is shown in Figure 8.
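The coloring rule can be sketched as a simple lookup from residue to color group. The groups below follow the color scheme documented for the EBI CLUSTAL OMEGA service as far as it is known to the editor; this is an assumption, because Figure 8 is not reproduced in the text, and the table should be checked against that figure.

```python
# Assumed color groups (to be verified against Figure 8).
COLOR_GROUPS = {
    "red": "AVFPMILW",      # small and hydrophobic residues
    "blue": "DE",           # acidic residues
    "magenta": "RK",        # basic residues
    "green": "STYHCNGQ",    # hydroxyl, sulfhydryl, amine residues and glycine
}

# Invert the table so that each residue letter maps to its color.
RESIDUE_TO_COLOR = {
    residue: color
    for color, residues in COLOR_GROUPS.items()
    for residue in residues
}


def color_of(residue):
    """Return the color group of a one-letter residue code, or grey."""
    return RESIDUE_TO_COLOR.get(residue.upper(), "grey")


print([color_of(r) for r in "PGR"])
```

Under this assumed table, any symbol outside the four groups, including the gap character, falls back to grey.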
Figure 9 and 10
Both figures show the phylogenetic tree. Figure 9 is the cladogram and Figure 10 is the
'Real' view, in which branches are drawn with their actual lengths.
REFERENCES
1. Abacus.bates.edu. 2020. Clustal W2 Introduction. [online] Available at:
<https://abacus.bates.edu/bioinformatics1/clustalw.html> [Accessed 29 November
2020].
3. Ebi.ac.uk. 2020. Bioinformatics Tools For Multiple Sequence Alignment < EMBL-
EBI. [online] Available at: <https://www.ebi.ac.uk/Tools/msa/> [Accessed 27
November 2020].
POST-LAB QUESTIONS
ACC oxidase catalyzes the last step in the biosynthesis of the plant hormone ethylene.
This simple gas molecule is involved at several stages of plant growth (including
germination and senescence) and controls fruit ripening.
Search for ACC oxidase protein sequences in 5 different plants and get the FASTA files. Do
a multiple sequence alignment using CLUSTAL OMEGA and generate a tree from the multiple
sequence alignment. The accession numbers (AC) of those sequences are:
Show the FASTA sequences from the 5 different plants, the multiple sequence alignment
results and the phylogenetic tree in the lab report.
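Before the sequences are submitted, it can help to check that the collected FASTA text contains the expected number of records. The following is a minimal sketch of a FASTA parser; the record names and sequences in the example are hypothetical placeholders, not the ACC oxidase sequences used in this report.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record ID to sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            words = line[1:].split()
            current = words[0] if words else ""
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record example.
example = """>plant_1 hypothetical protein
MENFPII
NLEKL
>plant_2 hypothetical protein
MESFPVI
"""
sequences = parse_fasta(example)
print(len(sequences), {name: len(seq) for name, seq in sequences.items()})
```

For this exercise the parsed dictionary would be expected to hold five records, one per plant.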
Figure 12_FASTA sequence of Musa acuminata
Figure 15_FASTA sequence of Carica papaya
STEP 02
Figure 17_Multiple sequence alignment results
PHYLOGENETIC TREE
Figure 19- The phylogenetic tree - Cladogram
Pro P Proline
Leu L Leucine
Arg R Arginine
Met M Methionine
Cys C Cysteine
Trp W Tryptophan
Gly G Glycine
Phe F Phenylalanine
His H Histidine
Val V Valine
Tyr Y Tyrosine
Asp D Aspartic acid
Glu E Glutamic acid
Asn N Asparagine
Gln Q Glutamine
Phe(P)-103
Glycine(G)-158
Arg(R)-177
Note: Please refer to the first sequence number in the MSA for the position of the above
amino acids
b) Based on your observation from question 3(a), does this software produce the best
multiple sequence alignment? Justify your answer.
The above residues have a critical role in ACC oxidase function. Proline (P) and
Arginine (R) are marked with an asterisk, which denotes positions that have a
single, fully conserved residue. Therefore, the alignment indicates that those
residues are important. However, Glycine is not a fully conserved residue. Since
two of the three residues are fully conserved, we can say that the software
produces the best multiple sequence alignment.
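The asterisk check described above can be sketched in Python. The first function marks fully conserved columns with '*', and the second maps a residue number in one aligned sequence (as in the note about the first sequence number in the MSA) to its alignment column. The three short sequences are a hypothetical toy alignment, not the ACC oxidase data, and the ':' and '.' symbols that CLUSTAL also prints for similar residues are left out of this sketch.

```python
def conservation_line(aligned):
    """Return a string with '*' under every fully conserved column."""
    marks = []
    for column in zip(*aligned):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


def column_of_position(aligned_seq, position):
    """Map a 1-based residue number to a 0-based alignment column."""
    count = 0
    for column, symbol in enumerate(aligned_seq):
        if symbol != "-":
            count += 1
            if count == position:
                return column
    raise ValueError("position is beyond the end of the sequence")


# Hypothetical toy alignment.
toy = ["MP-GR", "MPAGR", "MP-AR"]
line = conservation_line(toy)
for position in (2, 3, 4):
    column = column_of_position(toy[0], position)
    print(position, toy[0][column], line[column] == "*")
```

In the toy data, residue 3 of the first sequence (G) is not fully conserved while residues 2 (P) and 4 (R) are, which mirrors the pattern reported for the three residues in the answer.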
4. From your observation on the phylogenetic tree, describe the relatedness of this
protein between the 5 different plants. Use the following terms in your
description:
Clade
Out group
Sister group
Evolutionary trees depict clades. A clade is a group of organisms that includes an ancestor
and all descendants of that ancestor. You can think of a clade as a branch on the tree of life.
Musa acuminata and Hordeum vulgare are sister groups, and Trifolium repens and Carica
papaya are sister groups, meaning that each pair shares the closest evolutionary relationship
because its members stem from the same most recent common ancestor. Zea mays is the
out group in this tree. Taxa outside the common ancestor of the other groups are referred to
as out groups, as they are more evolutionarily distant from the sister taxa than the sister
taxa are from one another, owing to a more distant common ancestor. With each successive
speciation event, a new clade is formed within the tree, allowing scientists to identify
common ancestors between evolutionarily distant taxa.
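The terms clade, sister group and out group can be made concrete with a small sketch that stores the tree as nested tuples. The topology below is transcribed from the written description above (two sister pairs, with Zea mays as the out group) and is an assumption about the figure, since the tree file itself is not part of this report.

```python
# Each tuple is an internal node; each string is a leaf (taxon).
TREE = (
    "Zea mays",
    (
        ("Musa acuminata", "Hordeum vulgare"),
        ("Trifolium repens", "Carica papaya"),
    ),
)


def leaves(node):
    """Return all taxa below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clades(node):
    """Yield the set of taxa under every internal node (one clade each)."""
    if isinstance(node, str):
        return
    yield frozenset(leaves(node))
    for child in node:
        yield from clades(child)


def sister_of(node, target):
    """Return the taxa in the sister group of leaf `target`, or None."""
    if isinstance(node, str):
        return None
    if target in node:
        return [leaf for child in node if child != target for leaf in leaves(child)]
    for child in node:
        found = sister_of(child, target)
        if found is not None:
            return found
    return None


print(sister_of(TREE, "Musa acuminata"))  # ['Hordeum vulgare']
print(len(list(clades(TREE))))            # 4 internal nodes, so 4 clades
```

Asking for the sister group of Zea mays returns the other four taxa together, which is what makes it the out group in this representation.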