
BI205: GENETICS & EVOLUTION

BIOINFORMATICS 1 & 2

Name: Nau Fa’u


Id: s11176348
1.0 INTRODUCTION
The Basic Local Alignment Search Tool (BLAST) is designed to find regions of local
similarity between protein or nucleotide sequences. The program compares nucleotide or
protein sequences to sequences in a database and calculates the statistical significance of the
matches. It is hosted by the National Center for Biotechnology Information (NCBI), whose
databases and servers give researchers and scientists access to biomedical and genomic
information (Wheeler and Bhagwat, 2007).
However, many biotech companies and bioinformaticians prefer to use "stand-alone" BLAST
to query their own local databases, or need to customize BLAST so that it better suits their
needs. Standalone BLAST comes in two forms: the executables that can be run from the
command line, and the Standalone WWW BLAST Server, which allows users to set up their own
in-house versions of the BLAST Web pages. BLAST searches can be nucleotide or protein.
Nucleotide BLAST refers to the use of a member of the BLAST suite of programs, such as
"blastn", to search with a nucleotide "query" against a database of nucleotide "subject"
sequences, whereas protein BLAST refers to protein-to-protein sequence searches, which are
performed using the original member of the BLAST suite of programs, known as "blastp".
Sequence alignment is the procedure of comparing biological sequences and identifying the
similarities between them. Which "similarities" are identified depends on the aims of the
specific alignment process. Sequence alignment is very useful in a number of bioinformatics
applications and is a prerequisite for a wide variety of analyses that can be carried out on
unknown sequences. Any analysis that involves the simultaneous treatment of a number of
homologous proteins typically requires that the proteins have been aligned with the
homologous residues in columns (Anon., 2020).

The objectives of the first practical were to learn about the National Center for
Biotechnology Information, to evaluate and interpret the outcomes of BLAST searches, and to
determine the usefulness of a set of primers for amplifying a specific gene region of
interest. The practical then looked at SNPs in HIV-1 isolates from a patient that are assumed
to confer drug resistance, and identified homologues and the best matches in the nr database
for sequences obtained in Pacific Island research. Pairwise BLAST comparisons of the unknown
sequences were then made and conclusions drawn about the genetic diversity of these unknown
species. The aims of practical 2 were to learn more about homology and how homologous
sequences can be used to infer evolutionary relationships between species, and to learn the
principles of sequence alignment as used by computer algorithms and of multiple sequence
alignment (MSA), and how these are used to construct phylogenetic or evolutionary
relationship trees.
2.0 METHODOLOGY

Bioinformatics I and II
• As per the lab guidelines.
3.0 RESULTS
i. Bioinformatics I

Part A:
Unknown 1:
a) blastn
b) Agathis silbae maturase K (matK) gene, partial cds; chloroplast
c) 2250 bits (1218)
d) 1218/1218(100%)
e) 0/1218(0%)
f) 0.0
Unknown 2:
a) blastn
b) Homo sapiens isolate Lecce635 control region, partial sequence; mitochondrial
c) 1151 bits (623)
d) 623/623(100%)
e) 0/623(0%)
f) 0.0
Unknown 3:
a) blastn
b) Vibrio parahaemolyticus isolate Fiji 21 uridilate kinase (pyrH) gene, partial cds
c) 1074 bits (581)
d) 581/581(100%)
e) 0/581(0%)
f) 0.0
Unknown 4:
a) blastn
b) Vibrio parahaemolyticus isolate Fiji 17 uridilate kinase (pyrH) gene, partial cds
c) 1074 bits (581)
d) 581/581(100%)
e) 0/581(0%)
f) 0.0
Unknown 5:
a) blastn
b) Herpestinae sp. PAML-2006 NADH dehydrogenase subunit 6 (ND6) gene, partial cds;
tRNA-Gln gene, complete sequence; cytochrome b (cytb) gene, complete cds; tRNA-Thr
and tRNA-Pro genes, complete sequence; and control region, partial sequence;
mitochondrial
c) 3046 bits (1649)
d) 1653/1653(100%)
e) 0/1653(0%)
f) 0.0
Unknown 6:
a) blastp
b) cytochrome b [Herpestinae sp. PAML-2006]
c) 763 bits (1970)
d) 379/379(100%)
e) 0/379 (0%)
f) 0.0
Part B:
a) blastn
b) Vibrio parahaemolyticus isolate Fiji 21 uridilate kinase (pyrH) gene, partial cds + Vibrio
parahaemolyticus isolate Fiji 17 uridilate kinase (pyrH) gene, partial cds
c) 830 bits (449)
d) 535/578(93%)
e) 0/578(0%)
f) 0.0
Part C:
a) blastx
b) Herpestinae sp. PAML-2006 NADH dehydrogenase subunit 6 (ND6) gene, partial cds;
tRNA-Gln gene, complete sequence; cytochrome b (cytb) gene, complete cds; tRNA-Thr
and tRNA-Pro genes, complete sequence; and control region, partial sequence;
mitochondrial + cytochrome b [Herpestinae sp. PAML-2006]
c) 524 bits (1350)
d) 351/378(93%)
e) 0/378(0%)
f) 0.0
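
The three programs used in Parts A to C differ only in the type of query and subject sequence they accept. The sketch below summarizes this for the five classic BLAST programs; the helper `choose_program` is hypothetical and is not part of BLAST itself.

```python
# Query and subject sequence types accepted by the classic BLAST programs.
# "translated" means the nucleotide sequence is translated in all six
# reading frames before the comparison is made at the protein level.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def choose_program(query_type, subject_type):
    """Hypothetical helper: pick the usual program for a query/subject pair.

    Both arguments are either "nucleotide" or "protein".
    """
    table = {
        ("nucleotide", "nucleotide"): "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    return table[(query_type, subject_type)]


# Part B: unknown 3 vs unknown 4 (both nucleotide)
print(choose_program("nucleotide", "nucleotide"))
# Part C: unknown 5 (nucleotide) vs unknown 6 (protein)
print(choose_program("nucleotide", "protein"))
```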
ii. Bioinformatics II
Part I: Alignments
a)
G A C T G A C T G A C T Score
: : : : : : : :
G C C G G A A T T A C T
+1 -1 +1 -1 +1 +1 -1 +1 -1 +1 +1 +1 +4

b)
T A T A T A T A T A T A Score
: : : : : : : :
T A A A T T A T T A T A
+1 +1 -1 +1 +1 -1 -1 -1 +1 +1 +1 +1 +4
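
The scoring used in Part I can be sketched in a few lines of Python, assuming the scheme implied by the tables (+1 for a match, -1 for a mismatch, no gaps). The function name `score_alignment` is chosen here for illustration only.

```python
def score_alignment(query, subject, match=1, mismatch=-1):
    """Score an ungapped alignment position by position."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    return sum(match if q == s else mismatch for q, s in zip(query, subject))


# Part I (a) and (b): 8 matches and 4 mismatches each
print(score_alignment("GACTGACTGACT", "GCCGGAATTACT"))  # 4
print(score_alignment("TATATATATATA", "TAAATTATTATA"))  # 4
```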

Part II (a): Random Query Sequence I


a)
G T T G C A A C A G C T Score
:
G T C G C A T A C A T G
+1 +1 -1 +1 +1 +1 -1 -1 -1 -1 -1 -1 -2

b)
G T T G C A A C A G C T Score
: : : : :
C C T A A A T G G T C G
-1 -1 +1 -1 -1 +1 -1 -1 -1 -1 +1 -1 -6

c)
G T T G C A A C A G C T Score
: :
C T A T G C T G C A G A
-1 +1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -10

d)
G T T G C A A C A G C T Score
: : :
C A T T G A C G T G C A
-1 -1 +1 -1 -1 +1 -1 -1 -1 +1 +1 -1 -4

e)
G T T G C A A C A G C T Score
: : : :
C G A T T A A C G T G C
-1 -1 +1 -1 -1 +1 +1 +1 -1 -1 -1 -1 -4

f)
G T T G C A A C A G C T Score
: : : :
A C T G T C G G T A A C
-1 -1 +1 +1 -1 -1 -1 -1 -1 -1 -1 -1 -8

g)
G T T G C A A C A G C T Score

G C G A C A C T A T G T
+1 -1 -1 -1 +1 +1 -1 -1 +1 -1 -1 +1 -2

h)
G T T G C A A C A G C T Score
: : : : :
A T A C T G A T G C G C
-1 +1 -1 -1 -1 +1 +1 -1 -1 -1 -1 -1 -6

i)
G T T G C A A C A G C T Score
: :
A A G C T G G A C C T T
-1 -1 -1 -1 -1 +1 -1 -1 -1 +1 -1 -1 -8

j)
G T T G C A A C A G C T Score
: : :
T G T A T C C G G A C A
-1 -1 +1 -1 -1 +1 -1 -1 -1 -1 +1 -1 -6

Part II (b): Random Query Sequence II


a)
A A T A T T A T A A T T Score
: : : : : : : :
T A A A T T T A T T A A
-1 +1 -1 +1 +1 +1 -1 -1 -1 -1 -1 -1 -4

b)
A A T A T T A T A A T T Score
: : : : : : : :
T T T A A T T A A T A A
-1 -1 +1 +1 -1 +1 -1 -1 +1 -1 -1 -1 -4

c)
A A T A T T A T A A T T Score
: : : : : :
T A T T A A T A T A A T
+1 +1 +1 -1 -1 -1 -1 -1 -1 +1 -1 +1 -2

d)
A A T A T T A T A A T T Score
: : : : : : : :
A T A A T A T A T A T T
+1 -1 -1 +1 +1 -1 -1 -1 -1 +1 +1 +1 0

e)
A A T A T T A T A A T T Score
: : : : : :
A T T T A A T T A T A A
+1 -1 +1 +1 -1 -1 -1 +1 +1 -1 -1 -1 -2

f)
A A T A T T A T A A T T Score
: : : : : :
T A A T T A A A A T T T
-1 +1 -1 -1 +1 -1 +1 -1 +1 -1 +1 +1 0

g)
A A T A T T A T A A T T Score
: : : : : :
T T T A A A A A T A T T
-1 -1 +1 +1 -1 -1 +1 -1 -1 +1 +1 +1 0

h)
A A T A T T A T A A T T Score
: : : : : : : :
T T T A T T A T A A A A
-1 -1 +1 +1 +1 -1 +1 +1 +1 +1 -1 -1 2

i)
A A T A T T A T A A T T Score
: : : : : :
T A A T T A A A T A T T
-1 +1 -1 -1 +1 -1 +1 -1 -1 +1 +1 +1 0

j)
A A T A T T A T A A T T Score
: : : : : : : :
T A T A A T A A T T T A
-1 +1 +1 +1 -1 +1 +1 -1 -1 -1 +1 -1 0

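Q1. Even with as few as 10 randomizations, what can you conclude from your comparison?
Comparing the two sets of scores, the RQS(I) scores are all negative even numbers (from -2 to
-10) across the 10 randomizations, well below the +4 obtained for the original alignment in
Part I (a). The RQS(II) scores are higher, ranging from -4 to +2 with 0 the most common value,
which is much closer to the +4 of the original alignment in Part I (b). This suggests that the
similarity in alignment (a) is less likely to be due to chance than the similarity in
alignment (b), whose sequences contain only A and T.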
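
The randomization in Part II can be reproduced with a short simulation. This is only a sketch under stated assumptions: the +1/-1 scoring of Part I, one shuffled query compared against ten shuffled subjects, and an arbitrary seed. The function names are hypothetical, and the scores it prints will not match the hand-made shuffles in the tables.

```python
import random


def score_alignment(query, subject, match=1, mismatch=-1):
    """Score an ungapped alignment position by position."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    return sum(match if q == s else mismatch for q, s in zip(query, subject))


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq, keeping its base composition."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def randomized_scores(query, subject, n=10, seed=0):
    """Shuffle the query once, then score it against n shuffled subjects."""
    rng = random.Random(seed)
    random_query = shuffle_sequence(query, rng)
    return [
        score_alignment(random_query, shuffle_sequence(subject, rng))
        for _ in range(n)
    ]


scores_1 = randomized_scores("GACTGACTGACT", "GCCGGAATTACT")
scores_2 = randomized_scores("TATATATATATA", "TAAATTATTATA")
print("RQS(I): ", scores_1)
print("RQS(II):", scores_2)
```

With 12 positions scored +1 or -1, every score is even, which is why only even values appear in the tables.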
Part III
i)
G T T A A T C G T C Score
: : : : : :
G T T T A T - A C C
+4 +4 +4 -3 +4 +4 -6 -3 -3 +4 +9

ii)
G T T A A T C G T C Score
: : : : : :
G T T T A T A - C C
+4 +4 +4 -3 +4 +4 -3 -6 -3 +4 +9

iii)
G T T A A T C G T C Score
: : : : : :
G T T T A T A C - C
+4 +4 +4 -3 +4 +4 -3 -3 -6 +4 +9
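
The Part III scores can be checked with a small function, assuming the scheme implied by the tables: +4 for a match, -3 for a mismatch and -6 for a gap. The name `score_gapped` is illustrative only.

```python
def score_gapped(a, b, match=4, mismatch=-3, gap=-6):
    """Score a pre-aligned pair in which '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


query = "GTTAATCGTC"
for subject in ("GTTTAT-ACC", "GTTTATA-CC", "GTTTATAC-C"):
    print(subject, score_gapped(query, subject))  # each alignment scores 9
```

Because every mismatch costs the same, moving the gap by one position does not change the total, so the three alignments cannot be told apart under this scheme.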

Part IV
i)
G T T A A T C G T C Score
: : : : : :
G T T T A T - A C C
+4 +4 +4 -3 +4 +4 -5 +2 +2 +4 +20

ii)
G T T A A T C G T C Score
: : : : : :
G T T T A T A - C C
+4 +4 +4 -3 +4 +4 -3 -5 +2 +4 +15

iii)
G T T A A T C G T C Score
: : : : : :
G T T T A T A C - C
+4 +4 +4 -3 +4 +4 -3 -3 -5 +4 +10
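
The Part IV scores are consistent with a scheme that treats transitions (purine to purine, or pyrimidine to pyrimidine) more leniently than transversions. The sketch below assumes the values that can be read off the tables: match +4, transition +2, transversion -3, gap -5. The function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def score_ti_tv(a, b, match=4, transition=2, transversion=-3, gap=-5):
    """Score a pre-aligned pair, separating transitions from transversions."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            total += transition  # A<->G or C<->T
        else:
            total += transversion  # purine <-> pyrimidine
    return total


query = "GTTAATCGTC"
print(score_ti_tv(query, "GTTTAT-ACC"))  # 20
print(score_ti_tv(query, "GTTTATA-CC"))  # 15
print(score_ti_tv(query, "GTTTATAC-C"))  # 10
```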

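Q2. With the second scoring system is an alignment identified as better than the others?
• Yes. With the second scoring system, alignment (i) scores higher (+20) than alignments
(ii) (+15) and (iii) (+10), so it is identified as the best alignment. With the first
scoring system (Part III) all three alignments scored +9 and could not be separated.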

Part V:
i)
H A P P P Y Score
: : :
H A N D D Y
+6 +2 -1 -1 -1 +10 +15
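
Protein alignments are scored with a substitution matrix rather than a single match/mismatch value. The sketch below uses only the pair scores that appear in the Part V table; the report does not name the matrix they come from, so the dictionary is an excerpt for illustration, not a complete matrix.

```python
# Pair scores copied from the Part V table (excerpt only, not a full matrix).
PAIR_SCORES = {
    ("H", "H"): 6,
    ("A", "A"): 2,
    ("P", "N"): -1,
    ("P", "D"): -1,
    ("Y", "Y"): 10,
}


def score_protein(a, b, scores=PAIR_SCORES):
    """Sum substitution scores over an ungapped protein alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    total = 0
    for x, y in zip(a, b):
        # Substitution scores are symmetric, so try both orders.
        key = (x, y) if (x, y) in scores else (y, x)
        total += scores[key]
    return total


print(score_protein("HAPPPY", "HANDDY"))  # 15
```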

Q3. When might it be better to use a protein alignment rather than a DNA alignment?
• A protein alignment is better than a DNA alignment when the species are phylogenetically
far apart, because protein sequences are more conserved: many nucleotide positions can
mutate without changing the protein sequence. A DNA alignment is better suited to
comparing closely related species or individuals.

Part VI:
a)
G T T A A T A A T C SCORE
: : : : : :
G C T T A T A - C C
+1 -1 +1 -1 +1 +1 +1 -2 -1 +1 1
G C T T A T A - C C
: : : : : : : : : :
G C T T A T A - C C
+1 +1 +1 +1 +1 +1 +1 +1 +1 +1 10
G T T A A T A A T C
: : : : :
G C T T A T A - C C
+1 -1 +1 -1 +1 +1 +1 -2 -1 +1 1
TOTAL 12

b)
G T T A A T A A T C SCORE
: : : : :
G C T T A T A C C -
+1 -1 +1 -1 +1 +1 +1 -1 -1 -2 -1
G C T T A T A C C -
: : : : : : : : : :
G C T T A T A C C -
+1 +1 +1 +1 +1 +1 +1 +1 +1 +1 10
G T T A A T A A T C
: : : : : :
G C T T A T A C C -
+1 -1 +1 -1 +1 +1 +1 -1 -1 -2 -1
TOTAL 8
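
The totals in Part VI are sums of pairwise scores. A minimal sketch, assuming the scoring read off the tables (+1 for identical columns, including gap against gap, -1 for a mismatch, -2 for a residue against a gap); the function names are illustrative.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of rows taken from a multiple sequence alignment."""
    total = 0
    for x, y in zip(a, b):
        if x == y:
            total += match  # identical columns, gap/gap included as in Part VI
        elif "-" in (x, y):
            total += gap
        else:
            total += mismatch
    return total


def sum_of_pairs(rows):
    """Add the scores of every distinct pair of rows: N(N-1)/2 pairs."""
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))


msa_a = ["GTTAATAATC", "GCTTATA-CC", "GCTTATA-CC"]
msa_b = ["GTTAATAATC", "GCTTATACC-", "GCTTATACC-"]
print(sum_of_pairs(msa_a))  # 12
print(sum_of_pairs(msa_b))  # 8
print(len(list(combinations(range(4), 2))))  # 6 pairs for 4 sequences
```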

Q4. How many pair scores would there be at a nucleobase position in an MSA of 4 sequences?
• Number of pair scores = N(N-1)/2
For N = 4: 4(4-1)/2 = 6 pair scores.
Q5. What do you think the optimal alignment score refers to for an MSA?
• It refers to the highest total (sum-of-pairs) score that can be obtained among the
possible alignments of the sequences.

4.0 DISCUSSION
Part 1: Bioinformatics I
As the results show, BLAST was then used to compare unknown sequences 3 and 4 (Part B) and
unknown sequences 5 and 6 (Part C) against each other, which reveals the differences and
similarities within each pair. Unknowns 3 and 4 are both nucleotide sequences and were
compared with blastn, whereas unknown 5 is a nucleotide sequence and unknown 6 is a protein
sequence, so this pair was compared with blastx. The comparison of unknowns 5 and 6 gave the
lower bit score, 524 bits (1350), whereas the comparison of unknowns 3 and 4 gave the higher
bit score, 830 bits (449). What the two comparisons have in common is that both showed 93%
identity and no gaps in the alignment, and both had an E value of 0.0. A protein-level
comparison is the better choice when the unknown nucleotide sequences do not align well with
their matches, whereas a DNA-level comparison is better when a close nucleotide match is
found. The E value ("Expect value") is the parameter that describes the number of hits one
can "expect" to see by chance when searching a database of a particular size; it decreases
exponentially as the score (S) of the match increases.
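
The exponential relationship between score and E value can be illustrated with the standard approximation E = m × n × 2^(-S'), where S' is the bit score, m is the query length and n is the total length of the database. The database size below is a made-up value for illustration, not the size of the database searched in the practical.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 581      # length of unknown 3 (from the results)
DB_LEN = 10 ** 9     # hypothetical database size in residues

for bits in (20, 40, 60, 830):
    print(bits, expect_value(bits, QUERY_LEN, DB_LEN))
```

Each extra bit halves the E value, so at 830 bits the result is far too small to print meaningfully and is reported as 0.0.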

5.0 CONCLUSION
In summary, the practicals were successfully completed. The first practical was used to
learn about the National Center for Biotechnology Information (NCBI), to set up and
interpret the results of BLAST searches, to determine the effectiveness of a set of primers
for amplifying a particular gene region of interest, to look for SNPs in HIV-1 isolates from
a patient that are thought to confer drug resistance, and to identify homologues and best
matches in the nr database for sequences determined in Pacific Island research. Pairwise
BLAST comparisons of the unknown sequences were made and conclusions drawn about the genetic
diversity of these unknown species. The second practical gave an understanding of homology
and of the way homologous sequences can be used to infer evolutionary relationships between
species, of the basic principles of sequence alignment and how it is used by computer
algorithms, and of multiple sequence alignment (MSA) and the way it is used to build
phylogenetic or evolutionary relationship trees.

References
Anon., 2020. BI205 Lab Manual. [Online]
Available at: file:///C:/Users/User/Downloads/BI205%20Practical%209%20(3).pdf
[Accessed 11 November 2020].

Wheeler, D. and Bhagwat, M., 2007. BLAST QuickStart: Example-Driven Web-Based BLAST Tutorial. In: Comparative Genomics, Volumes 1 and 2.
