
BIOT3113: BIOTECHNOLOGY I

Lab 8 – Bioinformatics & Disease

Course title: Biotechnology I


Course Code: BIOT3113
Date: 18/11/2022

This lab is designed to introduce you to basic Bioinformatics software and the tasks one performs when analysing a sequence.

All answers should be added and submitted in this worksheet.

Scenario:
You’ve come across an infected plant in the field. A sample is
collected, and a DNA extraction protocol is carried out. Following a
cloning experiment, you’ve sent the unknown plant pathogen sample for
sequencing. The following sequence is what you received from the lab
(download your assigned sequence document).
State which sequence you’ve been assigned: 5

Questions:
1. Go to the National Center for Biotechnology Information website
(https://www.ncbi.nlm.nih.gov/). Select the tab for ‘Data and
Software’ then ‘Tools.’ Locate the ‘Basic Local Alignment Search
Tool’ and select ‘Nucleotide BLAST.’ Copy the DNA sequence to the
‘Query Sequence’ box and select BLAST.

a. Which sequence does your sample share the highest percentage identity with? (1 mark)

Ans. Rhynchosia golden mosaic Yucatan virus clone RhyBodA12 segment DNA A

b. What is the size of your sequence? (1 mark)

Ans. 2589 bp

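c. What is the Accession number of your sequence’s closest match and what is the purpose of the Accession number? (2 marks)
Ans. The Accession number of the sequence’s closest match is
KP641347.1. The Accession number is the unique identifier assigned
to a sequence record in the database; it allows the record to be
cited and retrieved unambiguously. An Accession number is typically
made up of one or two letters followed by numbers, and the suffix
after the dot (.1) is the version number, which increases each time
the sequence itself is updated.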

d. List your sample’s five closest matches and the corresponding percentage identity (10 marks)
Ans.
Description                                                                      % Identity
Rhynchosia golden mosaic Yucatan virus clone RhyBodA12 segment DNA A             100
Rhynchosia golden mosaic Yucatan virus clone RhyBodA12 segment DNA A             94.40
Rhynchosia golden mosaic Yucatan virus clone RhyCGA14 segment DNA A              94.32
Cabbage leaf curl virus isolate VE-Rh_V9-17 segment DNA-A                        94.25
Cabbage leaf curl virus isolate VE-Rh_V10-17 segment DNA-A, complete sequence    94.21
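If the BLAST results are downloaded as a tab-separated hit table, the closest matches by percentage identity can be extracted with a short Python sketch. We assume the BLAST+ default tabular column order (`-outfmt 6`) shown below, and `top_hits` is a hypothetical helper. To our knowledge web BLAST lists hits by E-value by default, so ranking by identity alone can reorder them, and a short alignment at 100% identity can outrank a full-length one; check query cover as well.

```python
import csv
import io

# Assumed column order: the BLAST+ default for tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(tabular_text: str, n: int = 5) -> list[tuple[str, float]]:
    """Return up to n (subject id, % identity) pairs, highest identity first.

    A subject can have several HSPs; only its highest-bit-score HSP is kept.
    """
    best: dict[str, tuple[float, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        sseqid = hit["sseqid"]
        pident, bits = float(hit["pident"]), float(hit["bitscore"])
        if sseqid not in best or bits > best[sseqid][1]:
            best[sseqid] = (pident, bits)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(sseqid, pident) for sseqid, (pident, _bits) in ranked[:n]]
```

Usage: `top_hits(open("hits.txt").read())`, where `hits.txt` is a hypothetical file holding the downloaded hit table.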

e. What does “percentage identity” mean? (1 mark)

Ans. Percentage identity is the percentage of aligned positions at
which your specimen’s sequence and the reference (database)
sequence have identical nucleotides, calculated over the length of
the alignment.
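The definition above can be made concrete with a short Python sketch that computes percentage identity from the two rows of a pairwise alignment. It assumes, as BLAST does, that gap columns count toward the alignment length but never as identities. Because BLAST reports identity only over the aligned region, its “Query Cover” value is given separately.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical columns / alignment length * 100; '-' marks a gap."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


# Hypothetical 10-column alignment with one gap and one mismatch.
print(percent_identity("ATGCC-ATGA", "ATGCCTATCA"))  # 80.0
```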

2. Create a phylogenetic tree by selecting the first ten similar sequences and choosing the ‘Distance tree of results.’

a. Insert a screen grab of the tree. (3 marks)

b. What is the purpose of a phylogenetic tree? (1 mark)

Ans. A phylogenetic tree depicts the hypothesized evolutionary
relationships among sequences or taxa, grouping them into nested
clades on the basis of shared features. This makes it crucial for
organizing knowledge about biodiversity.
c. What does the phylogenetic tree tell you about the sequence
you isolated from the infected plant? (5 marks)
Ans.
In a phylogenetic tree, the tips represent the sequences or
organisms being compared, the internal nodes represent their
inferred common ancestors, and the branches represent
ancestor-descendant relationships. The first grey node represents
the most ancient common ancestor of all the sequences in the tree.
From this node, one lineage leads to Rhynchosia golden mosaic
Yucatan virus clone RhyBodA12 segment DNA A (the branch at the very
bottom), the sequence that shares 100% identity with the sample;
this indicates that the pathogen isolated from the infected plant is
most likely Rhynchosia golden mosaic Yucatan virus. The branches
leading from the most ancient common ancestor are approximately the
same length, which means these lineages are at about the same
evolutionary distance from that ancestor. The third node, towards
the top of the tree, shows extensive further branching, which
indicates that many evolutionary changes (nucleotide sequence
changes) occurred, producing different organisms with similar but
not identical nucleotide sequences. Branch length indicates
evolutionary distance: a long branch means the organisms it links
are evolutionarily distant and their nucleotide sequences differ
greatly, whereas a short branch indicates a close evolutionary
relationship and highly similar nucleotide sequences between the
organisms the branch links.
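Branch lengths in a distance tree come from pairwise distances computed from the alignment. As an illustration only, the Jukes-Cantor (JC69) correction converts the proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3), which grows faster than p because it allows for repeated substitutions at the same site. Using JC69 here is an assumption; NCBI’s ‘Distance tree of results’ offers more than one distance model, so check which one produced your tree.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """JC69 distance from the proportion p of differing sites.

    Computed as (3/4) * ln(1 / (1 - 4p/3)), which equals -(3/4) * ln(1 - 4p/3)
    but returns +0.0 rather than -0.0 when p = 0. Undefined once p reaches 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return 0.75 * math.log(1.0 / (1.0 - 4.0 * p / 3.0))


# Percent identities from the BLAST table above, converted to p and then to d.
for identity in (100.0, 94.40, 94.21):
    p = 1.0 - identity / 100.0
    print(f"{identity:6.2f}% identity: p = {p:.4f}, JC69 d = {jukes_cantor_distance(p):.4f}")
```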

This portion aims to introduce you to the initial steps in protein
analysis. You will be required to do basic research on the structure
of Begomoviruses.

3. Return to the National Center for Biotechnology Information
homepage. Select the tab for ‘Data and Software’ then ‘Tools.’
Locate the ‘Open Reading Frame Finder’ and select it. Copy the DNA
sequence to the ‘Query Sequence’ box and submit.

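a. How does ORF Finder identify the ORFs/proteins? (1 mark)

ORF Finder first searches each of the six possible reading frames
(three on each strand) for the start codon ATG, then reads the
sequence in base triplets (codons) until it reaches an in-frame stop
codon (TAG, TGA or TAA). Each stretch from a start codon to a stop
codon is reported as an ORF, a candidate protein-coding region.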
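The procedure described above can be sketched in Python: scan all six reading frames for ATG and read codon by codon to the first in-frame stop. This is a simplified illustration, not NCBI’s implementation. It accepts only ATG starts, reports each ORF from its most upstream ATG (nested ATGs are skipped), and uses a 75-nt minimum length, which we believe mirrors ORF Finder’s default but is an assumption here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_nt: int = 75) -> list[tuple[str, int, int, int]]:
    """Find ATG...stop ORFs in all six reading frames.

    Returns (strand, frame, start, end) tuples; start/end are 0-based and
    end-exclusive on the strand scanned ("-" refers to the reverse complement).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] == "ATG":
                    # Read codon by codon from this ATG until an in-frame stop codon.
                    for j in range(i, len(s) - 2, 3):
                        if s[j:j + 3] in STOP_CODONS:
                            if j + 3 - i >= min_nt:
                                orfs.append((strand, frame, i, j + 3))
                            i = j  # resume scanning after the stop codon
                            break
                    else:
                        break  # no in-frame stop before the end of this frame
                i += 3
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_nt=12))  # [('+', 2, 2, 14)]
```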

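b. What is the shortcoming of this methodology? (2 marks)

One shortcoming of the above-mentioned methodology is that ORF
Finder predicts ORFs purely from start (ATG) and stop (TAG, TGA and
TAA) codons, so it cannot tell whether an ORF is actually expressed
or encodes a functional protein. Any sufficiently long stretch
between an ATG and an in-frame stop codon is reported, including
spurious ORFs in non-coding regions or on the opposite strand. It
also cannot detect genes interrupted by introns, because splicing
joins coding segments that are not contiguous in the DNA, and it can
miss genes that use a start codon other than ATG.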

c. Return to the NCBI homepage and search for the Accession
number of your sequence. Click on the result to examine the
sequence record. What information here can you use to fill in
the table below? What does the abbreviation mean? (2 marks)

The CDS features of the record give the name of each gene and
protein, the start and stop codon coordinates, the protein ID and
the amino acid size, all of which can be used to fill in the table
below. CDS stands for “coding sequence”: the set of nucleotides that
matches the sequence of amino acids in a protein. A typical CDS
begins with ATG and ends with a stop codon. A CDS can be regarded as
a subset of an open reading frame (ORF).

d. Fill in the table below with the functional proteins (15 marks)

Description                        Start   Stop   nt size   aa size
Coat protein                         165    920       756       251
Replication enhancement protein      917   1315       399       132
Transactivation protein             1062   1451       390       129
Replication-associated protein      1393   2442      1050       349
AC4 protein                         1998   2363       366       121
nt = nucleotides, aa = amino acids
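The nt and aa sizes follow directly from the CDS coordinates in the GenBank record: both coordinates are inclusive, and the stop codon is part of the CDS but is not translated. A small Python sketch of that arithmetic (`cds_sizes` is a hypothetical helper):

```python
def cds_sizes(start: int, stop: int) -> tuple[int, int]:
    """Return (nt size, aa size) for a CDS from 1-based, inclusive coordinates.

    The stop coordinate is the last base of the stop codon, which is counted in
    the nt size but not translated, so aa size = nt size / 3 - 1.
    """
    low, high = sorted((start, stop))  # tolerate coordinates written high-to-low
    nt = high - low + 1
    if nt % 3:
        raise ValueError(f"CDS length {nt} nt is not a multiple of 3")
    return nt, nt // 3 - 1


for name, start, stop in [("Coat protein", 165, 920), ("AC4 protein", 1998, 2363)]:
    print(name, *cds_sizes(start, stop))
# Coat protein 756 251
# AC4 protein 366 121
```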

4. Now that you know the location of your CP (coat protein), copy
the sequence and use the Primer-BLAST tool
(https://www.ncbi.nlm.nih.gov/tools/primer-blast/) to create
primers that will amplify the CP. (3 marks)

Primer           Sequence (5′→3′)        Size (nt)
Forward Primer   GACCCCCGATGTTCCAAGAG    20
Reverse Primer   GACAATGGATTCACGCACCG    20

a. State your reason for choosing the above primer pair (1 mark)
Ans. These primers had the lowest relative self-complementarity and
self-3′-complementarity scores, a GC content within the optimal
40% to 60% range, and an optimal melting temperature (Tm).
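The GC content of the chosen pair can be checked with the short Python sketch below, which also gives a rough Wallace-rule Tm of 2(A+T) + 4(G+C) °C. The Wallace rule is only a quick estimate intended for short oligonucleotides; Primer-BLAST designs primers with Primer3, which uses a nearest-neighbour thermodynamic model, so expect its reported Tm values to differ.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


primers = {"Forward": "GACCCCCGATGTTCCAAGAG", "Reverse": "GACAATGGATTCACGCACCG"}
for name, seq in primers.items():
    print(f"{name}: GC = {gc_content(seq):.1f}%, Wallace Tm = {wallace_tm(seq)} °C")
# Forward: GC = 60.0%, Wallace Tm = 64 °C
# Reverse: GC = 55.0%, Wallace Tm = 62 °C
```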

5. What are three characteristics of a good primer? (3 marks)

Ans. Three characteristics of a good primer include:

1. Low self-complementarity and self-3′-complementarity: low scores
are desirable because they mean that the possibility of the primer
binding to itself and forming a dimer is low.

2. The optimal melting temperature (Tm) of a primer is 54°C or
higher. The annealing temperature (Ta) is usually set about 2 to 5°C
below the Tm.
3. The GC content of a primer should be between 40% and 60%.
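Criterion 1 can be illustrated with a deliberately simplified Python sketch that slides a primer against an antiparallel copy of itself and reports the best number of complementary base pairs, with no gaps and no thermodynamics. This is an assumed stand-in for illustration only; Primer-BLAST’s self-complementarity and self-3′-complementarity values come from Primer3’s alignment scoring and are not reproduced here.

```python
_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def max_self_complementarity(primer: str) -> int:
    """Best count of Watson-Crick pairs between a primer and an antiparallel copy of itself.

    Simplified, ungapped sketch: every relative shift of the two copies is tried
    and matching pairs are counted; no thermodynamics or gap penalties.
    """
    top = primer.upper()   # copy 1, read 5'->3'
    bottom = top[::-1]     # copy 2, read 3'->5' so it lies antiparallel to copy 1
    n = len(top)
    best = 0
    for shift in range(-(n - 1), n):
        pairs = sum(
            (top[i], bottom[i + shift]) in _PAIRS
            for i in range(n)
            if 0 <= i + shift < n
        )
        best = max(best, pairs)
    return best


for seq in ("GACCCCCGATGTTCCAAGAG", "GACAATGGATTCACGCACCG"):
    print(seq, max_self_complementarity(seq))
```

A fully palindromic sequence such as GAATTC scores its full length (6), which is why palindromic stretches in primers are best avoided.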

The NCBI GenBank is an important tool used by scientists all around
the world. This section will introduce you to the most basic function
of GenBank.

6. Go to the NCBI GenBank home page
(https://www.ncbi.nlm.nih.gov/genbank/).

a. What is the GenBank? (1 mark)

Ans. _______________________________________________

b. Use the list of Accession numbers given below to locate/identify the name of the sequences. (20 marks)

Accession Number    Sequence Name    Size (bp)
AP018036
FJ407052
EF585288
NC_046154
AF035224
AE014075
FJ601917
NC_004162
NC_001802
AY391777
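The same lookup can be scripted against NCBI’s E-utilities instead of searching each accession by hand. The sketch below builds an ESummary request for the nuccore database and reads each record’s title and length. The JSON field names (`uids`, `accessionversion`, `title`, `slen`) reflect our understanding of ESummary’s JSON output and should be checked against a live response; NCBI asks for no more than three requests per second without an API key.

```python
import json
import urllib.parse
import urllib.request

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(accessions: list[str]) -> str:
    """Build an ESummary request for nucleotide (nuccore) records, JSON output."""
    params = {"db": "nuccore", "id": ",".join(accessions), "retmode": "json"}
    return ESUMMARY + "?" + urllib.parse.urlencode(params)


def fetch_titles_and_lengths(accessions: list[str]) -> dict[str, tuple[str, int]]:
    """Return {accession.version: (title, length in bp)}. Requires network access.

    The field names used below are assumptions about ESummary's JSON output.
    """
    with urllib.request.urlopen(esummary_url(accessions), timeout=30) as response:
        result = json.load(response)["result"]
    return {
        result[uid]["accessionversion"]: (result[uid]["title"], int(result[uid]["slen"]))
        for uid in result["uids"]
    }
```

Usage (with network access): `fetch_titles_and_lengths(["AP018036", "FJ407052"])`, extending the list with the remaining accessions above.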

In this section you’ll be asked to develop a hypothetical DNA vaccine.

Scenario:
In recent times, since their discovery, Anelloviruses have
increasingly been linked to various other diseases as a kind of
‘helper molecule’, exacerbating the development of chronic
illnesses. A vaccine could provide protection against these viruses
and, in turn, slow down the progress of other illnesses.

7. What are the different types of vaccines? (6 marks)

Ans. __________________________________________

8. How is immunity achieved through vaccination? (5 marks)

Ans. ___________________________________________

9. What are DNA vaccines? (2 marks)

Ans. ___________________________________________

10. Now that you know how to BLAST a nucleotide sequence and locate
ORFs, use this knowledge to design a cloning experiment to create
a DNA vaccine for the sequence located in the document named
“Question 10 DNA Vaccine”
(Resources you may need: http://nc2.neb.com/NEBcutter2/ and the
provided PDF document ‘Human anelloviruses and the central nervous
system’)

(20 marks)
Ans. ___________________________________________
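One step in designing such a cloning experiment is choosing restriction enzymes that do not cut inside the ORF, so that their sites can be added to the ends of PCR primers. The Python sketch below does a minimal version of what NEBcutter reports. The enzyme list is an illustrative subset (all palindromic, so scanning one strand finds every site), `find_sites` and `non_cutters` are hypothetical helpers, and NEBcutter additionally handles non-palindromic sites and methylation sensitivity.

```python
# Illustrative subset of palindromic type II recognition sites (5'->3').
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def find_sites(seq: str, enzymes: dict[str, str] = ENZYMES) -> dict[str, list[int]]:
    """Map each enzyme to the 1-based positions where its site starts in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos + 1)
            pos = seq.find(site, pos + 1)
        hits[name] = positions
    return hits


def non_cutters(seq: str, enzymes: dict[str, str] = ENZYMES) -> list[str]:
    """Enzymes with no site in seq: candidates for sites added to cloning primers."""
    return [name for name, positions in find_sites(seq, enzymes).items() if not positions]


insert = "GAATTCAAGGATCCGAATTC"  # hypothetical toy sequence, not the Question 10 sequence
print(find_sites(insert))   # {'EcoRI': [1, 15], 'BamHI': [9], 'HindIII': [], 'XhoI': [], 'NotI': []}
print(non_cutters(insert))  # ['HindIII', 'XhoI', 'NotI']
```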

TOTAL 105 MARKS
