
BIOL 251 Bioinformatics

Database Exploration Exercise


Name:

URLs:

NCBI www.ncbi.nlm.nih.gov

EBI www.ebi.ac.uk

INSDC www.insdc.org

DDBJ www.ddbj.nig.ac.jp

UniProt www.uniprot.org

1. What do the abbreviations NCBI, EBI, DDBJ, and INSDC stand for?

2. Go to NCBI and use the left-hand menu to navigate to “DNA & RNA.”

a) Look at the list of databases that comes up and find GenBank. What is GenBank?

b) Skim through the other databases on the list. To what database in the NCBI collection would you
submit next generation sequences? (Hint: Look at the information in the short descriptions of
the databases.)

c) Scroll down to “Tools.” One of the tools listed is BLAST, one of the most commonly used tools and
one with which many of you may already be familiar. Which tool on the list finds all open reading
frames in a user’s sequence or in a sequence already in the database?
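Optional illustration: an open reading frame (ORF) runs from a start codon to the next in-frame stop codon. The Python sketch below shows the basic idea on the forward strand only, assuming ATG as the only start codon and TAA, TAG and TGA as the stops (standard genetic code). The function name find_orfs and its min_codons parameter are hypothetical; this is not how NCBI's tool is implemented, and real ORF finders also scan the reverse complement and let you choose alternative start codons and genetic codes.

```python
# Stop codons of the standard genetic code (assumption: no alternative code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    start is the 0-based index of the ATG; end is the index just past the
    stop codon, so seq[start:end] is the ORF including its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):  # step one codon at a time
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                # number of codons before the stop, counting the ATG
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs
```

For example, find_orfs("CCATGAAATTTTAGGG") returns [(2, 14, 2)]: the ORF ATGAAATTTTAG, starting at index 2 in reading frame 2. Raising min_codons filters out short ORFs that can arise by chance.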

3. Go to the INSDC website.

a) Which organizations are collaborating in the International Nucleotide Sequence Database
Collaboration?

b) Can you submit sequences to this website? How, in general terms, are sequences submitted?

4. From the NCBI homepage again, click “Nucleotide database” on the right-hand navigation list.

a) The nucleotide database is a collection of sequences from several sources, including:


b) In the search bar at the top, do a search for Rhodopirellula baltica. Make sure that the
dropdown menu on the left says “nucleotide” indicating you are searching within the nucleotide
database. How many sequence entries can be found in this database for the organism
Rhodopirellula baltica?
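Optional illustration: the same search can also be run programmatically through NCBI's E-utilities. The sketch below assumes the ESearch endpoint with retmode=json, which reports the number of matching records as a string under esearchresult.count; the helper names esearch_url, count_from_json and count_records are hypothetical. The number returned may not match the website exactly, and it changes over time as new sequences are submitted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# E-utilities ESearch endpoint (searches an Entrez database, returns matching IDs and a count).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nucleotide"):
    """Build an ESearch URL that asks for a JSON response."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmode": "json"})


def count_from_json(text):
    """Extract the hit count, which ESearch JSON stores as a string."""
    return int(json.loads(text)["esearchresult"]["count"])


def count_records(term, db="nucleotide"):
    """Run the search over the network and return the number of hits.

    NCBI asks users without an API key to stay at or below 3 requests per second.
    """
    with urlopen(esearch_url(term, db)) as resp:
        return count_from_json(resp.read().decode("utf-8"))
```

Calling count_records("Rhodopirellula baltica") sends one request and returns an integer; esearch_url alone can be pasted into a browser to see the raw JSON.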

5. Still in the nucleotide database, search for the accession number FJ797415. (Accession numbers are
unique ID numbers assigned to every sequence in these public databases.) Use the information from the
resulting entry to answer the following questions:

a) From which organism did this DNA sequence originate?


b) How long is the total sequence?
c) Does this sequence contain coding regions? (regions of DNA that code for a protein)
d) Display the sequence in FASTA format by clicking on FASTA near the top of the page, and paste
in the resulting sequence here. This standard format is important for use with many DNA
sequence analysis programs.
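Optional illustration: a FASTA record is a header line beginning with “>” (an identifier, usually followed by a description) and then one or more lines of sequence. The minimal Python parser below is a sketch, not a replacement for established tools such as Biopython's SeqIO; the function name parse_fasta is hypothetical, and the example record in the note after it is made up, not FJ797415.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        records.append((header, "".join(chunks)))  # last record in the file
    return records
```

For example, parse_fasta(">example_1 made-up record\nATGCATGC\nGGCC\n") returns [("example_1 made-up record", "ATGCATGCGGCC")], and len() of that sequence (12) gives its total length.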

6. Back on the NCBI homepage, go to “Taxonomy” in the left-hand panel, then “Taxonomy” under
“Databases,” then “Browser” under “Taxonomy tools.”

a) Search for Rhodopirellula baltica using this tool now. Click on the result that says Rhodopirellula
baltica SH 1 to display the taxonomic information. Which domain of life and which phylum
within this domain does the organism Rhodopirellula baltica SH 1 belong to? (Hint: hover your
mouse over the lineage information to get more information.)

b) Which translation table is used for this organism?

c) Click on the translation table identified and read the description. From this, can you figure out
what a translation table is? What is being translated into what? Why do you suppose there is
more than one translation table?
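Optional illustration: a translation table can be written as a lookup from each of the 64 codons to a one-letter amino acid code, with * marking a stop. The sketch below builds NCBI table 1 (the standard code) from its 64-letter amino acid string in TCAG order, then derives table 2 (vertebrate mitochondrial) by overriding the four codons that differ. It is a simplified sketch: it ignores each table's alternative start codons, and the names STANDARD, VERT_MITO and translate are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG (first base varies slowest).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Table 1: map each codon string to its amino acid.
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Table 2 (vertebrate mitochondrial) differs from table 1 at four codons.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna, table):
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))
```

The same DNA string ATGATATGAAGA translates to MI*R under table 1 but MMW* under table 2, so the table chosen for an organism changes the protein predicted from its genes.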

7. Go to the EMBL ENA database page here: https://www.ebi.ac.uk/ena/browser/submit?src=wizard&wiztype=quicklink

a) Search for the accession number “FJ797415” again, selecting “nucleotide sequences” from the
drop-down menu. Are the results the same as those from NCBI?

8. Go to the DDBJ site:

a) Click on “search.” Using “getentry,” search the same accession number again. Are the results
the same? Why is this important?
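Optional illustration: one way to check whether NCBI, ENA and DDBJ return the same sequence is to copy the FASTA text from each site and compare only the sequence lines, since the header lines may be formatted differently. The helper names sequence_only and same_sequence are hypothetical, and this sketch checks only the sequence, not the annotation.

```python
def sequence_only(fasta_text):
    """Join all sequence lines, dropping '>' headers, blank lines and case."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(l.upper() for l in lines if l and not l.startswith(">"))


def same_sequence(*fasta_texts):
    """True if every FASTA text contains exactly the same sequence."""
    seqs = [sequence_only(t) for t in fasta_texts]
    return all(s == seqs[0] for s in seqs)
```

With the FASTA text from each site pasted into hypothetical variables fasta_ncbi, fasta_ena and fasta_ddbj, same_sequence(fasta_ncbi, fasta_ena, fasta_ddbj) returns True only if all three sequences are identical, ignoring case and line breaks.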

9. Go to www.uniprot.org and click on “Proteins – UniProt Knowledgebase.”

a) What is the UniProt Knowledgebase and what are its two sections?

b) What is the major difference between the two sections? (Note: annotation is the process of
adding information to the protein sequence entry, such as protein function and location.)

10. Still in UniProtKB, click on ‘Start searching in UniProtKB.’ Select ‘table view’ when the option pops
up. On the left, filter by “reviewed,” click on “mouse,” and then click on the first entry that
comes up.
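Optional illustration: UniProt also offers a REST interface. The sketch below only builds a search URL for reviewed mouse entries, assuming the query syntax reviewed:true AND organism_id:10090 (10090 is the NCBI Taxonomy ID for Mus musculus) and the return-field names shown; check these against UniProt's API documentation before relying on them. The helper name uniprot_search_url is hypothetical, and results may be ordered differently than on the website, so the first row need not be the same entry.

```python
from urllib.parse import urlencode

# Assumed UniProtKB search endpoint of the UniProt REST API.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(query, fields, fmt="tsv", size=10):
    """Build a UniProtKB search URL returning the given columns."""
    params = {"query": query, "fields": ",".join(fields), "format": fmt, "size": size}
    return UNIPROT_SEARCH + "?" + urlencode(params)


url = uniprot_search_url(
    "reviewed:true AND organism_id:10090",  # reviewed entries from mouse (taxon 10090)
    ["accession", "protein_name", "cc_function", "cc_subcellular_location"],
)
print(url)
```

Opening the printed URL in a browser (or with urllib.request.urlopen) should return a tab-separated table whose columns correspond to questions 10a to 10c.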

a) What is the name of the protein?


b) What is its function?
c) Scroll down until you see “subcellular location.” Where is this protein usually found within the
cell?
d) Scroll down some more until you see “Structure.” Under 3D structure databases, click on the
entry after “AlphaFoldDB.” On the new page that opens, in the 3D model display, explore the
different view options under ‘Quick styles.’ To the right of the image are some small icons. Find
the one that allows you to take a screenshot, and use its ‘copy’ option to paste an image of the
protein below.
