
BioInformatics Lab Part I: WormBase

Janet Duerr, Ph.D. (2017)

In recent decades, the rapid development of molecular biology methods for analyzing DNA, RNA, and protein
sequences has produced a huge amount of data, so much that an entire branch of science, called bioinformatics, has
developed to manage it. Many biologists now consider their computers essential research tools, as important
as biochemical or behavioral assays or field observations. Gigantic computer databases describe the current state of
knowledge of thousands of organisms, from genetics through proteins and, in less detail, from cells to
physiology, anatomy, and behavior. In this lab, we will explore some of these databases, following a variety of trails
from genotype to phenotype.

For model organisms such as C. elegans and Drosophila, there are centralized databases with many types of
species-specific information, as well as links to data on other species. You will use “WormBase” to explore this
information for your gene of interest. WormBase is an international resource, supported by research grants to groups in
the USA, Canada, England, Japan, Germany, and elsewhere. It is the central resource for all kinds of information on
C. elegans and other nematodes (including parasitic ones).

You will begin with a DNA sequence that has been provided to you. By searching WormBase with this sequence, you
will discover information about the gene that it comes from. So let’s begin…

Step 1. Query WormBase with your DNA sequence to find similar, known worm genes and
the peptides they encode.

Here is your DNA sequence to start with:


AGTCTCACTGAAGACATGTGGGTTGATATGGTTAAACTTGGAGCAGGAACCGCTTCCAACCGTGTGAGACGTCAAC

Please note that your results will differ in detail from the pictures shown in the instructions. The pictures are from a
different gene and are meant as an illustration to help you navigate WormBase.

Go to http://www.wormbase.org/

A vast amount of information is available
from this page. You can minimize the amount
of information on the page by clicking on the
arrowhead next to each subject heading.

To find out what protein your DNA encodes,
select Tools (see red arrow).

Selecting Tools will open a new window that
looks like this.

Click on Blast/BLAT

BLAST is a program that compares a
sequence you input (called a Query) against
sequences that are in the database. It will
give you a list of similar sequences with the
closest match at the top.

When you click on Blast/BLAT you will get a
window that looks like this. Fill in the window as
indicated:

1) Enter Your Query Sequence


Paste DNA sequence (from above) in box

For Query Type select ʘ Nucleotide

2) ʘ BLAST
On the dropdown menu, select
blastx (nucleotide query vs. protein database)

3) Leave the other settings at their defaults.

Then click on “submit”.

You will get a complex window listing peptides
that are similar to the peptide that would result
from transcription and translation of the Query
DNA sequence.
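
To make the blastx step less of a black box, here is a minimal Python sketch of the first thing a blastx-style search has to do with a nucleotide Query: translate it in all six possible reading frames (three on each strand) using the standard genetic code. This is only an illustration under stated assumptions, not WormBase's implementation. The function names are made up for this example, and the database comparison and scoring that BLAST performs are not shown.

```python
# Illustrative sketch only: six-frame translation of a DNA query.
# Function names are hypothetical; this is not WormBase or BLAST code.

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The DNA sequence given in Step 1 of this lab.
QUERY = (
    "AGTCTCACTGAAGACATGTGGGTTGATATGGTTAAACTTGGAGCAGGAACC"
    "GCTTCCAACCGTGTGAGACGTCAAC"
)


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(dna.upper()))


def translate(dna):
    """Translate codon by codon from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Return the peptides for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translations(QUERY).items():
        print(frame, peptide)
```

Running the script prints one candidate peptide per frame. Frames that contain `*` (stop codons) are less likely to be the real coding frame, and the peptide that WormBase aligns in its results should correspond to one of these six translations.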

The top box shows graphically how well the
polypeptide that would result from transcription
and translation of your Query DNA sequence
matches known proteins over the entire length
of your peptide. Next to each line is the code
name of the transcript (in the form X####.#; see
red arrow).

The top line is the best match to your Query.


Enter the code name for the Top line here:

Scroll down the page to find the box with the
code name you entered above at the top left
(brown arrow). Look for locus = . This gives the
gene name. Enter the gene name here:

Below this is your Query peptide sequence and
how well it matches the peptide found in the
database (green arrow). The peptide sequence is
listed using the one-letter amino acid
abbreviations. You can find these abbreviations
in Table 4.1 in your BIOS 1700 textbook.
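
If the one-letter codes are hard to read at first, a small lookup like the following can expand a peptide into the more familiar three-letter abbreviations. It is a convenience sketch for the 20 standard amino acids only; the function name and the example peptide are made up for illustration and are not taken from WormBase.

```python
# Illustrative sketch: expand one-letter amino acid codes to three-letter codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def expand_peptide(peptide):
    """Return the peptide as hyphen-separated three-letter codes.

    Characters that are not one of the 20 standard codes become '?'.
    """
    return "-".join(ONE_TO_THREE.get(letter, "?") for letter in peptide.upper())


if __name__ == "__main__":
    # "MW" is an arbitrary two-residue example, not part of the lab answer.
    print(expand_peptide("MW"))
```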

Copy the peptide sequence and paste below:

Click on the Gene Summary link (blue arrow).

A new window with a long list of possible
content on the left will appear. Selected items
are highlighted in light bluish-gray. If you hover
your mouse over one of them, an “x” appears to
the left. Clicking it will close that window. You
can always click there again to reopen it.

Subjects chosen on the left are open on the
right (with the option of having them expanded
or minimized by clicking the arrowheads).

To begin, select “overview” and expand it
(if it isn’t already expanded).

This page has the gene name (lower case letters and
italicized) at the top and the code name of the
sequence that you used to link to this gene (in the gray
box). This should be the same code that you entered
above in answer to the first question. Confirm that it
is.
Enter the Gene name here (properly formatted)>

Below the gray box you will see:

Legacy manual gene description

Click on the arrowhead to open a brief description of
the gene and the protein it encodes. The protein
name is the same as the gene name except that it is in
all capital letters and is NOT italicized.

Enter the Protein name here (properly formatted)>

Step 2. Next, we will use WormBase and the gene name to explore more about the gene
and the protein it encodes in C. elegans.

Optional but informative videos:
General navigation: https://www.youtube.com/watch?v=J-TzkD8BQsI
Simple gene searching: https://www.youtube.com/watch?v=R3QxmosuhFs

On the menu on the left side of the page, select the
“Expression” window (blue arrow) and expand it.

This gives a range of information on:
- where the gene is expressed (in what cells and tissues)
- when the gene is expressed (at what point in the
worm’s life cycle).

“Anatomic expression pattern” cartoon worm.

The worm cartoon is a faint gray outline of the worm. ANY
color represents expression. So, in the example shown, the
flesh-colored worm with a darker flesh-colored band down
the middle represents expression in the entire exterior of
the worm.

The colored graphs show the expression level (usually the
mRNA levels) for the entire worm sampled over
development.

Other subject headings include “expressed in”,
“expressed during”, “subcellular localization”, and
others (of more complexity).

My protein is expressed in (cell type/tissue, listed
under “Anatomy term”):

A diagram of worm anatomy is shown below. Note
on the small cross section that the epidermis in
worms is called hypodermis.

Expression is during developmental stage(s) (shown
under “Life stage”):

Next, select “Location” in the menu on the left to find
the chromosomal location of your gene.

Again, you can close the Expression window to simplify
your view. Locations are given as the chromosome name
(I, II, III, IV, V, or X) followed by the base-pair positions.

Genomic Position: ______________________

The Genome Browser Preview shows DNA locations in the
bar at the top. Note that these numbers correspond to
the Genomic Position you entered above.
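
As a quick check on what a Genomic Position means, the sketch below splits a position string into its chromosome and base coordinates and reports the length of the region. It assumes the position is written as chromosome, colon, start, two dots, end (for example `IV:1000..2500`); that format and the example coordinates are assumptions made up for illustration, so adjust the pattern if your WormBase page displays the position differently.

```python
# Illustrative sketch: parse a genomic position such as "IV:1000..2500".
# The "CHROMOSOME:START..END" format and the example values are assumptions.
import re

POSITION_PATTERN = re.compile(r"^(I|II|III|IV|V|X):(\d+)\.\.(\d+)$")


def parse_genomic_position(text):
    """Return the chromosome, start, end, and length (in bp) of a position."""
    cleaned = text.strip().replace(",", "")  # tolerate thousands separators
    match = POSITION_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"Unrecognized genomic position: {text!r}")
    chromosome = match.group(1)
    # Sort the coordinates so the result does not depend on strand order.
    low, high = sorted((int(match.group(2)), int(match.group(3))))
    return {
        "chromosome": chromosome,
        "start": low,
        "end": high,
        "length_bp": high - low + 1,
    }


if __name__ == "__main__":
    print(parse_genomic_position("IV:1000..2500"))  # hypothetical coordinates
```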

Below this you will see colored diagrams of mRNA
transcripts that are transcribed from this chromosomal
region. More than one transcript may be shown. Examine
the top one.

The diagram shows the intron and exon structure. Introns
are indicated by a thin line and exons are shown as
colored bars. Gray bars indicate “untranslated regions”,
portions of the final mRNA that are not translated since
they are “upstream” of the start codon or “downstream”
of the stop codon.

The end with an arrow is 3’ (downstream). Remember
that either DNA strand can be the template, so the mRNA
may point “left” or “right” for any one gene.

My gene has _____ exons & _______ introns.

The 3’ end of my gene points in which direction (left or
right)?
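
One way to check your exon and intron counts: introns are simply the gaps between consecutive exons, so a transcript with n exons has n - 1 introns. The Python sketch below illustrates this with made-up exon coordinates. The numbers, the function names, and the convention that a forward-strand ("+") transcript is drawn with its 3’ end pointing right are all assumptions for illustration, not values taken from WormBase.

```python
# Illustrative sketch: derive introns from exon coordinates.
# All coordinates are hypothetical and do not describe a real gene.


def introns_between(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def three_prime_direction(strand):
    """Assumed drawing convention: '+' points right, '-' points left."""
    return "right" if strand == "+" else "left"


if __name__ == "__main__":
    example_exons = [(100, 200), (300, 400), (550, 700)]  # hypothetical
    example_introns = introns_between(example_exons)
    print(len(example_exons), "exons,", len(example_introns), "introns")
    print("3' end points", three_prime_direction("-"))
```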

Step 3: Learn about the phenotypes related to your gene


Go back to the Overview. You may need to reopen it from the menu on the left side of the page. Again, open the
“Legacy manual gene description” if it isn’t already open. Find and enter the following information below.

The gene encodes a ____________

It is related to the human gene ______________________

and it is required for normal _____________________

In parentheses at the top of the Overview is some information about the phenotype of worms that carry mutations in this
gene. Find the following information:
mutant animals _______________ when moving and appear _________________.

Next, let’s learn about the human homolog of your gene. One reason that C. elegans is such a useful model
organism is that many of its genes have homologs, or related genes, in humans. Studying the gene and the
function of its encoded protein in C. elegans can therefore tell us things about the role of the homolog in
humans and possibly about human diseases involving mutation of the human homolog.

You should see a link in the Overview section, under Legacy manual gene description, that looks like OMIM:######.
Click on that link. This will take you to a database called OMIM (Online Mendelian Inheritance in Man). This site will
give you information about what is known regarding the human gene that your C. elegans gene is related to.

GENE NAME: _____________________________

ALTERNATIVE TITLES; SYMBOL: _________________

HGNC APPROVED GENE SYMBOL: ___________________________

Under the Gene-Phenotype section, you will find information about
a human disease that is caused by mutation of this gene.

DISEASE NAME:
_________________________________________

There is a lot more information on this page. You should scroll down and read some of it. See how much more
information you can find!
